A Million Dollar Knob: S3 Object Lifecycle Optimization
At Sumo Logic, we manage petabytes of unstructured log data as part of our core log search and analytics offering. Multiple terabytes of data are indexed every day and stored persistently in AWS S3. When a query is executed against this data via UI, API, scheduled search or pre-installed apps, the indexed files are retrieved from S3 and cached in a custom read-through cache for these AWS S3 objects. For the most part, the caching scheme for S3 objects works reasonably well.