This is a short blog post about a pattern that we’ve observed with increasing frequency among some large enterprises: the use of AWS S3 as both an observability lake and a data bus. AWS S3’s simple API, ubiquitous language support, unmatched reliability and durability, retention options, and numerous pricing plans have made it the de facto standard for storing massive amounts of data.
The cloud native revolution brought by Kubernetes has transformed the way we build and deliver software, but the world of big data has for too long been left on the sidelines of this transformation. Thanks to many contributions from the open source community, Apache Spark’s integration with Kubernetes is now officially generally available as of this year’s recent releases.
Like many cool tools out there, this project started from a request made by a customer of ours. Having recently migrated to our service, this customer had ~30TB of historical logging data. This is a considerable amount of operational data to leave behind when moving from one SaaS platform to another. Unfortunately, most observability solutions are built around the working assumption that data flows are future-facing.
The recent surge in internet usage and the corresponding increase in data have triggered a new awareness among business owners. One of the main questions from marketers is how their businesses can benefit from Big Data. As a business owner, you should know that Big Data is indeed one of the most incredible things to happen to the marketing industry. If used right, it can enhance your company’s ability to serve its consumers and increase revenue.
Wouldn’t it be nice to be able to perfectly predict the future? We are a long way from being able to do that, but that is basically the goal of anybody working in the data science field: take a bunch of historical data and then try to make future predictions based on that data.