When you send telemetry into Honeycomb, our infrastructure needs to buffer your data before processing it in our “retriever” columnar storage database. For the entirety of Honeycomb’s existence, we have used Apache Kafka to perform this buffering function in our observability pipeline.
In a perfect world, small pieces of software scale up and down with a high-performance message broker like a heart, pumping data in a controlled, efficient manner. However, at some point, in every single system, an application starts pumping out large data requests, in a sporadic, uncontrollable manner. Because here, we are talking about the real world, right? In this article, we will address the problem of adapting a legacy system to play with a stream-based platform.