The latest news and information on observability for complex systems and related technologies.
Oh man, did I get ahead of myself in my last post! I started chatting tools, and I realize now that I really should have been talking more about why I’m using Sensu and Gremlin. But it didn’t occur to me until last year at Monitorama. John Allspaw gave the keynote talk (Taking Human Performance Seriously). While you can watch the talk here, I’ll highlight a few points...
New eGuide takes a closer look at Prometheus, ELK and Jaeger:
Feature flags are great and serve us in so many ways. However, we do not love long-lived feature flags. They lead to more complicated code, and when we inevitably default them to be true for all our users, they lead to unused sections of code. In other words, tech debt. How do we stay on top of this? Find out how Honeycomb’s trigger alerts proactively tell you to go ahead and clean up that feature flag tech debt!
(This is the first post by our new head of Customer Success, Irving.) Sampling is a must for applications at scale; it’s a technique for reducing the burden on your infrastructure and telemetry systems by only keeping data on a statistical sample of requests rather than 100% of requests. Large systems may produce large volumes of similar requests which can be de-duplicated.
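As a rough illustration of the sampling idea in the excerpt above, here is a minimal sketch of constant-rate sampling. It is not Honeycomb's implementation; the names `sample_events` and `estimate_total` and the `sample_rate` field are assumptions made for this example. Each kept event records the rate it was sampled at, so totals can be estimated afterwards by re-weighting.

```python
import random


def sample_events(events, sample_rate, rng=None):
    """Keep roughly 1 in `sample_rate` events (hypothetical helper).

    Each kept event is annotated with the rate it was sampled at, so that
    downstream counts can be re-weighted to approximate the original volume.
    """
    if sample_rate < 1:
        raise ValueError("sample_rate must be >= 1")
    if rng is None:
        rng = random.Random()
    kept = []
    for event in events:
        # random() returns a float in [0.0, 1.0), so a rate of 1 keeps everything.
        if rng.random() < 1.0 / sample_rate:
            kept.append({**event, "sample_rate": sample_rate})
    return kept


def estimate_total(kept_events):
    """Estimate the original event count: each kept event stands in for
    `sample_rate` original events."""
    return sum(event["sample_rate"] for event in kept_events)


if __name__ == "__main__":
    requests = [{"path": "/home", "status": 200} for _ in range(10_000)]
    kept = sample_events(requests, sample_rate=10, rng=random.Random(42))
    print(len(kept), "events kept; estimated original total:", estimate_total(kept))
```

A fixed rate is the simplest choice. Real systems often vary the rate by traffic type (for example, keeping every error but only a fraction of successes), which this sketch does not attempt.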
2020 is here and it looks like it’ll be a truly exciting and impactful year for the DevOps community. As you know, the landscape is changing rapidly, and as a result, new technologies and methodologies are emerging to solve challenges you’re experiencing on the job. Observability is one such concept–and achieving it is a huge challenge for software engineers across the globe.
We’re big fans of AWS Lambda at Honeycomb. As you may have read, we recently made some major improvements to our storage engine by leveraging Lambda to process more data in less time. Making a change to a complex system like our storage engine is daunting, but can be made less so with good instrumentation and tracing. For this project, that meant getting instrumentation out of Lambda and into Honeycomb.
Honeycomb now offers SLOs, aka Service Level Objectives. This is the second in a set of essays on creating SLOs from first principles. Previously in this series, we created a derived column to show how a back-end service was doing. That column categorized every incoming event as passing, failing, or irrelevant. We then counted up the column over time to see how many events passed and failed. But we had a problem: we were doing far too much math ourselves.
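To make the pass/fail/irrelevant categorization in the excerpt above concrete, here is a minimal sketch of the manual math it describes. This is not Honeycomb's derived-column syntax. The event fields (`service`, `status_code`, `duration_ms`), the 1000 ms threshold, and the function names are all assumptions chosen for illustration.

```python
from collections import Counter


def classify(event, threshold_ms=1000):
    """Label one event as 'pass', 'fail', or 'irrelevant' (hypothetical rule).

    Assumed rule: only events from the 'backend' service count toward the
    objective, and an event passes if it did not return a server error and
    finished faster than the threshold.
    """
    if event.get("service") != "backend":
        return "irrelevant"
    if event["status_code"] < 500 and event["duration_ms"] < threshold_ms:
        return "pass"
    return "fail"


def compliance(events, threshold_ms=1000):
    """Return (counts, ratio), where ratio is passes / (passes + fails).

    Irrelevant events are excluded from the ratio. The ratio is None when
    no relevant events were seen.
    """
    counts = Counter(classify(event, threshold_ms) for event in events)
    relevant = counts["pass"] + counts["fail"]
    ratio = counts["pass"] / relevant if relevant else None
    return counts, ratio


if __name__ == "__main__":
    sample = [
        {"service": "backend", "status_code": 200, "duration_ms": 120},
        {"service": "backend", "status_code": 500, "duration_ms": 80},
        {"service": "backend", "status_code": 200, "duration_ms": 2500},
        {"service": "frontend", "status_code": 200, "duration_ms": 30},
    ]
    counts, ratio = compliance(sample)
    print(dict(counts), ratio)
```

Comparing the resulting ratio against a target over a time window is the arithmetic that the excerpt says becomes tedious to do by hand.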
Once again we are at the end of another year, facing into the endless potential of the next one, and thinking back on the fun and hard work behind us. Join me now in a review of some of our achievements and amusements, features and funtimes, shout-outs and selfies–and we’ll be back to do it all again in another 12 months.