
Honeycomb

Bring Test Engineering into your DevOps practice

What do a test engineer and a DevOps or SRE team member have in common? The reality is that different teams need to proactively understand what is happening in production at critical milestones along the software engineering delivery cycle. In the words of Abby Bangser, senior test engineer at Moo, “Testing has so much in common with Ops and SRE teams. We need to ask interesting questions of production. We need no more debates whether a bug gets fixed.

Using Honeycomb to remember to delete a feature flag

Feature flags are great and serve us in so many ways. However, we do not love long-lived feature flags. They lead to more complicated code, and when we inevitably default them to be true for all our users, they lead to unused sections of code. In other words, tech debt. How do we stay on top of this? Find out how Honeycomb’s trigger alerts proactively tell you to go ahead and clean up that feature flag tech debt!

Getting At The Good Stuff: How To Sample Traces in Honeycomb

(This is the first post by our new head of Customer Success, Irving.) Sampling is a must for applications at scale; it’s a technique for reducing the burden on your infrastructure and telemetry systems by only keeping data on a statistical sample of requests rather than 100% of requests. Large systems may produce large volumes of similar requests which can be de-duplicated.
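As a rough illustration of the sampling idea in the teaser above, here is a minimal sketch of deterministic 1-in-N trace sampling. It is not Honeycomb's implementation: the function names, the `trace_id` and `sample_rate` field names, and the choice of SHA-256 are all assumptions made for this example. Hashing the trace ID means every span in a trace gets the same keep-or-drop decision, and recording the rate on each kept event lets totals be estimated afterwards.

```python
import hashlib


def keep_trace(trace_id: str, sample_rate: int) -> bool:
    """Keep roughly 1 in `sample_rate` traces, with a stable decision per trace ID."""
    if sample_rate < 1:
        raise ValueError("sample_rate must be >= 1")
    # Hash the trace ID so every span of the same trace lands in the same bucket.
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    return bucket % sample_rate == 0


def sample_events(events, sample_rate: int):
    """Return the kept events, each annotated with the rate it was sampled at."""
    kept = []
    for event in events:
        if keep_trace(event["trace_id"], sample_rate):
            # The recorded rate says how many original events this one stands for.
            kept.append({**event, "sample_rate": sample_rate})
    return kept


def estimated_total(kept_events) -> int:
    """Estimate the original event count from the sampled events."""
    return sum(event["sample_rate"] for event in kept_events)
```

With a rate of 10, about one trace in ten is kept, and `estimated_total` multiplies back up to approximate the original volume.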

Instrumenting Lambda with Traces: A Complete Example in Python

We’re big fans of AWS Lambda at Honeycomb. As you may have read, we recently made some major improvements to our storage engine by leveraging Lambda to process more data in less time. Making a change to a complex system like our storage engine is daunting, but can be made less so with good instrumentation and tracing. For this project, that meant getting instrumentation out of Lambda and into Honeycomb.

Honeycomb SLO Now Generally Available: Success, Defined.

Honeycomb now offers SLOs, aka Service Level Objectives. This is the second in a set of essays on creating SLOs from first principles. Previously, in this series, we created a derived column to show how a back-end service was doing. That column categorized every incoming event as passing, failing, or irrelevant. We then counted up the column over time to see how many events passed and failed. But we had a problem: we were doing far too much math ourselves.
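The pass/fail/irrelevant categorization described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Honeycomb's derived-column syntax: the `path`, `status`, and `duration_ms` fields and the thresholds are hypothetical, chosen just to make the example concrete.

```python
from typing import Iterable, Optional, Tuple


def sli(event: dict) -> Optional[bool]:
    """Classify an event: True = passing, False = failing, None = irrelevant."""
    # Hypothetical rule: only requests under /api count toward the objective.
    if not event.get("path", "").startswith("/api"):
        return None
    # Hypothetical thresholds: no server error and a response under one second.
    return event["status"] < 500 and event["duration_ms"] < 1000


def compliance(events: Iterable[dict]) -> Tuple[int, int, Optional[float]]:
    """Count passing and failing events and return the passing ratio."""
    passed = failed = 0
    for event in events:
        result = sli(event)
        if result is None:
            continue  # irrelevant events do not affect the ratio
        if result:
            passed += 1
        else:
            failed += 1
    total = passed + failed
    return passed, failed, (passed / total if total else None)
```

Counting by hand like this is the "too much math" the teaser mentions; the sketch just makes the arithmetic visible.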

From "Secondary Storage" To Just "Storage": A Tale of Lambdas, LZ4, and Garbage Collection

When we introduced Secondary Storage two years ago, it was a deliberate compromise between economy and performance. Compared to Honeycomb’s primary NVMe storage attached to dedicated servers, secondary storage let customers keep more data for less money. They could query over longer time ranges, but with a substantial performance penalty; queries which used secondary storage took many times longer to run than those which didn’t.