
Latest News

Destroy on Friday: The Big Day - A Chaos Engineering Experiment - Part 2

In my last blog post, I explained why we decided to destroy one third of our infrastructure in production just to see what would happen. This is part two, where I go over the big day. How did our chaos engineering experiment go? Find out below!

Streamlining Debugging with Lightrun Snapshots: A Superior Alternative to Trace Logging

According to a recent study, failing tests alone cost the enterprise software market an astonishing $61 billion annually. This figure reflects the vast resources devoted to rectifying software failures, translating into about 620 million developer hours lost each year. On average, engineers spend 13 hours resolving a single software failure, a statistic that paints a stark picture of the current state of debugging efficiency.

Featured Post

AI-enabled observability solutions are essential to manage application performance and security in on-premises environments

For all of the focus given to cloud-native technologies over recent years, it's sometimes easy to forget that a huge number of organizations continue to run their business-critical applications on-premises. And this will undoubtedly be the situation for some years to come within the public sector and in industries such as financial services and healthcare, where organizations need to adhere to strict data privacy and security rules.

OpenTelemetry, AI, and the Future of Observability with Andreas Grabner

Shubham Srivastava from our team had the pleasure of meeting Andreas Grabner at KubeCon + CloudNativeCon Europe earlier this year. Andreas wears many hats in his daily work, primarily serving as a DevOps Activist at Dynatrace, where he has dedicated over 16 years to shaping the Observability solutions we see today. He is also a Developer Advocate at Keptn, helping teams automate and orchestrate their deployments end-to-end, and plays an active role as an Ambassador in the CNCF community.

What Makes for a 'Good' Pair Programming Session?

Software changes so rapidly that developing on the cutting edge of it cannot fall to a single person. When it comes to asynchronously disseminating information about projects, code comments, PR conversations, Slack, RFCs, and other investigatory documents do a wonderful job, but no amount of async communication replaces the magic of two brains bouncing ideas off of each other.

Unleashing the Power of Hybrid Cloud - Introducing Hybrid Observability in HPE GreenLake Flex Solutions

In today's fast-paced digital economy, businesses are constantly seeking innovative solutions to streamline their operations, enhance agility, and drive growth. As enterprise IT infrastructure environments become more distributed and complex to meet evolving demands, the need for robust IT monitoring, management, and automation becomes even more important.

Deploy on Friday? How About Destroy on Friday! A Chaos Engineering Experiment - Part 1

We recently took a daring step to test and improve the reliability of the Honeycomb service: we abruptly destroyed one third of the infrastructure in our production environment using AWS’s Fault Injection Service. You might be wondering why the heck we did something so drastic. In this post, we’ll go over why we did it and how we made sure that it wouldn’t impact our service.

Embark on the Observability Journey

With the advent of byte code instrumentation (BCI) in 2008, application performance management took a giant leap in what is known as "inside-out monitoring," that is, monitoring from inside the application. Before that, application monitoring was largely limited to tracking CPU, memory, disk, and process availability. BCI offered new opportunities in terms of how applications could be monitored and what could be monitored from an application performance perspective.

Observability as Code Explained: Benefits & How to Get Started

Traditional monitoring has become insufficient for managing complex systems. Modern infrastructures consist of numerous interconnected services, and simply monitoring individual metrics and logs fails to provide a comprehensive view. This is where observability becomes crucial.