Operations | Monitoring | ITSM | DevOps | Cloud

Your Root Cause Analysis is Flawed by Design

There’s a nagging feeling of déjà vu that haunts every network operations leader. You invest significant time and resources to resolve a major performance issue. Your best engineers isolate a culprit—a misbehaving load balancer, perhaps—and after a frantic effort, service is restored. You close the ticket, confident the problem is solved. Then, two weeks later, it’s back.

Whose Fault Is It When the Cloud Fails? Does It Matter?

On Monday, October 20th, a significant portion of the digital services we use every day became inaccessible. For hours, banking, communication, and entertainment applications were unavailable. The root cause was later identified as a major outage within Amazon Web Services (AWS), the infrastructure that powers a vast number of online services. The initial response for any business affected by such an event is a frantic effort to diagnose the problem. Is it our application? Is our network down?

Product Update - Turn Off Alerts, Use Microsoft Teams, and Custom Domains

Over the last few months IncidentHub has added several new features to make it easier to fine tune your alerts. IncidentHub now also integrates with Microsoft Teams and supports custom domains for your public status pages. Let's take a comprehensive look at what's new.

Jira Service Management (JSM) Review for Alerting (2025)

Atlassian is shutting down OpsGenie. New sales stopped on June 4, 2025, and the platform will be completely offline by April 5, 2027. As an OpsGenie user, you now face a critical decision: Migrate to Jira Service Management (JSM), Atlassian’s recommended path, or choose a different solution. And if you’re not sure JSM is the right fit for your team’s alerting needs, this review will help you decide. I signed up for JSM and put it through real-world testing.

Sliding Through Log-Time Space

This post kicks off a new series written by the Graylog Development Team. In these updates, we’ll highlight the features and fixes that make daily work in Graylog smoother. We want to show the work we care so much about and present the challenges we faced and overcame. Today, we’re starting with one of those minor but functional enhancements: Graylog time-range stepping.

Find and Fix Fastify Slowdowns with AppSignal for Node.js

In part one of this series, we set up basic performance monitoring for our Fastify application using AppSignal and explored key performance indicators. Now that we have our monitoring foundation in place, it's time to leverage these insights to actively improve application performance. You'll learn how to detect performance regressions, find optimization opportunities, and implement custom instrumentation with OpenTelemetry.

CEO Diaries: Not All AI Talent Is Alike

If Meta’s (now halted) nine-figure AI talent poaching scheme was any indication, the AI talent market is pretty frothy. The number of AI-related job postings has roughly tripled since 2019, and the average salary has more than doubled (Bain). The race is on for companies to find the fastest, most sustainable routes to AI-driven business value; all companies, but especially software companies, are hotly pursuing racers. But despite what Zuckerberg & Co.

Open Source Cloud Orchestration Tools Compared

Before 2011, cloud infrastructure was still new. AWS had launched EC2 and S3 in 2006. But to deploy applications, engineers had to manually spin up servers, configure storage, and set up networking — all by hand or with custom scripts. There were early configuration management tools, such as Chef and Puppet, but those didn’t offer full cloud orchestration. Then in 2011, AWS launched AWS CloudFormation as the first major orchestration tool.

Top 4 Inefficiencies For Dev Teams Resolving Issues

Every hour developers spend troubleshooting is an hour they’re not building features, innovating, or delivering value to customers. Yet in most organizations, issue management and debugging remains one of the biggest drains on productivity and release velocity. That frustration is exactly what led our founders, themselves developers, to create Lightrun.