Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Observability for complex systems and related technologies.

Ensure trust across the entire data life cycle with Datadog Data Observability

As data systems grow more complex and data becomes even more business-critical, teams struggle to detect and resolve issues that impact data quality, reliability, and, ultimately, trust. Engineers have to rely on manual checks and ad hoc SQL queries to catch data quality issues—often after teams relying on the data have noticed something has gone wrong.

Smarter Telemetry Pipelines: The Key to Cutting Datadog Costs and Observability Chaos

Log volume is exploding, costs are rising, and most teams are stuck duct-taping together short-term fixes. During our webinar, "Optimizing Log Management in Datadog: Cut Costs Without Losing Insights," we discuss how DevOps and engineering leaders are navigating the growing pains of observability, especially in environments where tools like Datadog are mission-critical but challenging to manage. Here’s a recap of the key takeaways.

It's The End Of Observability As We Know It (And I Feel Fine)

In a really broad sense, the history of observability tools over the past couple of decades has been about a pretty simple concept: how do we make terabytes of heterogeneous telemetry data comprehensible to human beings? New Relic did this for the Rails revolution, Datadog did it for the rise of AWS, and Honeycomb led the way for OpenTelemetry.

Lunar-level observability: How Firefly Aerospace used Grafana to monitor its historic moon landing

On March 2, 2025, Firefly Aerospace made history. The company — a space services firm that offers safe, reliable, and economical access to space — completed the first fully successful lunar landing by a commercial provider with its Blue Ghost Mission 1. But behind the headlines and highlight reels was a team of dedicated engineers, years of preparation, and a mission control center outfitted with Grafana dashboards.

The One Where We Show You Copilot Editor

Copilot Editor is like an AI-powered Rosetta Stone for telemetry. It helps Cribl users take raw, messy telemetry data and turn it into standardized, analytics-ready formats. The most important piece? It puts YOU in control. Our human-in-the-loop design means that users have full control over and visibility into what’s happening with their critical data, preventing AI-induced mistakes. Watch this fun demo with the AI product team to see Copilot Editor's true value to the average Cribl user!

Top Features of Splunk Observability Cloud for Engineers

In this video, we’ll walk you through a demonstration of Splunk Observability Cloud’s key capabilities. You’ll see how you can monitor Kubernetes cluster health in Infrastructure Monitoring, and alert on your services’ health using AutoDetect Detectors and Alerts. We’ll then take a look at traces and metrics in APM, and use Related Content to find correlated log entries of error traces. Then we’ll use AlwaysOn Profiling to troubleshoot long-duration traces for our service.

MCP = Observability + Code, a Real-life Example

Our bot is hitting an error. We can see it in the distributed trace. Here, see what happened when we noticed it: Austin fired up Claude Code (hooked up to Honeycomb with its MCP tool) and got it to find the error, fix it, deploy, and check that the fix worked. It got a little overconfident at first, but the ending is happy. IRL this took 22 minutes; the video speeds up the AI agent interactions and cuts out waiting. This video includes Austin Parker, Jessica Kerr, and Ken Rimple.

Beyond Shift Left: Engineering Leaders Increase Speed and Resilience With Observability

We recently had the privilege of hosting several industry experts and technology executives across platform strategy, SRE, and engineering enablement for breakfast at our Observability Day in London. We noted that they’re all facing the same fundamental tension: deliver faster, scale smarter, stay resilient, and somehow get ahead of what’s coming next. But how do you move fast without breaking things? And how do you prove the value of the things you don’t break?

Top 5 Observability Tools DevOps Teams Should Know

Observability and monitoring are the cornerstone of resilient, high-performing applications. Nearly every IT or software engineering leader we come into contact with emphasizes the importance of the ability to understand and diagnose what is going on with their applications at all times. Having clear and concise visibility into your applications is no longer optional.