The latest news and information on observability for complex systems and related technologies.

Monitoring vs. observability: The future of IT operations in 2026

For years, monitoring was the gold standard of infrastructure management. Dashboards. Thresholds. Alerts. If everything on the dashboard was green, you didn't need to worry. If something turned red, you responded. It was a model built on predictability, and for a long time, it worked. But modern infrastructure is no longer predictable.

The Second Edition of Observability Engineering Is Here

IT’S HERE it’s here it’s here it’s here!!!! The second edition of Observability Engineering is available for download, and since Honeycomb is the sponsor, you can now download it from our website (the dead tree version will take another month). This is a strange time to be writing a book.

Agent Timeline Is Now Generally Available

A few weeks ago I wrote about a customer’s refund request that stopped halfway through at 11:47 p.m. on a Tuesday night. That post walked through the 40 minutes it took to work out what happened when an agentic application had a problem: a tool retried against a rate-limited payments API, the error responses filled up the context window, and the agent gave up. The whole reason we built Agent Timeline was to turn that 40 minutes into five. To reduce MTTR. To solve the problem and get back to sleep.

Working as a remote engineer at Cribl | Building the AI Platform for Telemetry

Learn what it’s like to work as an engineer at Cribl, a remote-first company building the AI platform for IT and security data. In this recruiting video, Cribl’s engineering and support leaders share how fully distributed teams collaborate, solve hard data problems, and grow their careers while working from around the world. You’ll hear from managers and leaders in site reliability engineering, security incubation, and technical support about…

Observability for a Privacy-first AI Wearable | Grafana Everywhere

Trust is everything when AI gets personal. Golden Grot Award winner and NeoSapien co-founder and CEO Dhananjay Yadav shares how his team uses Grafana Assistant to ensure the privacy-first AI wearable delivers a seamless, reliable experience without compromising its mission. Because when AI moves closer to our everyday lives, teams need to know what’s happening — and users need to trust that it’s working as intended.

From event correlation to autonomous IT: Why observability isn't enough anymore

Most IT war rooms have plenty of data, but not enough time or clarity to find the real answer. Dashboards are crowded, alerts keep piling up, and the real issue gets lost in all the noise. Ever dealt with this situation? You’re not alone, and there’s a simpler way to deal with it. OpManager Nexus closes this gap by moving beyond visibility to help teams actually diagnose and fix problems faster.

Why AI observability is a critical ITOps priority

See how LogicMonitor helps ITOps teams monitor AI workloads, reduce blind spots, and move toward Autonomous IT. AI has shifted from experimental pilots to everyday business operations. Customers are interacting with AI-powered applications. Engineering teams are building with LLMs, GPUs, APIs, and automation at a much faster pace. That adds to the visibility strain on already overburdened ITOps teams.

Datadog Data Observability: Be the first to know when data fails

Bad data doesn't announce itself. Datadog Data Observability gives you unified visibility across your entire data stack—from source systems and pipelines to dashboards and AI applications—so you catch silent failures before they cascade. Detect data quality and pipeline issues before stakeholders do, pinpoint root causes with end-to-end lineage, and reduce pipeline costs with job, cluster, and query recommendations.