
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Grafana 12.4 TL;DR - The Final 12.x Release

As the final minor release in the Grafana 12 series, 12.4 builds on our shift toward scalable, as-code workflows and a dramatically improved user experience. From bi-directional Git workflows to smarter dashboard layouts and stronger governance controls, this release is all about helping teams move faster with less friction.

Best Website Monitoring Tools for Compliance and Security in 2026

Compliance audits used to be annual fire drills. Teams would scramble for weeks gathering screenshots, pulling logs, and hoping nothing slipped through the cracks. That approach no longer works when regulations like GDPR and HIPAA require continuous documentation and real-time evidence of security controls. Website monitoring tools designed for compliance have evolved to address this reality, automating evidence collection and flagging issues before auditors ever arrive.

Claude Code + OpenTelemetry: Per-Session Cost and Token Tracking

I was looking at our Claude Code spend in the Anthropic console the other day. It showed aggregate cost and aggregate tokens, with no breakdown by developer or by session. I knew my hackathon team had been using it heavily to build out new features for the OpenTelemetry Distro Builder. But heavily how? I had no idea. It turns out Claude Code has been emitting OpenTelemetry signals the whole time: per-session cost, token counts, and every tool call it makes on your codebase.
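As a rough illustration of what a per-session breakdown could look like once those signals are collected, here is a minimal Python sketch. It assumes the metric data points have already been exported and decoded into plain dictionaries; the metric names, the `session_id` key, and the sample values are assumptions made for this example, not a confirmed schema.

```python
from collections import defaultdict

# Hypothetical, already-decoded metric data points. The metric names and the
# "session_id" key are assumptions for illustration, not a confirmed schema.
SAMPLE_POINTS = [
    {"metric": "claude_code.cost.usage", "session_id": "s-1", "value": 0.25},
    {"metric": "claude_code.token.usage", "session_id": "s-1", "value": 1200},
    {"metric": "claude_code.cost.usage", "session_id": "s-2", "value": 0.5},
    {"metric": "claude_code.token.usage", "session_id": "s-1", "value": 800},
    {"metric": "claude_code.cost.usage", "session_id": "s-1", "value": 0.25},
]


def summarize_by_session(points):
    """Sum cost and token values per session ID."""
    totals = defaultdict(lambda: {"cost_usd": 0.0, "tokens": 0})
    for point in points:
        bucket = totals[point["session_id"]]
        if point["metric"] == "claude_code.cost.usage":
            bucket["cost_usd"] += point["value"]
        elif point["metric"] == "claude_code.token.usage":
            bucket["tokens"] += int(point["value"])
    return dict(totals)


if __name__ == "__main__":
    summary = summarize_by_session(SAMPLE_POINTS)
    for session, totals in sorted(summary.items()):
        print(f"{session}: ${totals['cost_usd']:.2f}, {totals['tokens']} tokens")
```

In practice the same grouping would more likely be done as a query in whatever backend receives the telemetry; the sketch only shows the shape of the aggregation.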

Digital Employee Experience Is Now Core to IT - Recognized by Analysts, Reinforced by Customers

Over the past few years, Digital Employee Experience (DEX) has moved from emerging concept to essential capability for modern IT organizations. The conversation has changed. IT is no longer measured only by system uptime or ticket resolution. Today, success is defined by how technology actually performs for employees — and how consistently organizations can deliver productive, friction-free digital work.

AI performance reviews for your app with the Flare CLI

The Flare CLI connects to your Flare performance monitoring data and uses AI to turn it into actionable insights, right from your terminal. In this video, you'll see how a single command pulls your real performance data from Flare, then generates a full review: identifying slow endpoints, spotting error trends, and suggesting concrete fixes.

Fixing a production error with the Flare CLI and AI, from discovery to deploy

This walkthrough uses the Flare CLI and its agent skill to find, fix, and resolve a production error without leaving the terminal. The AI agent looks up the latest error on freek.dev via the Flare CLI, analyzes the stack trace against the local source code, generates a fix, deploys it using bash mode, and marks the error as resolved in Flare.

Observability Self-Hosted 2026.1 - Server Configuration Comparisons

In this video, SolarWinds Evangelist Chrystal Taylor introduces server configuration comparisons, a new feature in Observability Self-Hosted 2026.1 and Server Configuration Monitor 2026.1. The key highlight is the ability to compare server configurations side by side, enabling users to identify differences in configuration files between nodes or against a defined ideal state. This new functionality aims to help users monitor configuration drift.
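To make the idea of comparing a node against a defined ideal state concrete, here is a generic Python sketch using the standard library's `difflib`. It is not how the SolarWinds feature works; the `config_drift` helper and the sample configuration contents are hypothetical and exist only to illustrate drift detection on text files.

```python
import difflib


def config_drift(baseline, actual, baseline_name="baseline", actual_name="node"):
    """Return unified-diff lines between a baseline config and a node's config.

    An empty list means the two configurations are identical.
    """
    return list(
        difflib.unified_diff(
            baseline.splitlines(),
            actual.splitlines(),
            fromfile=baseline_name,
            tofile=actual_name,
            lineterm="",
        )
    )


# Hypothetical configuration contents used only for this example.
BASELINE = "PermitRootLogin no\nPort 22\nPasswordAuthentication no"
NODE_A = "PermitRootLogin yes\nPort 22\nPasswordAuthentication no"

if __name__ == "__main__":
    for line in config_drift(BASELINE, NODE_A, actual_name="node-a"):
        print(line)
```

Lines prefixed with `-` appear only in the baseline and lines prefixed with `+` appear only on the node, so any non-empty result indicates drift.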

Incident Report: Exercises, Cleanups, and Evacuations

Every year, Honeycomb runs disaster recovery scenarios in multiple environments, including production. Although each of our instances runs in a single region, across at least three Availability Zones (AZs), we have multiple plans for partial regional failures, particularly zonal failures. One of these tests was run on December 5th, and after its successful completion came its cleanup steps.

Alerting Is a Socio-Technical System

In the previous posts, we’ve looked at how alert noise emerges from design decisions, why notification lists fail to create accountability, and why alerts only work when they’re designed around a clear outcome. Taken together, these ideas point to a broader conclusion: alerting is not just a technical system but a socio-technical one. Alerting systems encode assumptions about how people behave, how responsibility is distributed, and how decisions are made under pressure.

Catch Every Moment in Kubernetes: Splunk's Observability Advantage

Discover why real-time, unsampled observability is critical for Kubernetes environments with Stephane Estevez from Splunk at KubeCon Europe 2026. Learn how Splunk’s unique approach helps you catch every important moment—even when containers vanish in milliseconds. Watch now for expert insights on cloud-native monitoring, observability, and Kubernetes best practices!