Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

What's New in Tempo 3.0

Tempo 3.0 introduces a major architectural shift that decouples the read and write paths, with Kafka handling durability on the write side and a new live store serving recent traces on the read side. Blocks are now written at a replication factor of one instead of three, significantly reducing storage overhead. This release also brings TraceQL metrics to general availability, adds comparison operators for filtering metric results at query time, and introduces a new Tempo CLI redact command for removing sensitive trace data on demand without waiting for retention to expire.

Inside the Grafana AI Team Weekly: AI Observability for the OTel demo and LLMSpec (May 12, 2026)

This is an excerpt from a real AI team weekly meeting where we talk about the stuff we build and occasionally also demo them! In this one, Principal Software Engineer Sven Großmann demos how he integrated AI Observability into the OTel demo, complete with the guards feature he introduced last week, and Principal Software Engineer Yas Ekinci gives a rare glimpse of LLMSpec, the internal counterpart of the o11ybench benchmark that we use to evaluate Assistant.

Shifting Streams and AI Surges: What Our Data Reveals About the OTT Landscape

OTT data from early 2026 shows streaming hierarchies holding steady while AI platforms reshuffled rapidly. Claude has substantially increased traffic since January, overtaking Gemini, and is on pace to challenge ChatGPT by fall. Doug Madory digs into the data in this new analysis.

Tempo 3.0 release: a new architecture for scale and lower TCO, TraceQL metrics GA, and more

Tempo started with a simple goal: make distributed tracing easier to run at scale. As tracing adoption has grown, however, so have the challenges, including higher data volumes, more complex architectures, and increasing demand for real-time insights directly from traces. Over the last year, we’ve been evolving Tempo’s architecture to meet that moment. And today, we’re sharing the results of those efforts with the release of Tempo 3.0.

A deep dive into AWS data perimeter misconfigurations

In AWS environments, a data perimeter is a set of preventative controls that help ensure that your trusted cloud identities (principals or AWS services acting on your behalf) are accessing trusted resources from authorized networks. You can apply these controls at various levels of your infrastructure, such as per resource or across all resources in your AWS account.

How we cut Spark compute costs by 44% with agentic AI and Datadog Jobs Monitoring

Spark jobs only get more expensive and harder to debug as they scale. It’s a problem we’ve run into ourselves. Our Referential Data Platform team builds and maintains the knowledge graph that maps relationships between customers’ observability entities. ServiceQueryEdge is at the center of that graph, mapping service entities to their associated metric and log queries.

Migrate to Azure Managed Redis with Datadog and Eden

Azure Managed Redis is a Microsoft first-party, fully managed in-memory data store, replacing Azure Cache for Redis tiers. It includes Redis Enterprise features such as RediSearch for vector search and full-text search, in addition to RedisJSON, RedisTimeSeries, and Active Geo-Replication. As Azure Cache for Redis reaches end of life, more teams are planning migrations to Azure Managed Redis in search of better performance, lower cost, and modern capabilities for AI and real-time workloads.

What a Forrester TEI study on Edwin AI actually tells IT leaders-and how to use it

This blog helps IT leaders use the Forrester Consulting TEI study as a practical framework for evaluating Edwin AI in their own environments. A Total Economic Impact study is useful for one, critical reason: it takes a broad technology claim and turns it into a financial and operational framework. That matters in AI for IT operations because the market is crowded with claims. Every platform says it reduces noise. Every platform says it improves efficiency. Every platform says it helps teams move faster.

15 DevOps Metrics Every Engineering Team Should Track in 2026

Software moves from code to production more quickly today, but it is still difficult to tell whether delivery is actually improving or just becoming more active. Most teams rely on dashboards filled with metrics like deployments, uptime, failures, and tickets. The numbers are available, but the meaning behind them is often unclear. DevOps metrics become useful only when grouped into clear categories: DORA metrics cover only delivery speed and stability, which is just part of the picture.