
The latest News and Information on Distributed Tracing and related technologies.

Sponsored Post

SAP Application Performance Monitoring (APM): Beyond Generic Metrics

Your enterprise APM tool shows SAP is using 90% CPU. The dashboard turns red. An alert fires. Now what? You open Dynatrace. You see the Java Virtual Machine metrics for your NetWeaver stack. You see HTTP response times for your Fiori apps. You see a spike in database calls. None of this tells you why VA01 takes 45 seconds to create a sales order. None of this tells you which custom ABAP report is consuming memory. None of this explains the short dump that crashed your pricing routine. This is the gap between generic APM and true SAP application performance monitoring. Your enterprise tools see the symptoms.

Claude Code + OpenTelemetry: Per-Session Cost and Token Tracking

I was looking at our Claude Code spend in the Anthropic console the other day. Aggregate cost, aggregate tokens — no breakdown by developer, no breakdown by session. I knew my Hackathon team had been using it heavily on building out new features for the OpenTelemetry Distro Builder. But heavily how? I had no idea. Turns out Claude Code has been emitting OpenTelemetry signals the whole time. Per-session cost, token counts, every tool call it makes on your codebase.
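The teaser above refers to Claude Code's built-in OpenTelemetry support, which is opt-in. A minimal sketch of enabling it, assuming the environment variable names from Anthropic's monitoring documentation (verify against your installed version; the endpoint below is a placeholder for your own collector):

```shell
# Opt in to Claude Code's OpenTelemetry export (variable names assumed
# from Anthropic's monitoring guide; check your version's docs).
export CLAUDE_CODE_ENABLE_TELEMETRY=1

# Export metrics (per-session cost and token counts) and logs/events
# (tool calls, API requests) over OTLP.
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp

# Placeholder endpoint: point this at your own OpenTelemetry Collector.
export OTEL_EXPORTER_OTLP_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
```

With these set, cost and token metrics arrive tagged per session, so a backend can break spend down by developer and session instead of showing only the aggregate the console provides.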

Release v2.9: OTel Logs, Database Functions, SNMP Functions and More

What’s New in Netdata v2.9: In this video, we walk through the biggest updates in Netdata v2.9, including:

- Top Tab Database Functions to analyze slow queries and performance bottlenecks without logging into your database
- SNMP Network Interfaces Function for real-time visibility into network interfaces
- Microsoft SQL Server Collector with richer MSSQL metrics
- OpenTelemetry Logs Ingestion to correlate logs and metrics in one place

What is OpenTelemetry and Why Do Organizations Use it?

Mining for information about your environments is like looking for gold. Prospecting can mean sifting through silty waters or blasting through a mine. In some cases the nuggets are so small they are almost invisible, some things look like gold but aren’t, and occasionally the miner finds a larger nugget and strikes it rich. Trying to understand how a distributed system works means sifting through vast amounts of telemetry, looking for patterns.

Turn Raw Data into Reliability by Changing Performance Perspectives

In a global microservices architecture, technical performance initially presents as a chaotic stream of disconnected telemetry. For a Technical Program Manager (TPM), success depends on the ability to move past these disconnected individual data points to identify stable patterns. When services enter critical states, inspecting individual logs or traces is inefficient. Protecting system reliability requires an engine that can automate pattern recognition at scale.

OpenTelemetry Production Monitoring: What Breaks, and How to Prevent It

OpenTelemetry almost always works beautifully in staging, demos, and videos. You enable auto-instrumentation, spans appear, metrics flow, the collector starts, and dashboards light up. Everything looks clean and predictable. However, production has a way of humbling even the most carefully prepared setups. When real traffic hits, and it always spikes sooner or later, you start seeing dropped spans.

OpenTelemetry support for .NET 10: A behind-the-scenes look

At Grafana Labs, we are fully committed to the open source OpenTelemetry project and are actively engaged with the OTel community. Many Grafanistas spend a large proportion of their time contributing directly to OpenTelemetry upstream projects, helping make observability more powerful, reliable, and accessible for everyone as part of our big tent philosophy.

Troubleshooting Microservices with OpenTelemetry Distributed Tracing

Distributed tracing doesn’t just show you what happened. It shows you why things broke. While logs tell you a service returned a 500 error and metrics show latency spiked, only traces reveal the full chain of causation: the upstream timeout that triggered a retry storm, the N+1 query pattern that saturated your connection pool, or the missing cache hit that turned a 50ms call into a 3-second database roundtrip.

The evolution of OpenTelemetry: A deep dive with co-founder Ted Young

Sometimes the biggest challenges in software aren’t about code — they’re about consensus. What do we call things? What do we standardize? And how do you evolve a system that thousands of companies depend on without breaking everything along the way?