Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Distributed Tracing and related technologies.

AI Agents Observability with OpenTelemetry and the VictoriaMetrics Stack

Nowadays, AI agents are becoming more and more popular and often deployed as part of production systems. However, this rapid adoption brings unique observability challenges that require flexible solutions. On the one hand, AI agents are fundamentally just like any other software services that produce the same classic observability signals we’re familiar with: metrics, logs, and traces.

How Prometheus Exporters Work With OpenTelemetry

Running distributed systems means you need clear visibility into how your services behave. Prometheus has been the standard for metrics for a long time, and OpenTelemetry is now giving teams a more consistent way to collect telemetry across their stack. In many setups, you'll have both: existing Prometheus instrumentation that's already in place, and new components instrumented with OpenTelemetry.

How OpenTelemetry Is Redefining Application Performance Monitoring

The data is there, but it’s scattered across domains, formats, and vendors. Teams are often left piecing together an incomplete story of what went wrong, long after the damage has been done. Now, a new open standard is changing that. OpenTelemetry (OTel) is fast becoming the connective tissue of modern observability—an open-source framework designed to make telemetry data (metrics, logs, and traces) universally accessible.

OTel Updates: Declarative Config - A Steadier Way to Configure OpenTelemetry SDKs

Application configs change over time, often in small ways that are easy to miss. They may start simple — a few environment variables, one exporter, nothing unexpected. As your instrumentation grows, you add rules for filtering health check spans, adjust sampling based on attributes, or introduce environment-specific resource settings. Each change makes sense on its own. But months later, the picture can look different across dev, staging, and production.

4 Common OpenTelemetry Challenges and How Site24x7 Helps Overcome Them

OpenTelemetry (OTel) is transforming observability by standardizing and unifying how telemetry data such as metrics, logs, and traces are collected from distributed systems. However, while it unlocks new opportunities for monitoring and troubleshooting, adopting and operating OpenTelemetry comes with real-world challenges. Here’s what you need to know about these limitations, and how Site24x7 provides a holistic, simplified observability solution for your organization.

Sidecar or Agent for OpenTelemetry: How to Decide

Getting telemetry out of a distributed system isn’t the hard part. Getting it out cleanly, without noise, drop-offs, or odd performance side-effects — that’s where things get interesting. Before you worry about processors or storage costs, you need a clear plan for where the OTel Collector should run. Most teams narrow this down to two options: a sidecar that sits next to each service, or a node-level agent that handles data for everything running on the node. Both patterns are solid.

OTel Updates: Consistent Probability Sampling Fixes Fragmented Traces

You're sampling 1% of traces in production. A payment request fails at 3 AM. Logs show an error in order-service, but the full picture isn't there because different services made different sampling decisions. order-service kept the trace; payment-service didn't. So you end up checking logs and timestamps across a few services to piece things together. This happens because the usual probability sampling approach makes a separate choice at each service boundary.

OpenTelemetry Spans Explained: Deconstructing Distributed Tracing

In a microservices architecture, a single user request can pass through multiple services before completing. When performance drops or an error occurs, tracing that journey is the only way to locate the source. Distributed tracing provides that visibility. At its core are OpenTelemetry Spans — units of work that capture what each service does during a request.

How Atlassian built a smarter observability system with Grafana and OpenTelemetry

Discover how Atlassian built OpsDeck, an observability platform powered by Grafana, to automate incident detection, improve response time, and reduce troubleshooting from one hour to under a minute. Hear how the Observability Insights team scaled OpenTelemetry, broke silos, and built smarter workflows for both engineers and support.