The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Get organized, actionable insights from complex test environments with Datadog Test Suites

Modern teams often run hundreds of synthetic tests across multiple services, environments, and user journeys. While these tests provide deep visibility, managing them as a flat list can quickly become overwhelming, especially as organizations scale and teams specialize.

Top 11 Ruby APM Tools for 2025: A Performance-Driven Selection

Observability has become a core part of running Ruby applications at scale. Knowing how your app performs — from request latency to background job execution — helps catch slowdowns early and improve reliability. This blog walks through some of the most useful APM tools for Ruby in 2025. Each section highlights what the tool does well, where it fits best, and what kind of visibility it brings to your application's performance.

10 Proven APM Best Practices for Reducing Latency and Improving Response Time

Speed defines user loyalty. Recent market research indicates that organizations adopting advanced application performance monitoring (APM) tools are achieving measurable gains in user engagement, retention, and revenue. “A 2025 performance study found that businesses tracking latency and response time proactively reduced customer churn by up to 30%.” As applications expand across distributed architectures, microservices, and cloud environments, performance gaps become harder to diagnose.

How to Replace Synthetics with the httpcheck Receiver

A 200 OK doesn't always mean everything is okay. You've probably seen it: your health check endpoint returns success, but your users are staring at an error page. Maybe the database connection pool is exhausted, or a critical downstream service is timing out, but your API dutifully returns 200 because technically it responded. This is the reality of monitoring HTTP endpoints in production—status codes alone don't tell the whole story.
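To make the point concrete, here is a minimal Python sketch of a check that looks past the status code. It is not the httpcheck receiver itself, and the JSON body shape (a top-level `status` field plus a `checks` map of dependency states) is a hypothetical convention assumed purely for illustration; a real endpoint may report health very differently.

```python
import json
from typing import Tuple


def evaluate_health(status_code: int, body: str) -> Tuple[bool, str]:
    """Decide whether a health-check response is actually healthy.

    Assumes a hypothetical JSON body such as:
    {"status": "ok", "checks": {"database": "ok", "payments": "timeout"}}
    """
    # A non-200 response is unhealthy regardless of the body.
    if status_code != 200:
        return False, f"unexpected status code {status_code}"

    # A 200 with an unparseable body (e.g. an HTML error page) is suspicious.
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False, "body is not valid JSON"
    if not isinstance(payload, dict):
        return False, "body is not a JSON object"

    # The service may report its own overall state in the body.
    if payload.get("status") != "ok":
        return False, f"reported status {payload.get('status')!r}"

    # Any dependency that is not "ok" makes the whole check fail.
    checks = payload.get("checks", {})
    if not isinstance(checks, dict):
        return False, "checks field is not a JSON object"
    failing = sorted(name for name, state in checks.items() if state != "ok")
    if failing:
        return False, "failing dependencies: " + ", ".join(failing)

    return True, "healthy"
```

Under these assumptions, a response with status 200 and a body reporting an exhausted database pool is flagged as unhealthy even though the status code alone looks fine.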

How to solve authentication failures when you have an Azure setup

It is not just your business. Enterprises worldwide face recurring technical issues related to authentication failures and access problems. These errors pop up most often in service connection setups, pod start failures, and integration issues. Most of the time, they indicate failed deployments, pods failing to pull images, or intermittent authentication or access errors.

Kubernetes monitoring & observability trends 2026 | Future of Kubernetes observability

Kubernetes continues to dominate as the container orchestration standard, but the way we monitor and observe clusters is rapidly evolving. As we head into 2026, Kubernetes monitoring is moving toward actionable insights, cost-aware observability, and security-first approaches. This blog dives deep into what engineers, architects, and platform teams should watch for in the year ahead — with real-world examples for context.

Unpacking the Elements of Site Uptime (by way of Jeopardy!)

Picture this: you’ve achieved your second lifelong dream of being a contestant on Jeopardy! Now it’s time for the fateful “final answer.” The good news? You’ve got a comfortable lead over your fellow contestants, and a correct response means eternal bragging rights. The bad news? Miss this one, and everyone — your family, coworkers, dentist, mechanic — will remind you of it forever. The lights dim. The audience holds its breath.

Declarative Configuration in OTel (Grafana OpenTelemetry Community Call #1)

We’re kicking off a brand-new Grafana OpenTelemetry Community Call! Join us as we dive into getting observability into your apps and infrastructure with Grafana, powered by OpenTelemetry. In this session, we’ll cover Declarative Config, the new way to make OpenTelemetry onboarding simple and powerful. Instead of juggling environment variables or boilerplate in your startup code, declarative config gives you a clean, language-agnostic approach that works across SDKs and unlocks future possibilities like remote configuration. Marylia Gutierrez (OTel JavaScript approver & core contributor) joins us to explore it.