Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Application Performance Monitoring and related technologies.

Ameet Talwalkar on Building the AI Research Lab

"We're doing cutting-edge AI, focused on real translational impact: getting our research over the wall and into production." Ameet Talwalkar, Datadog's Chief Scientist, shares what it took to build the AI Research Lab from the ground up — and what makes DAIR different from traditional research teams. At Datadog, research ships. Recent work from the lab includes Toto 2.0, open-weights time series forecasting models ranked on leading benchmarks, and ARFBench, a new benchmark for evaluating AI on real incident data.

Anthropic Monitoring & Observability with OpenTelemetry and SigNoz

Learn how to implement end-to-end monitoring and observability for Anthropic (Claude) API-based applications using OpenTelemetry and SigNoz. In this video, we walk through instrumenting your Anthropic API calls, collecting traces, metrics, and logs, and visualizing everything in SigNoz to gain real-time visibility into performance, failures, and bottlenecks. You'll see how to move from basic logging to production-grade observability, so you can debug faster, optimize latency, and confidently run Claude-powered AI systems at scale.

The Importance of Time Synchronization in Windows Authentication

Kerberos is a secure network authentication protocol that allows users and systems to prove their identity over a network without sending passwords in plain text. It is widely used in enterprise environments (for example, in Windows domains) to enable single sign-on (SSO). At its core, Kerberos uses a trusted authority called the Key Distribution Center (KDC) to issue encrypted “tickets” that verify identity.

Best APM for Small Development Teams in 2026

Last updated: May 2026 If your team is 2 to 20 developers and you do not have dedicated DevOps, SRE, or platform engineering, most APM tools were not built for you. They were built for the team that has you: a team with specialists who can tune dashboards, configure alerting pipelines, manage data retention policies, and explain the monitoring system to everyone else. You do not have that team. You have developers who also handle deploys, on-call, and debugging production issues between writing features.

What Is APM? A Guide to Application Performance Monitoring

A well-instrumented service tells your on-call engineer which deploy broke checkout, which span ate the latency budget, and which line to revert before the support queue fills up. Getting there depends on how cleanly your application performance monitoring layer turns telemetry into answers. The sections ahead walk through how APM works, the metrics and components worth tracking, the cloud-native challenges at scale, and how to evaluate APM tooling against your real workload.
Sponsored Post

The SDLC: phases, popular models, benefits & more

The Software Development Life Cycle (SDLC) describes the process we follow to deliver software to customers. It captures each step of creating software, from ideation to delivery and eventually to maintenance. In this post, we've broken down everything you need to understand the SDLC.

Best Elixir APM Tools in 2026: A Developer's Guide

Last updated: May 2026 Elixir applications have performance characteristics that are genuinely different from Ruby or Python. The BEAM virtual machine handles concurrency through lightweight processes, supervision trees restart failed processes automatically, and Phoenix channels can hold tens of thousands of persistent connections on a single node. These are strengths, but they also mean that the performance problems you encounter are different from what most APM tools were built to detect.

Observability and Security for the AI Era

Datadog has always been driven by a broader vision of helping teams understand and operate complex systems. In this session, you’ll hear from Michael Whetten, Product SVP, and Abrar Hussain, Senior Director, Product Management, as they share the latest updates across the Datadog product suite and discuss how that vision continues to shape the platform’s evolution and support the next generation of AI-driven applications.

How to Measure your Most Expensive Milliseconds

In the fast-paced world of mobile development, reliability rarely fails with a loud crash; instead, it degrades quietly through micro-regressions that erode user trust and engagement. While most companies track backend health and API latency, they often fly blind regarding the actual screen-level responsiveness that defines the true user experience. When Expedia Group underwent a major technical evolution, the team realized they lacked a consistent baseline to compare performance across platforms, leaving them unable to validate improvements before rollout.