The latest News and Information on Application Performance Monitoring and related technologies.

How to Measure your Most Expensive Milliseconds

In the fast-paced world of mobile development, reliability rarely fails with a loud crash; instead, it degrades quietly through micro-regressions that erode user trust and engagement. While most companies track backend health and API latency, they often fly blind regarding the actual screen-level responsiveness that defines the true user experience. When Expedia Group underwent a major technical evolution, the team realized they lacked a consistent baseline to compare performance across platforms, leaving them unable to validate improvements before rollout.

OpenTelemetry VM Setup Guide: SigNoz Collection Agents Explained

About This Video: If you're working with OpenTelemetry, managing collector configurations across environments like VMs can quickly become difficult. In this video, we focus on VM-based setups and walk through how to configure SigNoz Collection Agents step by step. We start with an introduction to VM collection agents, then move into a practical project walkthrough using the OpenTelemetry demo. From there, we explore the documentation, set up configurations, run the collector, and finally validate everything inside SigNoz.

What is Application Performance Monitoring (APM)?

A modern web application is not a single thing. A single user request may touch a web server, a database, a cache layer, and several third-party APIs before a response comes back. And as AI tools generate more and more application traffic (API calls, background jobs, automated workflows), the volume and unpredictability of that traffic are growing. When something goes wrong, it could be any of it. When something is slow, it could be all of it at once.
Sponsored Post

How to Set Up Raygun's Remote MCP Server in Cursor and Codex

After introducing Raygun's original MCP server and our new remote-first version, the most common question we hear is: "How do I actually set this up and start using it?" This guide covers exactly that, with two short videos walking through setup and a real error being solved in both Cursor and Codex.

Practical AI-Enabled Observability for Agents and LLMs

You’re told to “go build agents” without clear guidance on what that actually means, how to do it well, or how to know if it is working. You are not a data scientist. You are a software engineer. In this talk, Datadog AI product leader Shri Subramanian breaks down what changes when you move from building applications to building AI agents, and why familiar approaches like traditional testing and linear delivery fall short. We will explore how agent development shifts the focus from code alone to data, prompts, and evaluation, and why functional reliability matters just as much as operational reliability.

End to End Reliability for all your Workloads

Delivering great products to your customers requires a mix of evolution and consistency. To really land with users, your product has to be ready to adapt and scale, prioritizing across a mix of customer and business needs. Join experts in reliability, systems engineering, and DevOps as they share real-world examples, true stories of pitfalls, and astounding impact from the experiments they have run. Learn how experienced practitioners handle failure, adapt to scale, and bridge gaps between teams to improve software performance and customer outcomes.

We Know Before it Breaks: Observability-Driven Development

When stakeholders push for faster growth (new markets, new features, a newly modernized stack), your engineering model has to change too. At FitnessPassport, the shift from offshore waterfall delivery to an in-house team meant rebuilding not just services, but confidence: legacy systems with weak logging and little visibility made it hard to know whether changes were working and impossible to spot issues before users did. In this talk, Director of Engineering Rob Mitchell will share how FitnessPassport adopted Datadog and used structured logs, metrics, and traces to tighten feedback loops.

From Manual Requests to Self-Serve: Building an Access-Controlled App that Adapts Automatically

Platform teams often end up as the bottleneck for “small” operational asks: add a new button, wire up a workflow, expose one more cloud capability—each change requiring engineering time, reviews, and releases. In this technical deep dive, engineers from the Department of Government Services (Victoria) share the architecture and open source CDK library behind their “Infrastructure Control Panel”: a modular operational enablement app that lets non-technical users interact safely with cloud resources through strong access controls.