
The latest News and Information on Application Performance Monitoring and related technologies.

Top DevOps Challenges in 2025 and How APM Solves Them

In 2025, DevOps continues to evolve rapidly, helping teams deliver software faster and more securely. But as systems grow more complex with microservices, cloud platforms, and AI-driven tools, new challenges arise. Teams now need to balance speed with security, rein in tool sprawl, control rising cloud costs, and still maintain high-quality software. This is where Application Performance Monitoring (APM) becomes essential.

Use Grok parsing to extract fields from logs | Datadog Tips & Tricks

When your logs don’t follow a standard format, it can be difficult to extract valuable information such as key-value pairs and nested JSON objects. Grok parsing lets you define flexible patterns that match unstructured log data, so you can extract specific fields to query, filter, and visualize. In this video, you’ll learn how to:

By refining your Grok parsers, you can make your logs more useful for analytics, dashboards, and alerts, and get even more value from them.
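Grok patterns are essentially named, reusable regular expressions. The sketch below is a toy Grok-style parser in Python, not Datadog's implementation: it expands tokens of the form `%{matcher:field:filter}` (loosely modeled on the three-part syntax Datadog's Grok parser uses) into named regex groups, then applies an optional filter that turns a captured string into key-value pairs or a nested JSON object. The matcher names, filter names, and the sample log line are assumptions chosen for illustration.

```python
import json
import re

# Illustrative subset of matchers (names are assumptions, not a real library's list).
MATCHERS = {
    "word": r"\w+",
    "integer": r"[+-]?\d+",
    "ip": r"\d{1,3}(?:\.\d{1,3}){3}",
    "data": r".*?",        # lazy: stop at the first place the rest of the rule fits
    "object": r"\{.*\}",   # a brace-delimited blob such as a JSON object
}

# Post-processing filters applied to a captured string (also illustrative).
FILTERS = {
    "integer": int,
    "json": json.loads,
    "keyvalue": lambda s: dict(re.findall(r"(\w+)=(\S+)", s)),
}

# Matches %{matcher}, %{matcher:field} or %{matcher:field:filter}.
TOKEN = re.compile(r"%\{(\w+)(?::(\w+))?(?::(\w+))?\}")


def compile_rule(rule):
    """Expand tokens into named regex groups; text between tokens is treated as literal."""
    parts, filters, pos = [], {}, 0
    for m in TOKEN.finditer(rule):
        matcher, field, flt = m.groups()
        parts.append(re.escape(rule[pos:m.start()]))
        pattern = MATCHERS[matcher]
        if field:
            parts.append(f"(?P<{field}>{pattern})")
            if flt:
                filters[field] = flt
        else:
            parts.append(f"(?:{pattern})")  # match but don't extract
        pos = m.end()
    parts.append(re.escape(rule[pos:]))
    return re.compile("".join(parts) + r"\Z"), filters


def parse(line, rule):
    """Return the extracted fields as a dict, or None if the line doesn't match."""
    regex, filters = compile_rule(rule)
    m = regex.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    for field, flt in filters.items():
        fields[field] = FILTERS[flt](fields[field])
    return fields


if __name__ == "__main__":
    rule = "%{ip:client} %{word:action} %{data:attrs:keyvalue} %{object:payload:json}"
    line = '10.0.0.5 login status=ok retries=2 {"device": {"os": "ios"}}'
    print(parse(line, rule))
```

The lazy `data` matcher is the key design choice here: with a greedy `.*`, the key-value capture would backtrack only as far as the last space and swallow part of the JSON, which then fails to parse. Real Grok engines also treat the text between tokens as regex rather than as a literal, which this sketch simplifies away.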

Detecting an AWS Outage and DR Lessons

A few weeks ago, on 20th October 2025, AWS suffered a widespread outage in its US-EAST-1 region that affected a large number of customers worldwide. More than 1,000 apps and websites were impacted, including major banks and popular gaming, streaming, and social platforms such as WhatsApp, Snapchat, Fortnite, and Pokémon Go.

What is APM? Understanding application performance monitoring

The rapid advancement of technology has revolutionised the way businesses operate and engage with their customers. A delay of even a few seconds can lead to significant drop-offs in engagement and conversions, and even sub-second delays matter: widely cited findings tie every 100 milliseconds of added latency to a 1% loss in sales (Amazon) and a half-second delay to a 20% drop in search traffic (Google).

How OpenTelemetry Is Redefining Application Performance Monitoring

The telemetry data is there, but it’s scattered across domains, formats, and vendors. Teams are often left piecing together an incomplete story of what went wrong, long after the damage has been done. Now, a new open standard is changing that. OpenTelemetry (OTel) is fast becoming the connective tissue of modern observability: an open-source, vendor-neutral framework for generating, collecting, and exporting telemetry data (metrics, logs, and traces) so it can flow to any compatible backend.
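One concrete piece of that connective tissue is context propagation. By default, OpenTelemetry SDKs pass trace context between services in the W3C Trace Context `traceparent` header (`version-traceid-parentid-flags`, all lowercase hex), which is how spans emitted by different services end up in the same trace. The stdlib-only sketch below hand-rolls that header to show the mechanics; `SpanContext`, `new_root`, `child_of`, `inject`, and `extract` are hypothetical simplifications, and a real service would use the propagators in the OpenTelemetry API instead.

```python
import re
import secrets
from dataclasses import dataclass

# Version-00 traceparent: 2 hex version, 32 hex trace ID, 16 hex span ID, 2 hex flags.
TRACEPARENT = re.compile(r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


@dataclass(frozen=True)
class SpanContext:
    trace_id: str  # shared by every span in one request
    span_id: str   # unique per operation
    sampled: bool


def new_root() -> SpanContext:
    """Start a new trace, as the first service handling a request would."""
    return SpanContext(secrets.token_hex(16), secrets.token_hex(8), True)


def child_of(parent: SpanContext) -> SpanContext:
    """A downstream operation keeps the trace ID but gets a fresh span ID."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


def inject(ctx: SpanContext) -> dict:
    """Serialize the context into outgoing request headers."""
    flags = "01" if ctx.sampled else "00"
    return {"traceparent": f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"}


def extract(headers: dict):
    """Parse an incoming traceparent header; return None if absent or invalid."""
    m = TRACEPARENT.fullmatch(headers.get("traceparent", ""))
    if m is None:
        return None
    version, trace_id, span_id, flags = m.groups()
    # Version ff and all-zero IDs are invalid per the spec.
    if version == "ff" or set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return SpanContext(trace_id, span_id, bool(int(flags, 16) & 0x01))


if __name__ == "__main__":
    root = new_root()
    headers = inject(root)        # attached to the outgoing HTTP call
    upstream = extract(headers)   # parsed by the receiving service
    print(headers["traceparent"], child_of(upstream).trace_id == root.trace_id)
```

This sketch only accepts the exact version-00 layout; the specification also allows future versions to append fields, which a production parser would need to tolerate.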

Bits AI SRE, Flex Frozen, and GPU Monitoring | DASH 2025

Get a first look at Datadog’s biggest product reveals from DASH 2025. Meet Bits AI SRE, your 24/7 autonomous AI Site Reliability Engineer, Flex Frozen for up to 7 years of managed log retention, and GPU Monitoring for full visibility into your AI workloads. Experience the future of observability in action.

Top 10 APM Tools [2026 Guide]

In 2026, application performance isn’t just a technical metric—it’s a business-critical factor. As organizations move deeper into cloud-native architectures, distributed systems, and AI-driven workflows, ensuring speed, reliability, and uptime has become non-negotiable. According to Gartner, by 2026 more than 70% of new APM implementations will be cloud-native, and businesses that leverage advanced observability platforms are expected to reduce downtime by up to 60%.

Triaging an Incident with a Critical Data Pipeline at Rivian

Rivian makes electric vehicles to advance its mission to keep the world adventurous forever. As software-defined vehicles, Rivian’s R1T and R1S are connected to the cloud from day one, and telemetry data is at the heart of enabling mobile notifications, remote diagnostics, fleet management, and more. With so many critical pipelines in the cloud, observability is a top priority for the data platform.

Sponsored Post

Transform your workflow with Raygun's remote MCP

We're happy to announce Raygun's new remote MCP server, which gives AI tools direct access to live error data so they can investigate issues, surface root causes, and take action with real context instead of guesses. It's been nearly a year since Anthropic released the Model Context Protocol (MCP), and a lot has changed in the AI space since then: almost all major AI platforms now support MCP, letting them tap into a large and growing catalogue of MCP servers. When MCP first launched, we shipped our own Raygun MCP server within 48 hours of the spec's release, an early step toward giving LLMs visibility into Raygun data.
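Under the hood, MCP messages are JSON-RPC 2.0 objects: a client discovers a server's tools with a `tools/list` request and invokes one with `tools/call`, passing the tool's `name` and `arguments`; the result carries a `content` list of typed items such as text. The sketch below builds and reads those messages in plain Python. It covers message shapes only (a real client first performs the `initialize` handshake and sends messages over a transport such as HTTP), and the tool name `list_recent_errors` and its arguments are hypothetical, not Raygun's actual tools.

```python
import json
from itertools import count

# Monotonic request IDs; JSON-RPC matches each response to its request by "id".
_next_id = count(1)


def make_request(method: str, params: dict) -> dict:
    """Build a JSON-RPC 2.0 request object, the framing MCP messages use."""
    return {"jsonrpc": "2.0", "id": next(_next_id), "method": method, "params": params}


def call_tool(name: str, arguments: dict) -> dict:
    """Build a request asking an MCP server to run one of its tools."""
    return make_request("tools/call", {"name": name, "arguments": arguments})


def text_from_result(response: dict) -> str:
    """Join the text items of a tools/call response, raising on errors."""
    if "error" in response:  # protocol-level failure, e.g. unknown method
        raise RuntimeError(response["error"].get("message", "JSON-RPC error"))
    result = response["result"]
    texts = [item["text"] for item in result.get("content", []) if item.get("type") == "text"]
    if result.get("isError"):  # the tool ran but reported a failure
        raise RuntimeError("\n".join(texts) or "tool error")
    return "\n".join(texts)


if __name__ == "__main__":
    # Hypothetical tool name and arguments, for illustration only.
    req = call_tool("list_recent_errors", {"hours": 24})
    print(json.dumps(req))
    # A made-up response shaped like an MCP tools/call result.
    fake = {"jsonrpc": "2.0", "id": req["id"],
            "result": {"content": [{"type": "text", "text": "3 new error groups"}]}}
    print(text_from_result(fake))
```

Separating protocol errors (the JSON-RPC `error` member) from tool errors (`isError` in the result) lets an AI client decide whether to retry the call or reason about the failure the tool reported.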