The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

New enhancements to PagerDuty's SRE Agent: triage faster without waking a human

The promise of AI and its actual capabilities often diverge: developers report much faster code production, but little change in how incidents are handled. When the rate of change is faster than ever but the rate of recovery from incidents isn’t moving, developers wind up stuck in firefighting mode. And when these systems fail, it’s costly: according to PagerDuty’s State of AI-First Operations, over a third of surveyed companies report losing $500K per hour of downtime.

Introducing Shift-Based Schedules: Smarter, Faster, and Easier for Any Team

This blog post is part of PagerDuty’s ongoing series on how we’re helping customers navigate their journey towards autonomous operations. Read on to learn how PagerDuty’s Shift-Based Schedules feature (GA planned for May) builds towards this vision. PagerDuty has long been the gold standard for on-call management, helping thousands of teams build the foundations of digital reliability.

Activate Your Continuous Learning Flywheel With Post-Incident Reviews in PagerDuty UI

Earlier this year at our H1 2026 launch, we announced PagerDuty’s vision for autonomous operations: a future where AI agents learn from every incident, prevent failures before they happen, and progressively automate so teams can focus on innovation instead of firefighting.

Why Dedicated Incident Channels are the Modern Standard for Slack-Based Incident Response

Where do your teams go during a critical incident? For distributed teams, that war room is a channel in Slack or Microsoft Teams. The question is: are you creating a dedicated space for each incident, or are responders scrambling across DMs, email threads, and general channels trying to piece together what happened? The answer matters. Using dedicated incident channels has become the industry standard for high-performing incident response teams.

How to reduce alert noise without missing what matters

Reducing alert noise involves drawing a line between incidents that need an immediate response and ones that do not. Get this distinction wrong and your team is either interrupted unnecessarily or misses something critical. In this guide, we’ll help you make that distinction clear. We’ll cover what counts as noise and how to reduce it without missing what matters.

Inside the .de DNS Outage: Real-World Data from UptimeRobot

On the evening of May 5, 2026, large parts of the German web briefly went dark. For a few hours, anyone trying to load a .de address through a major DNS resolver got errors instead of websites. Bahn.de, Amazon.de, and Spiegel.de were among the affected sites. Major brands like Telekom, DHL, and Sparkassen felt it too, along with hosting providers Hetzner, Strato, and Ionos.

What is alert fatigue? (And how does it happen?)

Alert fatigue doesn’t announce itself. It builds quietly over weeks and months until one day a critical incident triggers and nobody responds with the urgency it deserves. By that point, the damage is already done. This guide walks through what alert fatigue actually is, how it happens, and what you can do about it.

PagerDuty's Product Drop (May 2026)

PagerDuty’s monthly drops are here! May’s drop helps teams work faster and smarter with four major updates, including:

SRE Agent Enhancements: Triage just got turbocharged. New connectivity + new capabilities = faster resolution.

Shift-Based Schedules (GA planned for May): Schedules are more flexible than ever, with quick-start options, custom shifts, and multi-responder support for shadow training or increased coverage.