The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

What does using AI for post-mortems actually mean?

Everyone is using AI to help with post-mortems now. The pitch is obvious: post-mortems are time-consuming, the blank page is brutal, and AI is very good at producing structured, confident-sounding documents quickly. We're not here to push back on that. We've built AI into our own post-mortem experience, pulling your Slack thread, timeline, PRs, and custom fields together and giving your team a meaningful starting point in seconds. We think that's genuinely valuable, and the teams using it agree.

How it feels to run an incident with AI SRE

We've been building the broader incident.io platform for several years now, and one thing we've learned is that UX matters more here than almost anywhere else. When an incident fires, there's no room for poorly designed interfaces or fumbling through features you haven't touched in a while. The product has to be ergonomic: easy to pick up, easy to navigate, with the right things at your fingertips at exactly the right moment. We've put a lot of effort into this over the last 5 years.

From Static Response to Dynamically Adaptive Resilience

Organizations face an overwhelming mix of digital disruptions: service outages, security incidents, infrastructure failures, all happening faster and with greater complexity than ever before. At the same time, expectations have changed. It’s no longer enough to detect issues quickly or simply notify the right people. The real challenge is what happens next. How do you move from signal to action fast enough, coordinated enough, and with the right decisions at every step?

The Shift from Reactive to Proactive Incident Management: What AI Actually Makes Possible

Why enterprise operations teams stop chasing incidents and start preventing them

Most enterprise operations teams are faster than they were three years ago. Alert routing is automated. On-call schedules are managed through platforms rather than spreadsheets. MTTR has come down as tooling has improved. On the metrics that measure reactive performance, progress is visible. What has not meaningfully changed is the rate at which the same incidents recur.

MCP Apps: On Call Compensation Report and Service Dependency Graph

This April, PagerDuty's MCP server expands with powerful new capabilities across Analytics & Reporting and Business Services. Teams can now surface aggregate incident data, service metrics, and team metrics, giving operators instant access to the operational insights that matter most. On the Business Services side, the release adds business service dependencies, subscriber management, impacted services analysis, and priority mapping. Rounding out the release are two new MCP Apps (on our experimental branch): a Service Dependency graph and an On-call Compensation report.

Why post-mortem action items die

You can run the best debrief of your life. Honest timeline, blameless tone, real insights. People leave the room nodding. And then nothing happens. This is the last mile problem of post-mortems - and it's an easy trap to fall into. When you've just been through a stressful incident, getting the service back up is the priority. Once it's over, the post-mortem itself can feel like the finish line. You've documented what happened, been honest about it, identified what went wrong. It feels like the work is done.

In the Age of AI, Taste Isn't About Aesthetics

AI can generate a UI in seconds. So what do designers actually bring to the table? Marcela, Principal Product Designer at Rootly and former Founding Designer at Ramp, has spent 20 years in design. Her answer: taste isn't about aesthetics or crafting pleasant interactions. It's about asking the uncomfortable questions, and choosing the right problem, not the easiest one.