The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

What does using AI for post-mortems actually mean?

Apr 23, 2026 By Article In Incident.io

Everyone is using AI to help with post-mortems now. The pitch is obvious: post-mortems are time-consuming, the blank page is brutal, and AI is very good at producing structured, confident-sounding documents quickly. We're not here to push back on that. We've built AI into our own post-mortem experience, pulling your Slack thread, timeline, PRs, and custom fields together and giving your team a meaningful starting point in seconds. We think that's genuinely valuable, and the teams using it agree.

How it feels to run an incident with AI SRE

Apr 23, 2026 By Article In Incident.io

We've been building the broader incident.io platform for several years now, and one thing we've learned is that UX matters more here than almost anywhere else. When an incident fires, there's no room for poorly designed interfaces or fumbling through features you haven't touched in a while. The product has to be ergonomic: easy to pick up, easy to navigate, with the right things at your fingertips at exactly the right moment. We've put a lot of effort into this over the last 5 years.

AWS Outage History: The Biggest AWS Downtime Events from 2021 to 2025

Apr 22, 2026 By StatusGator In StatusGator

The AWS outage history from 2021 to 2025. Explore major AWS downtime events, including those that were not officially acknowledged, outage timelines, and reports, plus how to monitor cloud status.

From Static Response to Dynamically Adaptive Resilience

Apr 20, 2026 By Jon Skog In xMatters

Organizations face an overwhelming mix of digital disruptions: service outages, security incidents, infrastructure failures, all happening faster and with greater complexity than ever before. At the same time, expectations have changed. It’s no longer enough to detect issues quickly or simply notify the right people. The real challenge is what happens next. How do you move from signal to action fast enough, coordinated enough, and with the right decisions at every step?

How to Set Up Custom Webhook Alert Rules in PagerTree (Create on DOWN, Resolve on UP) YAML Tutorial

Apr 20, 2026 By PagerTree In PagerTree

Custom PagerTree webhook YAML rules tutorial: automatically create alerts on DOWN status webhooks and resolve them on UP, using MonitorID for deduplication.

View Video

The Shift from Reactive to Proactive Incident Management: What AI Actually Makes Possible

Apr 17, 2026 By AlertOps In AlertOps

Why enterprise operations teams stop chasing incidents and start preventing them

Most enterprise operations teams are faster than they were three years ago. Alert routing is automated. On-call schedules are managed through platforms rather than spreadsheets. MTTR has come down as tooling has improved. On the metrics that measure reactive performance, progress is visible. What has not meaningfully changed is the rate at which the same incidents recur.

Choosing an AI-Driven Observability Platform for Complex Enterprise IT

Apr 17, 2026 By david.arrowsmith In Interlink

Selecting the right observability platform has become a strategic priority for enterprises operating at scale.

MCP Apps: On Call Compensation Report and Service Dependency Graph

Apr 17, 2026 By PagerDuty Inc. In PagerDuty

This April, PagerDuty's MCP server expands with powerful new capabilities across Analytics & Reporting and Business Services. Teams can now surface aggregate incident data, service metrics, and team metrics, giving operators instant access to the operational insights that matter most. On the Business Services side, the release adds business service dependencies, subscriber management, impacted services analysis, and priority mapping. Rounding out the release are two new MCP Apps (on our experimental branch): a Service Dependency graph and an On-call Compensation report.

View Video

In the Age of AI, Taste Isn't About Aesthetics

Apr 16, 2026 By Rootly In Rootly

AI can generate a UI in seconds. So what do designers actually bring to the table? Marcela, Principal Product Designer at Rootly and former Founding Designer at Ramp, has spent 20 years in design. Her answer: taste isn't about aesthetics or crafting pleasant interactions. It's about asking the uncomfortable questions, and choosing the right problem, not the easiest one.

View Video

PagerDuty Invests in the AI-First Operations and Resilience of Healthcare and Crisis Response Organizations

Apr 16, 2026 By Debbie O'Brien In PagerDuty

At PagerDuty, we believe operational excellence and social impact are inseparable. As AI rapidly transforms how nonprofits operate, our AI and agentic technology empower mission-driven teams to automate complexity and focus their limited resources on what matters most: delivering reliable services that create meaningful impact at scale.
