The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

SRE Agent Enhancements for Autonomous Triage

May 5, 2026 By PagerDuty Inc. In PagerDuty

Triage just got turbocharged with our latest PagerDuty SRE Agent enhancements!

View Video

PagerDuty

Shift-Based Schedules

May 5, 2026 By PagerDuty Inc. In PagerDuty

With Shift-Based Schedules (GA planned for May), scheduling in PagerDuty will be more flexible than ever! The new scheduling experience introduces quick-start options, custom shifts, and multi-responder support for shadow training or increased coverage.

View Video

PagerDuty

Incident Management

Post-Incident Reviews in the PagerDuty UI

May 5, 2026 By PagerDuty Inc. In PagerDuty

Turn incidents into learnings and build resilient operations with real-time collaboration and actionable insights built directly into your PagerDuty workflow. Post-Incident Reviews in the PagerDuty UI are now in Early Access. Coming soon: AI-generated drafts and intelligent follow-up suggestions.

View Video

PagerDuty

Incident Management

BigPanda + ServiceNow: Autonomous IT Operations in Action

May 4, 2026 By BigPanda In BigPanda

Most AI tools recommend the next step. BigPanda's AI specialists take it — autonomously, inside ServiceNow and Now Assist. Watch how BigPanda and ServiceNow work together to detect, triage, investigate, and resolve incidents end-to-end.

View Video

BigPanda

Faster incident investigation with BigPanda and ServiceNow Now Assist

May 4, 2026 By Travis Carlson In BigPanda

When an incident occurs, an L2/L3 engineer or SRE can spend 20–30 minutes investigating across alert consoles, combing through change records, and pinging teams on Slack or Microsoft Teams. Multiply that time across thousands of incidents per year, at an IT outage cost of $14,056 per minute, and the total is staggering. Enterprises can't afford to waste time searching across disparate tools.
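To make the arithmetic concrete, here is a rough Python sketch. It assumes, following the post's framing, that each minute spent investigating is priced at the quoted $14,056 outage cost, and it uses a hypothetical volume of 1,000 incidents per year because the post only says "thousands". The function name `investigation_cost` is illustrative and is not part of any BigPanda or ServiceNow API.

```python
# Hypothetical sketch: estimate the yearly cost of manual incident investigation,
# treating every investigation minute as a minute of outage at the quoted rate.

OUTAGE_COST_PER_MINUTE = 14_056  # USD per minute, the figure quoted in the post


def investigation_cost(minutes_per_incident: float, incidents_per_year: int,
                       cost_per_minute: float = OUTAGE_COST_PER_MINUTE) -> float:
    """Return the estimated annual cost in USD (minutes x incidents x rate)."""
    return minutes_per_incident * incidents_per_year * cost_per_minute


if __name__ == "__main__":
    incidents = 1_000  # hypothetical volume; the post only says "thousands"
    for minutes in (20, 30):  # the low and high ends of the post's 20-30 minute range
        cost = investigation_cost(minutes, incidents)
        print(f"{minutes} min/incident: ${cost:,.0f} per year")
```

At 1,000 incidents a year, the 20 and 30 minute ends of the range work out to roughly $281 million and $422 million respectively; scale `incidents` to match your own volume.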

Read Post

BigPanda

A guide to setting up alerts for a new service

May 3, 2026 By Sreekar In Spike

When you launch a new service in production, you’re working with a lot of unknowns. You don’t yet know how it behaves under real traffic or which incidents are worth waking someone up for. That makes alerting for a new service a little different from what you’re used to with an established one. The goal in the early days isn’t to get everything perfectly configured. It’s to learn enough about the service to get your alerting right.

Read Post

Spike

April 2026 Early Warning Signals

May 1, 2026 By Colin Bartlett In StatusGator

April saw widespread disruptions across SaaS platforms, developer tools, and cloud services, with login failures, pipeline issues, and general service outages among the most common problems. StatusGator’s Early Warning Signals consistently identified these incidents ahead of official provider updates. In several cases, the lead time was significant. Bitbucket pipeline failures were detected 1 hour 17 minutes before acknowledgment, while Claude performance issues surfaced 59 minutes early.

Read Post

StatusGator

Prevent outages with PagerDuty incident retrospectives

May 1, 2026 By PagerDuty In PagerDuty

Recurring incidents are a symptom of a broken process. Your teams are working hard to get services back online, but constantly battling the same problems is frustrating and not a sustainable approach. What’s reflected here is not a failure in engineering abilities, but a deficiency in the learning that should follow an incident. When incident analysis focuses on finding a single person or team to blame, it creates a culture of fear.

Read Post

PagerDuty

GitHub Outages 2025 - 2026: Reliability Analysis and Outage History

Apr 30, 2026 By Hrishikesh Barua In IncidentHub

HashiCorp co-founder Mitchell Hashimoto decided to move his Ghostty project off GitHub in April 2026, citing GitHub's reliability issues. He made the decision after 18 years of using GitHub, saying that GitHub "is no longer a place for serious work". GitHub's reliability has declined significantly over the past six months, and Hashimoto is not alone in expressing this sentiment.

Read Post

IncidentHub

Four types of incident alerts every team should know

Apr 30, 2026 By Sreekar In Spike

Not every incident alert needs the same kind of response. One incident may need to wake someone up right away. Another may simply need to be picked up when the team starts work in the morning. Without a clear way to tell them apart, every incident feels equally urgent. That usually adds noise and makes incident response decisions harder than they need to be. This is where two questions help. In this guide, we'll discuss what those questions mean and the four combinations that follow.

Read Post