The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

PagerDuty x Backstage Plugin Demo: Eliminate Context Switching for On-Call Engineers

Feb 9, 2026 By PagerDuty Inc. In PagerDuty

Join Rocío, Product Manager of the Forward Deploying Engineering team at PagerDuty, as she demonstrates how the PagerDuty Backstage plugin transforms incident response by bringing critical operational data directly into your developer portal.

PagerDuty

Incident Management

SIGNL4 February Release - SCIM, Caller ID, Team Admin Invites

Feb 9, 2026 By SIGNL4 In SIGNL4

We’re excited to share SIGNL4’s first product update of 2026! Automate user onboarding and offboarding with SCIM, control whether Team Admins can invite new users, and choose the caller ID used for call routing.

SIGNL4

Reference architecture: The blueprint for safe and scalable autonomy in SRE and DevOps

Feb 9, 2026 By Leah Wessels In iLert

Everyone wants autonomous incident response. Most teams are building it wrong. The ultimate goal of autonomy in SRE and DevOps is the capacity of a system to not only detect incidents but to resolve them independently through intelligent self-regulation. However, true autonomy isn't born from automating random, isolated tasks. It requires a stable foundation: a Reference Architecture.

iLert

Silent Failure in Production ML: Why the Most Dangerous Model Bugs don't Throw Errors

Feb 9, 2026 By Ritika Bramhe In OnPage

You’ve done it. Your machine learning model is live in production. It’s serving predictions, powering features, and quietly doing its job. Dashboards are green. There are no errors in the logs. Nothing appears broken. And yet, something is wrong. Predictions are getting less reliable. Users are waiting a little longer for responses. Conversion rates are slipping. Trust is eroding, but no alert fires, no system crashes, and no one knows there’s a problem until the damage has been done.

OnPage

Weekly vs. split-week on-call rotations: A guide to finding the right rhythm

Feb 6, 2026 By Sreekar In Spike

When you move past daily rotations but find anything longer than a week feels too stretched out, you often end up choosing between weekly and split-week rotations. Weekly rotations give you a full seven days before handing off. Split-week rotations break that time into smaller chunks like 2-day, 3-day, or 4-day shifts. Each approach creates a different rhythm for your team. This guide compares both patterns across three key criteria.

Spike

PagerDuty + OOPS Meetup: AI in Incident Management

Feb 6, 2026 By PagerDuty Inc. In PagerDuty

AI is transforming industries at pace, and Incident Response is no exception - raising important questions about how humans and automation should work together when systems are failing and pressure is highest. Panelists: Andrew White (Technology Director, checkout.com); James Pickles (Senior Solutions Consultant, PagerDuty); Sarah Wells (Independent Consultant, former Technology Director at FT); Suraj Singh Dadwal (Team Lead, Incident & Problem Management, IG).

PagerDuty

Incident Management

Event Intelligence Solutions Part Three: Best Practices for Successful Adoption

Feb 6, 2026 By david.arrowsmith In Interlink

As Event Intelligence Solutions (EIS) move from early adoption to operational necessity, many enterprises are realizing that success depends on more than selecting the right technology. For Banking and Financial Services organizations, effective adoption requires a clear strategy, disciplined execution, and strong alignment with business priorities, regulatory demands and, not least, customer expectations.

Interlink

AI Incident Assistant: Automating major incident management

Feb 5, 2026 By BigPanda In BigPanda

This demo of AI Incident Assistant shows the agentic AI capabilities that help streamline collaboration, investigate smarter, and automate resolution for major incident management teams.

BigPanda

Transform IT major incident management with customizable AI Workflows from BigPanda

Feb 5, 2026 By Rachel Pearson In BigPanda

Enterprise Management Associates found that major IT service outages are increasing in cost, frequency, and duration, with unplanned downtime costing large enterprises nearly $25,000 per minute, or $1.5 million per hour. When every minute costs $25,000, you can’t afford to waste engineering time on coordination tasks like creating channels, paging experts, typing summaries, and posting updates. An agentic AI-powered incident assistant can eliminate that waste and reduce bridge call costs.

BigPanda

2-day vs. 4-day on-call rotations: Which one fits your team

Feb 4, 2026 By Sreekar In Spike

Teams that find a weekly rotation too long and a daily rotation too short often end up choosing between 2-day and 4-day rotations. This guide compares the two across three key criteria. For each criterion, we discuss how it works for 2-day and 4-day rotations and recommend which to choose when. To make it easy, we also include a comparison table for a quick overview. Let’s dive in!
