The latest news and information on incident management, on-call, incident response, and related technologies.

DNS Outages Expose Hidden Risks. Edwin AI Finds Them Faster.

Oct 24, 2025 By Margo Poda In LogicMonitor

The recent AWS outage exposed how fragile the internet remains. Amazon traced the hours-long disruption to a DNS error—a small failure with massive reach. For most organizations, DNS operates quietly in the background. When it fails, every digital service connected to it stops. One of LogicMonitor’s valued customers, IG Group, faced a similar event less than ten hours after enabling Edwin AI.

Demo Roundups! What's New in Schedules: Flexible Shifts + AI Conflict Resolution

Oct 24, 2025 By PagerDuty Inc. In PagerDuty

Manual scheduling and on-call gaps cost your team sleep and sanity. Join us for a demo of PagerDuty's latest schedule experience improvements. From iCal-compatible shift management to AI-powered conflict resolution, see firsthand how to build bulletproof on-call coverage with minimal operational overhead.

What Is Business Continuity?

Oct 23, 2025 By Randhir Kumar In Spike

A single outage can stop operations, affect customers, and damage trust. In a world of pandemics, cyberattacks, weather events, and supply chain delays, your team cannot simply hope that nothing breaks. Business continuity helps your team stay ready, recover sooner, and keep downtime low. In this blog, we’ll explain what business continuity means, how to create a solid business continuity plan, and which approaches help teams stay operational during a disruption.

What Is Incident Response Lifecycle?

Oct 23, 2025 By sachin In Spike

The Incident Response Lifecycle is a step-by-step process that helps engineering teams detect, respond to, and recover from unexpected system disruptions or outages. It includes a series of six practical stages: Detection, Analysis, Impact Mitigation, Incident Resolution, Service Restoration, and Post-Incident Analysis. By following this lifecycle, teams can minimize downtime, reduce business impact, and continuously strengthen system reliability.
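
The six stages named above can be sketched as an ordered enumeration with a small tracker that moves an incident from one stage to the next. This is only an illustrative sketch, not Spike's implementation: the names `Stage`, `Incident`, and `next_stage` are hypothetical, and the strictly linear progression is an assumption (real incidents may loop back to an earlier stage).

```python
from enum import IntEnum
from typing import List, Optional


class Stage(IntEnum):
    """The six lifecycle stages, numbered in the order the excerpt lists them."""
    DETECTION = 1
    ANALYSIS = 2
    IMPACT_MITIGATION = 3
    INCIDENT_RESOLUTION = 4
    SERVICE_RESTORATION = 5
    POST_INCIDENT_ANALYSIS = 6


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage after `current`, or None once the lifecycle is complete."""
    if current is Stage.POST_INCIDENT_ANALYSIS:
        return None
    return Stage(current + 1)


class Incident:
    """Hypothetical tracker: assumes stages are visited strictly in order."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.stage = Stage.DETECTION
        self.history: List[Stage] = [Stage.DETECTION]

    def advance(self) -> Stage:
        """Move to the following stage and record it in the history."""
        following = next_stage(self.stage)
        if following is None:
            raise ValueError("incident has already completed the lifecycle")
        self.stage = following
        self.history.append(following)
        return following


if __name__ == "__main__":
    incident = Incident("checkout latency spike")
    while next_stage(incident.stage) is not None:
        incident.advance()
    print([stage.name for stage in incident.history])
```

Keeping the stages in an `IntEnum` makes the ordering explicit, so a tool could, for example, refuse to start post-incident analysis before service restoration has been recorded.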

How to manage ilert call flows via Terraform

Oct 23, 2025 By ilert In iLert

Call flows let you design voice workflows with nodes like “Audio message,” “Support hours,” “Voicemail,” “Route call,” and more. The ilert Terraform provider now includes an ilert_call_flow resource so you can version and promote these flows across environments. This blog post gives an overview of managing call flows in Terraform, covering the benefits and key scenarios.

The Burn Down: October 2025

Oct 23, 2025 By FireHydrant In FireHydrant

All the latest updates from FireHydrant, including more powerful ways to use AI and Incident Management upgrades.

Meeting Developers Where They Work: PagerDuty + Spotify Portal for Backstage

Oct 22, 2025 By Shawn Haywood In PagerDuty

From the beginning, PagerDuty has been built by developers, for developers. Our mission has always been to help development teams build faster and resolve incidents more efficiently by meeting them where they work. Building on PagerDuty’s existing plugin for Spotify for Backstage, we are thrilled to announce the PagerDuty plugin for Spotify Portal for Backstage to continue bringing enterprise-grade incident management into even more developer workflows.

Best MSP Tools of 2025

Oct 22, 2025 By Zoe Collins In OnPage

Managed service providers (MSPs) are strong multitaskers, handling monitoring, documentation, security, infrastructure maintenance, support, and more for each of their clients. A strong set of MSP tools is therefore essential. Clients now expect swift response and seamless service delivery no matter the time of day, so MSPs must invest in a toolkit that enables them to deliver high-quality service 24/7.

Service disruption on October 20, 2025

Oct 22, 2025 By Article In Incident.io

When the internet goes down, our primary job is to help everyone get back up, as fast as possible. Of the almost half a million incidents we've helped our customers solve, there are some which stand out for both their scale and impact. One of these happened on Monday, October 20, when AWS had a widely covered major outage in their us-east-1 region, from 07:11 to 10:53 UTC. We’re hosted in multiple regions of Google Cloud and so the majority of our product was unaffected by the outage.
