The latest news and information on incident management, on-call, incident response, and related technologies.

Shopify Cyber Monday outage - December 1, 2025

On December 1, 2025, Cyber Monday, the biggest online shopping day of the year, Shopify suffered a widespread outage that left many merchants unable to access their stores or process orders. At a time when every minute of uptime translates directly into revenue, the disruption caused immediate concern across the ecommerce community. StatusGator detected the issue within minutes, sending an Early Warning Signal 10 minutes before Shopify published its official acknowledgement.

Introducing a More Flexible On-Call Schedule

Today, we are introducing new on-call features: Add Gaps to on-call, Scheduled Layers, Handoff Days, and more. Flexibility in on-call schedules is the single focus of this release. These features give you much finer control over when people are on call, how handoffs work, and what your schedule looks like around holidays and time off.

AI agents just got smarter thanks to PagerDuty + AWS

We are on the ground with AWS and announcing innovations that give customers more powerful AI agents for incident management. These new and improved integrations bring PagerDuty context into the AWS ecosystem for faster resolution and more connected data across the business. And, with our new competency, we take this a step further by codifying these best practices into our joint customers’ day-to-day operations. Announced today, here are some of the highlights.

OnPage Introduces Multi-Language Mobile App Localization on iOS & Android

As organizations continue to adopt OnPage across regions and operational environments, providing an experience that feels natural and intuitive for every user has become increasingly important. Clear communication is essential in time-sensitive workflows, and being able to use the app in one’s preferred language supports clarity, confidence, and consistency. To support our growing global user base, OnPage is introducing multi-language localization across its mobile applications.

How ilert's holidays and support hours keep teams sane

The end of the year brings pressure. (Oh, we know!) Customer demand spikes, response expectations stay high, and engineering teams are juggling production issues, releases, and time off. For many teams, this is when on-call becomes chaotic: schedules break, notifications hit at the wrong time, and coverage gaps appear exactly when you can’t afford them. ilert's Holidays and Support hours features were built to fix that.

AI Infrastructure Is Creating a New Wave of Incidents, And Why Enterprises Need a Modern On-Call Strategy

Over the last few years, AI has quietly shifted from a fascinating experiment to a core operational system. Enterprises aren’t just building prototypes anymore — they’re deploying LLMs into production environments where uptime directly affects customer interactions, revenue flows, and business continuity. AI has essentially become a new layer of critical infrastructure. Because of that shift, the definition of “reliability” is changing.

Stop choosing between fast incident response and secure access

Every production system will eventually break. It's not pessimism, it's just reality. That's why engineers go on-call, and why companies invest heavily in incident response tooling. But here's the problem: the moment an engineer goes on call, they typically need elevated access to production systems, databases, and sensitive customer data. And that elevated access? It's often permanent, overly broad, and a security nightmare waiting to happen.

Incident Postmortem: How to Learn From Failures and Build Reliable Systems

When the incident settles and systems are back up, one question always remains: What actually happened, and how do we stop it from happening again? That’s where incident postmortems come in. Not just as documentation, but as a structured way to learn, improve reliability, and replace guessing with clarity. A good postmortem isn’t about blame, heroics, or perfect narratives. It’s about truth, learning, and building systems that get stronger with every failure.