The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Escalation policies for critical incidents

When a critical incident triggers, there’s no time to figure out who to call. That decision needs to be made well before the incident arrives. A dedicated escalation policy for critical incidents gives your team a clear path to follow the moment things go wrong, rather than leaving it to whoever happens to be around. This guide covers the key decisions involved in building that policy.

Understanding L1, L2, L3 escalation policy

L1, L2, L3 is one of the most common ways to structure an escalation policy. The idea is simple: an incident triggers and lands with a first responder. If it needs more attention, it moves up the chain to someone with more expertise. This guide explains how each tier works, when this structure makes sense, and what to keep in mind when setting one up.

From Passive Records to Active Care: Activating the EHR in Real time in Israel's hospitals

Israel’s healthcare system is widely recognized as one of the most digitally advanced in the world. Electronic health records are deeply embedded across hospitals, and platforms like Chameleon sit at the center of clinical operations. Patient data is captured, structured and accessible at nearly every stage of care delivery. But digital maturity alone does not guarantee operational efficiency.

The Definitive AWS Outage Report 2025: Reliability Analytics and Cascade Impact

Amazon Web Services remains one of the most popular cloud providers, with 200+ services in 39 regions across the world. Like all providers, it has its share of outages. In 2025, IncidentHub detected 38 AWS outages, of which the one on October 20th had the most widespread impact, affecting hundreds of SaaS providers simultaneously. Payments were disrupted, students lost access to classrooms, developer tooling degraded, and some IT teams experienced alerting gaps.

SIGNL4 Among Germany's Best Software Companies

SIGNL4 has been recognized by G2 as one of the Best German Software Companies, and we couldn’t be more excited. Matthes Derdack, Founder of SIGNL4, emphasizes: “This recognition matters because it’s not based on marketing claims – it’s based on what our customers experience in real operations. Teams running mission-critical infrastructure rely on SIGNL4 when things go wrong, not when everything is fine.

Escalation policies for low-priority incidents

Teams put a lot of thought into how critical incidents are handled. Low-priority incidents usually don’t get the same attention. Without a proper escalation policy, they just land in a shared channel, waiting for someone to acknowledge them. Setting up a clear policy for them is worth doing. Not because they need the same urgency as a critical incident, but because having a defined path for every incident makes the whole system more reliable.

Keeping it boring: the incident.io technology stack

At incident.io we run a deliberately simple technology stack. Keeping things boring has allowed us to scale from a few hundred customers to several thousand, while having only two platform engineers. In this post I'll walk through the stack, explain some of the choices we've made, and touch on the challenges we're facing as we grow.

What is an escalation policy? (And why every team needs one)

An escalation policy is the route an incident takes after it triggers. It lays out who gets alerted first and sets a wait time. If nobody responds, it moves the incident forward to the next person. The word “escalation” is worth pausing on. When an incident triggers and the first person doesn’t respond, the incident doesn’t sit and wait. It moves to the next person and keeps moving until someone picks it up. That forward movement is the escalation.
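The forward movement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Step` type, the `current_responder` function, the responder names, and the wait times are all hypothetical and do not reflect any particular vendor's product or API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    """One hop in a hypothetical escalation policy."""
    responder: str      # who gets alerted at this step
    wait_minutes: int   # how long to wait for a response before moving on


def current_responder(steps: List[Step], minutes_unacknowledged: int) -> Optional[str]:
    """Return who holds the incident after it has gone unacknowledged
    for the given number of minutes, or None if every step has timed out."""
    deadline = 0
    for step in steps:
        deadline += step.wait_minutes
        if minutes_unacknowledged < deadline:
            return step.responder
    return None  # policy exhausted: nobody picked the incident up


# Illustrative policy; names and wait times are made up.
policy = [
    Step("primary on-call", 5),
    Step("secondary on-call", 10),
    Step("engineering manager", 15),
]

for minutes in (0, 5, 15, 30):
    print(minutes, current_responder(policy, minutes))
```

Each step only takes over once the previous step's wait time has fully elapsed, which is the "keeps moving until someone picks it up" behavior. What should happen when the policy is exhausted (repeat from the top, page everyone, and so on) is a design decision this sketch leaves open by returning `None`.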

A compass for designing your escalation policy

The first time you sit down to design an escalation policy, it can feel a little like a crossroads. You know incidents need to reach the right people. You just aren’t sure which structure makes the most sense. Should you route by severity? By who’s available? Or by team? There’s no single right answer. Think of this guide as a compass. A compass doesn’t tell you exactly where to go. It helps you orient yourself based on where you already are.

PagerDuty's Slack App Just Got a Whole Lot Better (And We're Just Getting Started)

If you’ve been eyeing chat-native incident tools and wondering whether PagerDuty can compete in Slack, this one’s for you. Are you still treating your incident management platform like a glorified pager? It’s time for an update. Over the past months, we’ve been evolving our Slack app from a notification tool into a full incident command center, and we’re coming for the chat-native tools (ahem, incident.io).