Incident Management

The latest news and information on Incident Management, On-Call, Incident Response, and related technologies.

It's not ready for production until it has an Operational Readiness Checklist

Maintaining the reliability of complex services just got easier with Operational Readiness Checklists. Service owners and engineering leaders can now evaluate and maintain the production readiness of the services their users rely on every day: spot risks in your service dependencies before they cause incidents, and respond quickly if they do. Before you put a new service into production, readiness checklists help you dot your i's and cross your t's.

12 ways to ace customer communications during a system outage

System outages are the worst nightmare for IT support teams, but they also provide an opportunity to stand out. During a major service outage, customers often feel the impact more acutely because they have far less information about what is happening. Some of the biggest outages that affected users all over the world last year include those of Slack, PlayStation, Airbnb, FedEx, and Amazon.

The Math & Fun Behind Nesting Event Rules with Event Orchestration

PagerDuty Senior Product Manager Frank Emery joins us on Twitch to talk about Event Orchestration, a new feature in the PagerDuty Platform. We found in our data that 20% of incidents are resolved by human responders in under 5 minutes. Why are team members being interrupted for these alerts? Automation is a better answer. Event Orchestration uses powerful, flexible rules to turn alerts into automated activities so your team can keep working and avoid unnecessary interruptions!

SauceLabs & PagerDuty Notifications Channel for API Tests & Monitors

APIs are the backbone of the apps and web services that run the world, yet most companies don’t have a true understanding of their functional uptime and reliability. Sauce Labs collects those insights by leveraging functional and integration tests as monitors. This provides a single source of truth for uptime and detailed reporting for when problems occur with functionality or performance. With PagerDuty, Sauce Labs' users gain granular control over notifications to ensure compliance with company policies while centralizing test and incident response processes among developers, testers, and product owners.

Integration Options with SIGNL4

SIGNL4 integrates with various backend systems like IT monitoring, service management, IoT systems, sensors, etc. to automatically alert users and teams about certain incidents. A list of selected tools along with integration descriptions is available in our integrations section. How can you integrate SIGNL4 with your own tools? In the following we list some options offering different levels of sophistication.

Squadcast Earns a Spot on G2's Top 50 Best Software Awards for IT Management Products 2022

We are thrilled to announce that G2 has recognized Squadcast as a High Performer in the Incident Management space and ranked us among the Best Software for IT Management Products. Over the last three years, G2 has acknowledged our impact in the IT Incident Management space, which led to us being recognized as a Momentum Leader in the Incident Management and IT Alerting categories. Thanks to what we have learned from customer feedback, we have been able to shape our product vision and grow further.

Three Common Incident Response Process Examples

What makes an engineering team? Communication, collaboration, process, order, and common goals. Otherwise, they would just be a bunch of engineers. The same is true of their tools. Connectivity and process turn a bunch of tools into a DevOps toolchain. If you need a DevOps toolchain, you can use it to easily build an incident response process.