Incident Management

The latest news and information on incident management, on-call, incident response and related technologies.

Resilience in Action E8: Vanessa Yiu on Crafting Enterprise Architecture

Resilience in Action is a podcast about all things resilience, from SRE to software engineering, to how it affects our personal lives, and more. Resilience in Action is hosted by Kurt Andersen. Kurt is a practitioner and an active thought leader in the SRE community. He speaks at major DevOps & SRE conferences and publishes his work through O'Reilly in quintessential SRE books such as Seeking SRE, What is SRE?, and 97 Things Every SRE Should Know.

PagerDuty Summit21 Keynote: DigitalOps Now: Go Digital First with Modern Digital Ops Management

To succeed in a world of digital-first customer experiences, operations must also be digital-first. Join PagerDuty CEO Jennifer Tejada and CPO Sean Scott as they share the latest PagerDuty innovations and our vision for the future of work. Don't miss exclusive fireside chats with Fox Corporation executives Paul Cheesbrough, CTO & President of Digital, and Jeff Dow, EVP for Media and Broadcast, as well as Kim Hammonds, investor and board member at Zoom, Box, Tenable, UiPath and The Goldman Sachs Group, Inc.

Leverage Observability With OpenTelemetry to Understand Root Cause Quickly

An observability solution should help any incident responder understand what changed and why. A lot has been written on the difference between monitoring and observability, but an easy way to see how both are integral to incident response is to look at how customers use PagerDuty, together with both monitoring and observability tools, to get to the right answer.

SREview Issue #14 June 2021

Hoping you're headed towards a fun summer season and some time without masks. Let's avoid a new kind of tan-line! This newsletter shares useful industry content and an exciting Blameless product announcement. Find our fave tweets and events in the SRE and resilience engineering community. We're hiring! Check out the job openings here.

xMatters Makes Workflow Automation as Simple as Drag and Drop

xMatters’ low-code and no-code integrations make creating automated workflows that align your teams and processes as simple as drag and drop. With just a few clicks, your teams can be building workflows that integrate, automate and accelerate your incident response and resolution capabilities. Best of all, xMatters is free to use, and you can get started today at xmatters.com/free!

Red Canary says 43% Lack Readiness to Notify Customers of a Security Breach

The phrase “stakeholder management” assumes that stakeholders are truly informed by alerts. However, managers can only send communications out; they cannot force people to act on them. To ensure your stakeholders are engaged during an incident, it is vital to set up a defined communication process. Yet a recent Red Canary report[1] found that 43% of surveyed participants lack the readiness to notify the public and/or their customers in the event of a security breach.

Everything You Need to Know About Emergency Risk Management

Emergency risk management (ERM) is the process of identifying potential threats and minimizing the impact of disasters on business operations and people. The process requires leaders within an organization to determine how they will keep stakeholders informed and safe during critical events. Leaders must also craft disaster recovery plans to quickly remedy the effects of a catastrophic event on communities, government agencies and organizations.