The latest news and information on incident management, on-call, incident response, and related technologies.

SecOps for the Cloud: PagerDuty and AWS Security Hub

This week at re:Inforce in Boston, the AWS team showed off Security Hub, a service that gives SecOps teams a comprehensive view of their high-priority security alerts and compliance status across their AWS accounts. We’re excited to join AWS at re:Inforce this week as a Security Hub partner. While we're there, we’ll show how PagerDuty and AWS Security Hub work together to provide real-time SecOps to any team using AWS.

Listen to a Recorded Incident Response Call

The PagerDuty Incident Response Process is a detailed document that provides a framework for structuring your incident response process. Because incident response has to account for so many different situations, the guide we publish is packed with explicit details. Even so, it sometimes helps to understand how these seemingly abstract concepts play out in real-world scenarios. You can now listen to a recording of an incident call based on a real PagerDuty incident.

How Does Google Handle Critical Incidents?

There are some very good sources on how to manage a critical incident. Google has contributed one of its own: a chapter on incident management in its book, “Site Reliability Engineering”. In this chapter, the folks at Google present their approach to a well-designed critical incident management process.

June 2019 Release Overview: Work In Real Time, All The Time, Wherever You Are

This month, we are excited to announce a new set of product capabilities and enhancements designed to ensure that teams can work in real time, all the time, wherever they are. Teams may be on the go with their mobile devices or at their desks on a typical workday. In either case, we will keep innovating without sacrificing ease of use or adoption.

OnPage and ConnectWise: Incident Alert Management Workflows

Let’s set the scene: You’re an on-call engineer working on a dedicated support team. Your priorities are twofold: (1) resolving incidents quickly and (2) satisfying clients and stakeholders. With these demands in mind, you adopt OnPage’s integration with ConnectWise. The integration streamlines the path from ticket to alert, helping your team deliver excellent client service.

LaborDuty: Incident Response For Baby's Arrival

Real-time operations is a term PagerDuty uses to describe the process in which people acknowledge, communicate about, resolve, and learn from impactful events, all in real time. What could be more impactful and real-time than the miracle of childbirth? Whether it’s your first child or your fifth, things don’t always go as planned. Still, the experience usually leaves you with a good story, told with the benefit of hindsight.