
Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Calculating MTTR: An Evolution Driven by the Rise of DevOps

The shift to cloud computing and the DevOps revolution have fueled some important changes in the way we think about software development and monitoring. These changes have delivered huge benefits to the companies that have fully embraced the approach. In fact, the DevOps Research and Assessment (DORA) 2018 industry survey found a new, small group of “elite” performers that deploy code far more often and achieve a far better mean time to resolution (MTTR) than the next closest group.

What Is MTTR? A Simple Definition That Will Help Your Team

Mean time to resolution (MTTR) is defined as the total amount of time that service was interrupted divided by the number of individual incidents. The result is a duration, and ideally you can express it in minutes. That is, unless you blacked out the eastern seaboard for weeks!
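To make the definition concrete, here is a minimal Python sketch of the calculation. The function name `mean_time_to_resolution` and the sample incident timestamps are hypothetical, chosen only for illustration; real incident data would come from whatever tracking tool your team uses.

```python
from datetime import datetime, timedelta


def mean_time_to_resolution(incidents):
    """Return MTTR as a timedelta for a list of (start, end) datetime pairs."""
    if not incidents:
        raise ValueError("MTTR is undefined when there are no incidents")
    # Sum the duration of every service interruption.
    total_downtime = sum((end - start for start, end in incidents), timedelta())
    # Divide total downtime by the number of individual incidents.
    return total_downtime / len(incidents)


# Hypothetical sample data: three incidents lasting 30, 60 and 30 minutes.
incidents = [
    (datetime(2019, 7, 1, 9, 0), datetime(2019, 7, 1, 9, 30)),
    (datetime(2019, 7, 3, 14, 0), datetime(2019, 7, 3, 15, 0)),
    (datetime(2019, 7, 8, 2, 15), datetime(2019, 7, 8, 2, 45)),
]

mttr = mean_time_to_resolution(incidents)
mttr_minutes = mttr.total_seconds() / 60
print(f"MTTR: {mttr_minutes:.1f} minutes")  # prints "MTTR: 40.0 minutes"
```

Here the total downtime is 120 minutes across three incidents, so the MTTR works out to 40 minutes.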

HBO's "Chernobyl": Is there a lesson here for IT incident management?

I’m a big fan of historical TV dramas and last week I finished watching the stunning and shattering HBO TV miniseries about the 1986 Chernobyl disaster. As a monitoring expert and a product manager, I have visited dozens of IT operations centers, control rooms and NOCs, so I couldn’t help but compare them to the Chernobyl control room scenes in the show.

What's All the Fuss About Business Continuity Planning

Digital transformation has created more gateways for vulnerability and risk. So in addition to natural disasters that can impact a business, organizations are faced with cyberattacks that can truly cripple their business. A solid business continuity plan makes sure that your company is ready for whatever may come its way, be it fire, flood, critical technical failure, or a cyberattack.

How No-Code Integrations Help Incident Management Scale

Do you think no-code is just another buzzword with no real meaning? Well, maybe it is in some contexts. But if you want an example of how no-code solutions can matter in the real world, look no further than incident management. Let us explain by walking through what no-code solutions mean in the context of incident management, how they work and how they can help teams scale and streamline their operations.

2019 Hurricane Season: Solidify a Business Continuity Plan With a Mass Notification Solution

Summer is typically synonymous with beach days, outdoor barbecues and fulfilling weekend getaways. Unfortunately, the summer months aren’t only about enjoyable moments and exciting vacations. It’s also tropical storm season, with higher risks of destruction, community displacement and business operation disruption. With this potential for human and business peril, it’s important for organizations to implement a business continuity plan, equipped with a robust communication strategy.

Best Practices for Managing Multiple On-Call Teams

Alerting has come a long way from the days of paging an on-call administrator in the middle of the night to multiple on-call teams that run and manage incident response around the clock. That is because as organizations grow and scale, responding to incidents gets more complex, and you often need more than one team involved to resolve an incident successfully.

Serverless Event-Driven Workflows with PagerDuty and Amazon EventBridge

This week’s AWS Summit in New York was an exciting one for both AWS and PagerDuty. The AWS team rolled out Amazon EventBridge, a set of APIs for AWS CloudWatch Events that makes it easy for AWS SaaS partners to inject events for their customers to process in AWS. PagerDuty is excited to continue and deepen our long partnership with AWS by supporting EventBridge as a launch partner.

No CMDB? No problem. Not for BigPanda.

I hear it all the time when talking to future BigPanda customers: “I’m not sure BigPanda can really help me correlate all these alerts together because our CMDB is very immature.” Or sometimes they don’t even have a CMDB, and they incorrectly assume this disqualifies them from meaningful noise reduction and alert correlation. I’m happy to tell you the same thing I tell the folks who are looking at BigPanda for the first time: “No CMDB? No problem!”