Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

Situation Room: On-Call Team Faces Worst Case of Sunday Scaries

Picture this: it’s Sunday night. You’re relaxing in bed, in that sweet spot where you’re geared up for Monday, but the fun of the weekend hasn’t yet faded. As you idly scroll through content on your phone, you see a message preview pop up. It’s to your work email. That’s bad. It’s from the hosting company you contract with. That’s really bad. They’re saying they accidentally deleted the production database. That’s “jump out of bed” bad.

How to Structure an IT Help Desk

Managed service providers (MSPs) need an IT help desk to answer their clients' technical questions. In the modern MSP environment, the IT help desk is the primary point of contact between customers and knowledgeable, responsive support personnel. Successful help desks are customer-oriented and encourage clients to report IT incidents when they occur.

Has the firefighting stopped? The effect of COVID-19 on on-call engineers

With digital becoming the primary channel for work, education, shopping, and entertainment in the last 18 months, it’s no surprise that workloads for technical teams and on-call engineers have increased. Data from PagerDuty’s inaugural platform insights report, The State of Digital Operations, highlights this reality. As of July 2021, the average number of events managed daily by PagerDuty is 37 million, with 61,000 of those being critical incidents.

New feature: Templates for Incident Management

At Spike.sh, we are obsessed with making incident management more accessible to dev teams everywhere. With this goal in mind, we are always looking for ways to reduce friction when setting up the Spike.sh platform. When we saw customers asking for our advice on creating effective on-call schedules and escalations, we knew we had to do more than just good documentation - we needed a way to share best practices with our customers in the product itself.

MTBF Is an Integral Part of Business Operations - Here's Why

In today’s fast-paced digital world, your customers expect your services to be available 24 hours a day, seven days a week. If your services are unreliable, these customers will likely take their business elsewhere — and spread the word. To retain their business, you must understand and optimize your service and system health to ensure your services are reliable. Gauging your service and system health requires much more than knowing whether they’re on or off.

What's new: Updates to Event Intelligence, mobile, and more!

As we near the end of the summer season, we’re excited to announce a new set of updates and enhancements to the PagerDuty platform for our users and customers. Make sure to view the latest PagerDuty Pulse, or learn more from our community team and developer advocates, who have launched new programs to help you learn about our latest products and best practices.

Call Handling - Relieve the burden of your service desk and on-call staff

These days, I keep receiving inquiries from customers about call handling. With the current shift toward working from home, it is becoming more and more important to make on-call staff easier to reach. Often the already overloaded service desk is used for this purpose. Of course, this leads to a) a deterioration in the quality of the service desk and b) delays between the time a problem is reported and the time work on resolving it begins.