Incident Management

The latest news and information on incident management, on-call, incident response and related technologies.

The Incident Commander Role: Duties & Best Practices for ICs

Imagine that a critical incident (a major outage, cyberattack or disaster) strikes your company out of nowhere. You'll want to minimize the damage and get back to normal operations as quickly as possible. But how will you do that if you have no idea how to manage such incidents? This is where incident commanders come in. They're trained professionals who lead the response to critical incidents.

Fast track video series: Slash IT noise by up to 98% with Alert Correlation with BigPanda

The average organization can have ten or more monitoring or observability tools in its IT stack. These tools keep generating an overwhelming amount of noise. IT Ops, NOC and DevOps teams drown in this noise and can’t focus on real incidents until it’s too late. Your organization’s alerts don’t have to turn into an untameable tsunami with no end in sight; there’s a better way forward.

What Does IT Maturity Even Mean?

Seriously… What are people trying to say by “Your approach to IT Operations needs to mature”? Fair question. Billions of dollars are spent every year on software solutions to help IT organizations operate more efficiently. How could it be that with all that investment, we’re still not netting enough efficiency gains? The truth is, our technology landscape has evolved, our operational models have evolved, we have evolved.

Callable Flows - xMatters Support

In xMatters Flow Designer, you can use callable flows to initiate a major incident process in any workflow. Instead of including the same sequence of steps in each workflow, such as posting to a status page or opening a help desk ticket, you can build the sequence once as a separate workflow and then include that as a step in any of your workflows.

How ITOps teams are coping with the evolution of cloud management

Breaking down cloud management platforms and hybrid/multicloud management. In our recent Whiskey and Wisdom session, we discussed how ITOps teams are coping with the evolution of cloud management. Whiskey and Wisdom is a monthly executive-only forum where IT operations leaders can network independently and discuss high-level AI operations and ITOps strategies with their industry peers.

Signals Report - xMatters Support

The Signals report helps you evaluate signals sent to your xMatters instance from HTTP, App, Email, Incident Initiation, and Incident Automation triggers (as well as some legacy inbound integrations). The report displays the timestamp, status code, and authentication details for each signal, as well as the payload and any related incidents, where applicable. Processed signals include outputs from the trigger and a link to the associated workflow so developers can further evaluate each request using Flow Designer's Activity panel.

Calculating Business Value of Automation in PagerDuty Process Automation

Budgets in IT departments are tight these days, so proving a return on investment is essential for justifying or expanding a project. The good news is that automation saves money by reducing the amount of human effort required. It is similar to investing in a robot vacuum cleaner. Despite the upfront cost, you save time (and money) by not having humans do the vacuuming. Reporting the value delivered by an automation program can be challenging since the value depends heavily on what is being automated.
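As a rough illustration of the reasoning above, the sketch below estimates the net yearly value of one automated task as labor hours saved minus what the automation costs. This is not PagerDuty's calculation method; the function name, its parameters, and the sample figures are all hypothetical.

```python
def automation_net_value(
    runs_per_year: int,
    minutes_saved_per_run: float,
    hourly_labor_cost: float,
    annual_automation_cost: float,
) -> float:
    """Net yearly value in currency units: labor saved minus automation cost.

    All inputs are assumptions supplied by the caller, not measured data.
    """
    hours_saved = runs_per_year * minutes_saved_per_run / 60
    return hours_saved * hourly_labor_cost - annual_automation_cost


if __name__ == "__main__":
    # Hypothetical task: run 1,200 times a year, saving 15 minutes each time,
    # at a labor cost of 80 per hour, with automation costing 10,000 per year.
    print(automation_net_value(1200, 15, 80, 10_000))  # 14000.0
```

Because the result depends entirely on the inputs, the same function can give very different answers for different tasks, which is why the value depends so heavily on what is being automated.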

How Synthetic Transaction Monitoring Provides Complete Site Visibility & Why Basic Monitoring is Not Enough

We’ve all been in the situation before: it’s Friday at 5 PM and the only on-call engineer available to handle incidents is about to hit the slopes. Unfortunately, at that very moment, a customer reports to support that they are unable to access the company’s ecommerce website to complete a purchase. Internal monitoring systems seem quiet and services appear available on internal health dashboards.

8 Incident Management Tools You Need To Consider In 2023

You're probably aware that downtime is expensive—but do you know how expensive it is? The short answer is—very. According to the Ponemon Institute, outages cost organizations an average of $9,000 per minute (or $540,000 per hour). That's why companies of all sizes are investing in incident management tools to reduce their downtime and improve the customer experience.
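To make the per-minute figure concrete, here is a minimal sketch that scales the quoted $9,000-per-minute average to an outage of any length. The function name is hypothetical, and real costs vary widely by organization.

```python
COST_PER_MINUTE_USD = 9_000  # average quoted above (Ponemon Institute)


def downtime_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Estimated outage cost in USD for a duration given in minutes."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


if __name__ == "__main__":
    print(downtime_cost(60))  # one hour at the quoted average: 540000
```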