
Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

ITIL, ITSM and incident management. What are they and how do they fit together?

You’ve probably heard the terms ITIL and ITSM, but the distinction between the two can be a little unclear. Throw incident management into the mix, and the whole thing can feel pretty confusing. This article explains what they are, how the three differ, and, importantly, how they fit together. First, let’s establish what each of the terms actually means.

The modern incident management software stack

We’re fortunate enough to speak with a huge number of companies about their incident management processes. In doing so, we’ve noticed an emerging trend in how modern companies use software to support those processes, along with a common set of challenges they face.

SaC - How to build status pages as code with Terraform

Status pages are a clever way to bundle all your services and see their status at a glance. We at iLert took this one step further: why not build your status page as code using Terraform? We want to show you how we made it possible and how you can set it up for your own infrastructure, giving you a real SaC solution.

What Metrics and KPIs Really Matter in Availability?

In our inaugural State of Availability Report, we discovered that not only do metrics matter, but so does the way we use them. Our research found that teams with fewer KPIs were more likely to meet their Service Level Agreements (SLAs) and provide their customers with higher levels of availability. The problem with having too many KPIs is that they cause information overload and noise.

A Guide to Incident Severity Levels

Maintaining IT infrastructure is a constant challenge for system administrators, site reliability engineers (SREs), supporting developers, and technicians. Many factors can degrade system performance, cause outages, or hurt the customer experience. On top of that, not all incidents are created equal: the impact and severity of an outage affecting 10% of your users are different from those of an outage affecting 90%.
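To make the 10% versus 90% contrast concrete, here is a minimal sketch of mapping the share of affected users to a severity level. The SEV1 to SEV4 names and the cutoff percentages are hypothetical assumptions chosen for illustration, not values taken from the guide; real severity schemes usually also weigh factors such as data loss, security exposure, or revenue impact.

```python
from enum import Enum


class Severity(Enum):
    """Hypothetical severity scale: SEV1 is the most severe."""
    SEV1 = 1
    SEV2 = 2
    SEV3 = 3
    SEV4 = 4


def classify_severity(affected_users: int, total_users: int) -> Severity:
    """Map the fraction of affected users to a severity level.

    The thresholds below are illustrative assumptions only.
    """
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    if not 0 <= affected_users <= total_users:
        raise ValueError("affected_users must be between 0 and total_users")

    share = affected_users / total_users
    if share >= 0.50:   # at least half of users impacted
        return Severity.SEV1
    if share >= 0.25:   # a large minority impacted
        return Severity.SEV2
    if share >= 0.05:   # a noticeable subset impacted
        return Severity.SEV3
    return Severity.SEV4  # isolated or negligible impact


# An outage hitting 10% of users lands lower on the scale than one hitting 90%.
print(classify_severity(10, 100))  # Severity.SEV3
print(classify_severity(90, 100))  # Severity.SEV1
```

Ordering the checks from most to least severe means each incident gets the highest level whose threshold it meets, which keeps the mapping easy to audit and adjust.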

PagerDuty Named a G2 Leader for Enterprise Incident Management Software

With G2’s announcement of its Fall ’22 Review awards, PagerDuty has been named a G2 Leader for Incident Management Software for the sixth quarter in a row. We owe a special thank you to our customers, who have consistently given PagerDuty high satisfaction scores that take into account their likelihood to recommend PagerDuty, our ability to meet their requirements, and the overall ease they’ve found in doing business with us.

Monthly Moo | October 2022

Summer has passed and it’s time for fall: cue changing leaves, cozy blankets, and all the pumpkin-themed things your heart could ever desire. As we move into the new season, we are excited to announce our fall product releases across Moogsoft Cloud, which enable engineers to detect incidents earlier, resolve them faster, and work as a team across the entire incident lifecycle. Moogsoft’s fall product updates enable you to: … and so much more! Read on for deeper details.