Latest News

7 ways teams are using incident.io's Decision Flows

One of my favourite features in incident.io is Decision Flows. With it, you can create a series of questions which eventually lead to a decision based on what you’ve answered. You can pull up this flow during an incident and it’ll guide you through the questions. It’s like having an experienced on-caller calmly guide you through what to do when a crisis hits. This is complementary to incident.io’s Workflows feature.
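To make the idea of a series of questions that leads to a decision concrete, here is a minimal sketch of such a flow as a small decision tree in Python. It is not incident.io's API or data model. The `Question`, `Decision`, and `run_flow` names, the example questions, and the resulting actions are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Union


@dataclass
class Decision:
    """A leaf of the flow: the action the responder should take."""
    action: str


@dataclass
class Question:
    """An inner node: a question plus one branch per allowed answer."""
    text: str
    branches: Dict[str, "Node"]


Node = Union[Question, Decision]


def run_flow(root: Node, answers: Iterable[str]) -> str:
    """Walk the flow from the root, consuming one answer per question."""
    node = root
    remaining = iter(answers)
    while isinstance(node, Question):
        try:
            answer = next(remaining)
        except StopIteration:
            raise ValueError(f"no answer supplied for: {node.text}") from None
        if answer not in node.branches:
            raise ValueError(f"unexpected answer {answer!r} for: {node.text}")
        node = node.branches[answer]
    return node.action


# Hypothetical example flow; the questions and actions are assumptions.
FLOW = Question(
    "Are customers affected?",
    {
        "yes": Question(
            "Is customer data at risk?",
            {
                "yes": Decision("Escalate to the security team"),
                "no": Decision("Page the on-call engineer"),
            },
        ),
        "no": Decision("Track as a low-priority ticket"),
    },
)
```

Calling `run_flow(FLOW, ["yes", "no"])` walks both questions and ends at the "Page the on-call engineer" decision. Answering "no" to the first question ends the flow immediately.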

4 Challenges Facing CXOs in A World of Digital Everything

As a busy executive, taking time to attend an event and listen to sessions is a luxury. And yet, I know that many of my best breakthrough ideas on how to lead my teams have come from taking those moments to tune into new ideas. The challenge is figuring out where the hidden nuggets of wisdom are buried in a mountain of content.

ITIL, ITSM and incident management. What are they and how do they fit together?

You’ve probably heard the terms ITIL and ITSM, but the distinction between the two can be a little unclear. Throw incident management into the mix, and the whole thing can feel pretty confusing. This article aims to explain what they are, the differences between the three, and, most importantly, how they fit together. First, let’s establish what each of the terms actually means.

The modern incident management software stack

We’re fortunate enough to speak to a huge number of companies about their incident management processes. In doing so, we’ve noticed an emerging trend in how modern companies use software to support those processes, and a common set of challenges they face too.

SaC - How to build status pages as code with Terraform

Status pages are a clever way to bundle all your services and see their status at a glance. We at iLert took this one step further: why not build your status page as code using Terraform? We want to show you how we made this possible, and how you can set it up for your own infrastructure - a real SaC solution.

What Metrics and KPIs Really Matter in Availability?

In our inaugural State of Availability Report, we discovered that not only do metrics matter but the way we use them also does. Our research found that teams with fewer KPIs were more likely to meet their Service Level Agreements (SLAs) and provide their customers with higher levels of availability. The problem with having too many KPIs is that they cause information overload and noise.

A Guide to Incident Severity Levels

Maintaining IT infrastructure is a constant challenge for system administrators, site reliability engineers (SREs), supporting developers, and technicians. Several factors can degrade system performance, cause outages, or harm customer experience. On top of that, not all incidents are created equal. The impact and severity of a system outage affecting 10% of your users are different from those of an outage affecting 90%.
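As a rough illustration of why a 10% outage and a 90% outage land in different buckets, the sketch below maps the fraction of affected users to a severity label. The `severity_for` function, the SEV1 to SEV3 labels, and the 50% and 10% thresholds are assumptions made for this example, not values from the guide. Real severity schemes usually weigh other factors as well.

```python
def severity_for(affected_fraction: float) -> str:
    """Map the fraction of affected users (0.0 to 1.0) to a severity label.

    The thresholds below are illustrative assumptions only.
    """
    if not 0.0 <= affected_fraction <= 1.0:
        raise ValueError("affected_fraction must be between 0.0 and 1.0")
    if affected_fraction >= 0.5:
        return "SEV1"  # assumed: half or more of users affected
    if affected_fraction >= 0.1:
        return "SEV2"  # assumed: a significant minority affected
    if affected_fraction > 0.0:
        return "SEV3"  # assumed: limited impact
    return "none"      # nobody affected


if __name__ == "__main__":
    for fraction in (0.10, 0.90):
        print(f"{fraction:.0%} of users affected -> {severity_for(fraction)}")
```

Under these assumed thresholds, an outage affecting 10% of users is classified as SEV2, while one affecting 90% is SEV1.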

PagerDuty Named a G2 Leader for Enterprise Incident Management Software

With the announcement of G2’s Fall ’22 Review awards, PagerDuty has been named a G2 Leader for Incident Management Software for the sixth quarter in a row. We owe a special thank you to our customers, who have consistently given PagerDuty high satisfaction scores that take into account their likelihood to recommend PagerDuty, our ability to meet their requirements, and the overall ease they’ve found in doing business with us.