
The latest news and information on incident management, on-call, incident response, and related technologies.

Simulated Incident Call Recording

May 26, 2022 By PagerDuty In PagerDuty

This is a simulated incident call recording based on a real PagerDuty incident from Jan 2017. The purpose of this simulation is to show how Incident Command System principles are applied to technical product outages.

Introducing Incident Types

May 26, 2022 By Martha Lambert In Incident.io

We believe incident.io should be used across an organisation, from SRE teams to Customer Success and People Ops. Until now, the way you set up your incident response flows has relied on having one set of roles and fields for every incident, meaning you have to choose between having lots of irrelevant fields to cover every use-case, or not getting the full incident.io experience on some incidents. That’s changing today with incident types, conditional fields and roles!

We can't all be Shaq: why it's time for the SRE hero to pass the ball and how to get there

May 25, 2022 By Malcolm Preston In FireHydrant

At a going-away party for a job I was leaving a few years back, my VP of engineering told a story I didn’t even remember but that I know subconsciously shaped how I viewed my role on that team: Toward the end of my very first day at the company, there was some internal system issue, and with pretty much zero context, I pulled out my laptop, figured out what was going on, and helped fix the issue.

When incident response requires business response, who should you notify?

May 25, 2022 By Hannah Culver In PagerDuty

From a single on-call engineer hopping online to resolve a problem, to a massive cross-team effort that brings in even the most senior technical leadership (CTO, CISO, or CIO), incident response teams are lucky when they’re able to resolve issues before a customer is aware. But in the cases where there is customer impact, other stakeholders like sales and customer service need to be informed and updated as well.

PagerDuty Terraform Time: Write HCL in Go with hclwrite

May 25, 2022 By PagerDuty In PagerDuty

Scott McAllister, Developer Advocate, PagerDuty

Tracking On-Call Health

May 24, 2022 By Fred Hebert In Honeycomb

If you have an on-call rotation, you want it to be a healthy one. But this is sort of hard to measure because it has very abstract qualities to it. For example, are you feeling burnt out? Does it feel like you’re supported properly? Is there a sense of impending doom? Do you think everything is under control? Is it clashing with your own private life? Do you feel adequately equipped to deal with the challenges you may be asked to meet? Is there enough room given to recover after incidents?

4 Best Practices for Root Cause Analysis

May 24, 2022 By Special contributor In Scout

Failures are a common part of any system’s lifecycle, so what would Root Cause Analysis look like for this type of problem? If you build and deploy a system, chances are high that you'll have to deal with a failure in the near future. What matters is how you handle such failures. As an organization, you need pre-formulated strategies for handling failures as and when they occur.

Introducing Status Pages

May 24, 2022 By iLert In iLert

We are super excited to announce a major milestone in our company history. 10 years ago, iLert started with a simple mission: help companies to increase their uptime and deliver a seamless digital experience. Every feature in iLert is built to help you to respond to critical alerts faster and increase your uptime.

List of Potential Incident Management Issues

May 24, 2022 By Roxana González In InvGate

Incident management is the process that IT service management follows to respond to a service disruption and restore the service to normal as quickly as possible, minimizing the negative impact on the business. An incident is a single unplanned event that generates a service disruption, whereas a problem is a cause or potential cause of one or more incidents, as defined by ITIL incident management guidelines.

Major Incident Process Is at the Heart of Effectiveness

May 23, 2022 By xMatters In xMatters

Read the new white paper on major incident management. Businesses need to be prepared for minor and major incidents affecting their technologies, whether an integration disconnects or an entire system is taken offline. Preparation ensures not only that losses are minimized, but also that businesses can protect themselves and potentially their clients from risky impacts.
