Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

Empower the SREs - Conclusions from The SRE Report 2023

Nov 8, 2022 By Steve McGhee In Catchpoint

Let's be honest, nobody loves surveys. Ok, well I sure don't. But surveys satisfy a huge need in our demand for insights into complex human-computer, sociotechnical systems. It turns out that we've been measuring the computer part pretty well, but the humans – not as easy to keep track of. When Google SRE first defined toil as a metric we wanted to reduce, we spent far too long trying to quantify it numerically based on tooling and insights from computer systems.

Building an incident management process - incident.fm

Nov 8, 2022 By Incident.io In Incident.io

In this podcast, our panellists discuss the foundations that any team needs to put in place when designing their incident management process. Starting from the basics of defining what we really mean by an incident, to how to set your severity levels, roles and statuses, Chris and Pete share their tips for building solid foundations to run your incidents.

What is an incident management process?

Nov 7, 2022 By Katie Hewitt In Incident.io

Good incident management is critical to the successful running of any business. Get it wrong, and you risk damaging customer trust, brand reputation and, above all, your bottom line. In this article we’ll give you a 101 on incident management: what is it, why does it matter, and how can you do it well?

For incident management, should you build or buy?

Nov 7, 2022 By Aaron Lober In Blameless

Is your incident response held together by a thread? Are you manually recording incident updates in a shared doc? Do you struggle to juggle the incident management workload with your other responsibilities? Does everyone on-call report data the same way? These are all common problems faced by DevOps teams still relying on homegrown incident management tooling.

[Report:] The true costs of modern IT outages

Nov 7, 2022 By BigPanda In BigPanda

If you’re in IT, no doubt you’ve heard the age-old statistic that an average minute of downtime costs $5,600. It turns out that information is a bit outdated and does not reflect the real and nuanced costs of a modern IT outage. BigPanda suspected this and wanted to uncover the true numbers behind outage costs so ITOps can have a better understanding of costs, causes and “cures” of an IT outage.

7 steps to set up an on-call team

Nov 4, 2022 By isDown In isDown

There is a moment in every company when 24x7 support is needed. Congrats! The next step is to start building an on-call team. In this article, we'll go through some of the aspects you should consider. We'll keep it small and, in a future article, go deep into each step.

AppExchange Mavericks: PagerDuty Empowers Customer Service Agents to Resolve Cases Quicker

Nov 4, 2022 By PagerDuty In PagerDuty

Jonathan Rende, SVP of Products at PagerDuty, and Salesforce MVP Barb Dietz of AppExchange Mavericks discuss how PagerDuty is working to empower customer service agents to resolve customer-impacting issues faster. BONUS: In this video, you will get a front-row seat to PagerDuty’s product demo.

PagerDuty November 2022 Product Launch - Product Highlights Demo

Nov 4, 2022 By PagerDuty In PagerDuty

Learn how PagerDuty's latest capabilities can help you solve critical, unplanned work faster in this new product highlights demo. Our host of new capabilities help you improve team productivity, avoid escalations, and optimize digital services. Features highlighted in this video are the following new PagerDuty features and more.

Service Level Management Process Explained (with Examples)

Nov 3, 2022 By Myra Nizami In Blameless

Service Level Management, or SLM, is the process of negotiating Service Level Agreements (SLAs) and ensuring that they are met. It is a fundamental part of SRE and DevOps, and it encompasses the expectations and perceptions that both the business and the customer have about the service and its performance. Service level management covers existing services and new ones as they are added, with the SLAs modified accordingly.
