The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Make the most from FireHydrant's Service Catalogs with these 4 tips

Jan 31, 2020 By Anna Kelley In FireHydrant

Outages are inevitable. It is how we respond that can make or break our company. In this post, we will talk about how Service Catalogs can impact your incident response process and make it more effective. When a company has just a handful of services, it can be relatively easy to figure out who to call when something breaks. But when companies are at the stage of having dozens of services to manage, figuring out who to page or reach out to can be a challenge.

Things to do to make on-call less stressful

Jan 30, 2020 By Squadcast In Squadcast

How to do on-call management in a way that's better, less stressful, and actually improves your incident response processes, uptime, and reliability.

FireHydrant Basics

Jan 29, 2020 By FireHydrant In FireHydrant

View Video

Release Notes: Priority-Based Alerting, Support Hours, SMS Alert Sources, Gap Detection in Schedules

Jan 29, 2020 By iLert In iLert

With Priority-Based Alerting, you can set different notification rules for high and low priority incidents.

How Can CIOs Seize the Moments That Matter in a Complex World?

Jan 29, 2020 By Jerry Weltsch In PagerDuty

Everybody puts value on work. But not all work is the same or valued in the same way. What if we told you there’s a way to gain/protect up to $1 million in new revenue, reduce unplanned downtime by more than 60%, and improve team productivity by nearly 25%? This is where the differentiation of work comes in. Most of our day-to-day work is planned out; i.e., it’s work with structure.

How SIGNL4 supports alert severity

Jan 29, 2020 By Matt In SIGNL4

Event and alert severity are extremely important information for effective alert management and response. Severity information determines the speed of response, the resource allocation needed, and the action path taken. Naturally, critical alerts have higher priority than major alerts, which in turn overrule minor alerts.
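
The ordering described in this excerpt (critical over major over minor) can be illustrated with a small sketch. This is not SIGNL4's API; the `Severity` enum, the `order_by_severity` helper, and the sample alert names are hypothetical and only show one way such a ranking might be expressed.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity levels; a higher value means higher priority."""
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


def order_by_severity(alerts):
    """Return (name, severity) pairs sorted so the most severe come first.

    sorted() is stable, so alerts with equal severity keep their arrival order.
    """
    return sorted(alerts, key=lambda alert: alert[1], reverse=True)


# Illustrative alerts, listed in arrival order.
incoming = [
    ("disk usage high", Severity.MINOR),
    ("database unreachable", Severity.CRITICAL),
    ("latency degraded", Severity.MAJOR),
]

for name, severity in order_by_severity(incoming):
    print(f"{severity.name}: {name}")
```

Using an integer-backed enum keeps the comparison rule in one place, so adding a level only requires a new enum member.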

Alert Severity in SIGNL4

Jan 28, 2020 By SIGNL4 In SIGNL4

How to use SIGNL4 alert categories to map and display alert severity or criticality

View Video

DevOps Incident Management: A Guide With Best Practices

Jan 28, 2020 By Guillermo Salazar In XpoLog

This is the one post I hope you’ll never need. However, should you ever need it, this is your one-stop shop for understanding how to proceed with DevOps incident management. Have you just been attacked? Did the commit go wrong? A CI pipeline went haywire? Don’t worry. I got you.

How to reach 99.99% uptime: High Availability in Practice.

Jan 25, 2020 By Nawaz Dhandala In OneUptime

With most businesses finding it hard to achieve a 99.9% uptime throughout the year, achieving a goal of 99.999% uptime looks daunting to developers. Here’s how to reach 99.99% uptime for your business. It’s like asking someone to build a bridge that would never collapse or a machine that would never break down no matter what. In short, it is a hard goal to achieve but yes it is achievable.
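
To put the uptime targets in this excerpt into perspective, the following sketch converts an availability percentage into an allowed downtime budget. It assumes a 365-day year; the function name `downtime_budget_minutes` is illustrative and does not come from the linked post.

```python
def downtime_budget_minutes(availability_percent: float, period_days: float = 365.0) -> float:
    """Return the downtime, in minutes, permitted by an availability target.

    Assumes the period is measured in whole 24-hour days (365 by default).
    """
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


# Compare the three targets mentioned in the excerpt.
for target in (99.9, 99.99, 99.999):
    print(f"{target}% uptime allows about {downtime_budget_minutes(target):.1f} minutes of downtime per year")
```

Under these assumptions, 99.9% leaves roughly 526 minutes per year, 99.99% roughly 53 minutes, and 99.999% roughly 5 minutes, which is why each additional nine is so much harder to reach.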

Hiteshwar shares his thoughts on being an SRE

Jan 24, 2020 By Squadcast In Squadcast

Hiteshwar is an SRE based out of Mumbai, India. He specializes in distributed systems. He works on Kubernetes, running his own custom clusters, maintaining them, and creating tools to manage and monitor them. He likes to share what he learns by writing articles and blog posts on Medium and LinkedIn. He is an active speaker at meetups and developer groups and also teaches DevOps and SRE practices at learning centers.
