
Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

New Postmortems Design and Commenting Functionality

One of the most important steps in an incident’s lifecycle is the postmortem. It provides an essential time to reflect on what happened, what could have been done better, and how to build more resilience into a system. But we consistently hear from engineers that incredible toil is typically involved in coordinating stakeholders to write good postmortems.

How Can CIOs Seize the Moments That Matter in a Complex World?

Everybody puts value on work. But not all work is the same or valued in the same way. What if we told you there’s a way to gain or protect up to $1 million in new revenue, reduce unplanned downtime by more than 60%, and improve team productivity by nearly 25%? This is where the differentiation of work comes in. Most of our day-to-day work is planned out; that is, it’s work with structure.

Okta: Atlassian product suite most popular app of the year

Atlassian and Opsgenie are among the most popular apps in the Okta network this year, according to a new report from the security company. From the report: Okta’s Business @ Work 2020 Report takes an in-depth look at how organizations and people work, exploring industries and customers, and the applications and services they use to harness productivity.

DevOps Incident Management: A Guide With Best Practices

This is the one post I hope you’ll never need. However, should you ever need it, this is your one-stop shop for understanding how to proceed with DevOps incident management. Have you just been attacked? Did a commit go wrong? Did a CI pipeline go haywire? Don’t worry. I’ve got you.

How to Reach 99.99% Uptime: High Availability in Practice

With most businesses finding it hard to achieve 99.9% uptime throughout the year, a goal of 99.99% uptime looks daunting to developers. Here’s how to reach 99.99% uptime for your business. It’s like asking someone to build a bridge that would never collapse or a machine that would never break down, no matter what. In short, it is a hard goal to achieve, but it is achievable.
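To make these uptime targets concrete, it can help to convert each percentage into the downtime it allows. The following is a minimal sketch, not taken from the linked article: the function name `downtime_budget_minutes` is hypothetical, and it assumes a 365-day year and that downtime is measured as plain wall-clock minutes.

```python
# Assumption: a non-leap, 365-day year measured in wall-clock minutes.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_budget_minutes(availability_percent: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime (in minutes) allowed by an availability target.

    Hypothetical helper for illustration: the budget is simply the
    unavailable fraction of the period.
    """
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return unavailable_fraction * period_minutes


if __name__ == "__main__":
    # Compare common availability targets over one year.
    for target in (99.9, 99.99, 99.999):
        budget = downtime_budget_minutes(target)
        print(f"{target}% uptime allows about {budget:.2f} minutes of downtime per year")
```

Under these assumptions, 99.9% leaves roughly 8.8 hours of downtime per year, 99.99% leaves under an hour, and 99.999% leaves only a few minutes, which is why each additional nine is so much harder to reach.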

Hiteshwar shares his thoughts on being an SRE

Hiteshwar is an SRE based in Mumbai, India. He specializes in distributed systems. He works on Kubernetes, running his own custom clusters, maintaining them, and creating tools to manage and monitor them. He likes to share what he learns by writing articles and blog posts on Medium and LinkedIn. He is an active speaker at meetups and developer groups and also teaches DevOps and SRE practices at learning centers.