Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Make the most from FireHydrant's Service Catalogs with these 4 tips

Outages are inevitable. It is how we respond that can make or break our company. In this post, we will talk about how Service Catalogs can impact your incident response process and make it more effective. When a company has just a handful of services, it can be relatively easy to figure out who to call when something breaks. But when companies are at the stage of having dozens of services to manage, figuring out who to page or reach out to can be a challenge.

How Can CIOs Seize the Moments That Matter in a Complex World?

Everybody puts value on work. But not all work is the same or valued in the same way. What if we told you there’s a way to gain/protect up to $1 million in new revenue, reduce unplanned downtime by more than 60%, and improve team productivity by nearly 25%? This is where the differentiation of work comes in. Most of our day-to-day work is planned out; i.e., it’s work with structure.

DevOps Incident Management: A Guide With Best Practices

This is the one post I hope you’ll never need. However, should you ever need it, this is your one-stop shop for understanding how to proceed with DevOps incident management. Have you just been attacked? Did the commit go wrong? A CI pipeline went haywire? Don’t worry. I got you.

How to reach 99.99% uptime: High Availability in Practice.

With most businesses finding it hard to achieve a 99.9% uptime throughout the year, achieving a goal of 99.999% uptime looks daunting to developers. Here’s how to reach 99.99% uptime for your business. It’s like asking someone to build a bridge that would never collapse or a machine that would never break down no matter what. In short, it is a hard goal to achieve but yes it is achievable.

Hiteshwar shares his thoughts on being an SRE

Hiteshwar is an SRE based out of Mumbai, India. His area of specialization is in distributed systems. He works on Kubernetes, running his own custom clusters, maintaining them and creating tools to manage and monitor them. He likes to share his learnings by writing articles and blogs on Medium and Linkedin. He is an active speaker in meetups and developer groups and also teaches DevOps and SRE practices at learning centers.