
Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

New Postmortems Design and Commenting Functionality

One of the most important steps in an incident’s lifecycle is the postmortem. It provides an essential time to reflect on what happened, what could have been done better, and how to build more resilience into a system. But we consistently hear from engineers that incredible toil is typically involved in coordinating stakeholders to write good postmortems.

Okta: Atlassian product suite most popular app of the year

Atlassian and Opsgenie are among the most popular apps in the Okta network this year, according to a new report from the security company. From the report: Okta’s Business @ Work 2020 Report takes an in-depth look at how organizations and people work, exploring industries and customers, and the applications and services they use to harness productivity.

DevOps Incident Management: A Guide With Best Practices

This is the one post I hope you’ll never need. However, should you ever need it, this is your one-stop shop for understanding how to proceed with DevOps incident management. Have you just been attacked? Did a commit go wrong? Did a CI pipeline go haywire? Don’t worry. I’ve got you.

How to reach 99.99% uptime: High Availability in Practice

With most businesses finding it hard to achieve even 99.9% uptime throughout the year, a goal of 99.999% uptime looks daunting to developers. It’s like asking someone to build a bridge that never collapses or a machine that never breaks down, no matter what. In short, it is a hard goal, but it is achievable. Here’s how to reach 99.99% uptime for your business.
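To put those percentages in perspective, here is a minimal sketch that converts an uptime target into a yearly downtime budget. The function name `allowed_downtime_minutes` and the 365-day year are assumptions made for illustration; they do not come from the original post.

```python
# Assumption: a non-leap year of 365 days (525,600 minutes).
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(uptime_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an uptime target."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    # The downtime budget is the share of the year NOT covered by the target.
    return MINUTES_PER_YEAR * (100 - uptime_percent) / 100


# Compare the three targets mentioned in the excerpt.
for target in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:.1f} minutes of downtime per year")
```

Under these assumptions, 99.9% leaves roughly 8.8 hours of downtime a year, 99.99% leaves about 53 minutes, and 99.999% leaves just over 5 minutes. Each extra nine cuts the budget by a factor of ten.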

Hiteshwar shares his thoughts on being an SRE

Hiteshwar is an SRE based out of Mumbai, India. He specializes in distributed systems. He works on Kubernetes, running his own custom clusters, maintaining them and creating tools to manage and monitor them. He likes to share what he learns by writing articles and blogs on Medium and LinkedIn. He is an active speaker at meetups and developer groups and also teaches DevOps and SRE practices at learning centers.

Checklist for publishing a guest post to Fyipe

Here’s a quick checklist for publishing articles or guest posts on the Fyipe Blog. We invite anyone to publish stories to any of our publications. If you wish to contribute, please send an email to [email protected] with your draft article, and make sure the draft follows the guidelines in this post. Here’s what all this means for you as a writer: Educate your readers and teach them something new. Cut all the fluff. Get to the point, fast. Do not waste their time.

Embracing Chaos With BigPanda's Root Cause Analysis Features

The ever-growing complexity, scale and pace of IT environments put a huge burden on IT Ops, NOC, and DevOps teams, who are tasked with keeping these environments up and running. One of the biggest challenges is Root Cause Analysis (RCA). When something breaks, these teams need to determine what broke it, and they need to do it fast.