
Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

How we leverage our product responder role to push our pace of development

Like many of our own customers, at its heart, incident.io is a software company. Because of this, our work is never truly “done.” One of our primary goals is to help people coordinate their response to situations where things haven’t gone well, and to make it easy to always do the right thing. But we know that there will always be bugs to fix, features to introduce, and improvements to make, as evidenced by our changelog.

How Incident Tracking Can Benefit Your IT Organization

In the dynamic world of Information Technology (IT), incident tracking is a critical process within the realm of incident management that can significantly influence an organization’s operational efficiency and service quality. Incident management refers to the identification, recording, and management of incidents—unplanned events or disruptions—that can impact IT services.

How our engineering team uses Polish Parties to maintain quality at pace

It’s fair to say that delivering software faster has never been more relevant. But in doing so, it’s easy to let your bar for quality slip. Often, the guardrail to avoid this is to hire dedicated QA Engineers, whose sole job is to ensure your software works as it should and to spot any issues that arise. Seems sensible, right? Well, at incident.io, we take a different approach.

What Is Site Reliability Engineering? Understanding the complexities of this crucial function

Site reliability engineers manage a lot, and often in incredibly high-stakes environments. Remember that scene from "The Matrix" where Neo dodges bullets in slow motion? Of course you do. As an SRE, it can feel like you're the person getting hit by those bullets, frantically trying to investigate performance issues, automate away toil, and support the engineers around you, all before the next wave of attacks.

Share highly customizable Blameless Retrospectives as ServiceNow Problems

For many organizations, ServiceNow is a crucial platform for running and scaling operations across all departments. Many engineering teams rely on ServiceNow Incident and Problem Management. Even so, many are experiencing a growing volume of incidents that hinders their ability to scale not only their incident response but also their retrospective operations, potentially compromising their data governance and compliance requirements.

How we achieved pixel-perfect polish during our Status Pages launch

A few months ago, we released Status Pages. This project was quite different from anything we’ve approached before, and our goals were a departure from ones we had set in the past. With this in mind, we worked closely with our designer throughout the process of building Status Pages. Here is how we approached it and a few lessons we learned along the way!

Catalog vs. Thanos: Who came out on top?

Catalog is really, really powerful. To prove it, our latest product went up against the almighty Thanos and won decisively. Don’t believe us? Just look at how unscathed Catalog was once the dust settled. All jokes aside, we spent months building out what we think is one of the most capable products on the market today. Designed to be a map of everything that exists in your organization, Catalog can meaningfully help you level up your incident response.