Incident Management

The latest news and information on incident management, on-call, incident response and related technologies.

Webinar: Making the case for AIOps

Over the past few years, artificial intelligence for IT Operations (AIOps) has risen in popularity within the technology landscape. It’s become a buzzword in the marketing world, and while there are many ways to define AIOps, the best way to start thinking about it is through the lens of outcomes, correlation and strategy—it’s all about the data.

Why you should ditch your overly detailed incident response plan

When critical incidents happen (which they inevitably do 😅) and you’re in the middle of trying to figure out what the best thing to do is, it can feel comforting to know that you’ve got a pre-prepared list of instructions to follow, commonly known as an “incident response plan”. In theory this sounds quite simple, and you might envision a typical flow for it. It might be tempting to think that the hardest part of running incidents is finding or writing a checkl

Announcing Incident watchers: Subscribe to incidents and receive incident updates in real-time

Hey folks, we’re back with another feature update for all our customers! We have recently gone live with the incident watchers feature, which sits within the incident details page. This blog post outlines how you can access the feature, its primary functions, and how we foresee it helping improve your incident management process. Note: This feature will be available to Pro, Premium and Enterprise plan users only.

New reports stress the importance of strategic incident management practice

Engineers have been managing incidents for as long as they’ve been building software, but the idea of incident management as a strategic practice in its own right is still finding its place. We’re starting to see big shifts in that area, though — more companies are dedicating headcount, resources, and tools to help them better prepare for, respond to, and learn from their incidents.

How to Put Software Development Security First

What are the keys to building software development security into the early stages of product development? And what are the costs of ignoring security? In this article, xMatters Product Manager Kit Brown-Watts provides his insights on the matter. Every investment decision comes with trade-offs, usually in the form of cost, quality, or speed. The CQS Matrix, as I like to call it, captures the dilemma most product people face.

Beating the odds: How log data helps detect and lower MTTR

Depending on your business, MTTR stands for mean time to repair or mean time to recovery – but it can also mean resolution, resolve, or restore. No matter how you define it, the basic measurement is the same: it’s the time it takes from when something goes down to when it is back and fully functional. This includes everything from finding the problem to fixing it. For ITOps teams, keeping MTTR to an absolute minimum is crucial.
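As a rough illustration of the measurement described above, MTTR can be computed as the average of the time between "went down" and "fully restored" across incidents. The sketch below is a minimal one, assuming each incident is recorded as a pair of timestamps; the function name and the sample data are hypothetical and not taken from the article.

```python
from datetime import datetime, timedelta


def mean_time_to_restore(incidents):
    """Return the average outage duration.

    `incidents` is a list of (down_at, restored_at) datetime pairs.
    This record shape is an assumption made for illustration.
    """
    if not incidents:
        raise ValueError("need at least one incident")
    # Sum the individual outage durations, starting from a zero timedelta.
    total = sum((restored - down for down, restored in incidents), timedelta())
    return total / len(incidents)


# Hypothetical sample data: one 30-minute outage and one 90-minute outage.
incidents = [
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 30)),
    (datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 2, 15, 30)),
]

mttr = mean_time_to_restore(incidents)
print(mttr)  # 1:00:00
```

Whichever expansion of MTTR a team uses, the arithmetic stays the same; what changes is which timestamps count as the start and the end of the interval.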

Building great developer experience at a startup

At incident.io, our number one priority in engineering is pace. The faster we can build great product, the more feedback we can get and the more value we can deliver for our customers. But pace is a funny thing. If you optimise for pace over a single month, you’ll quickly find yourself slowed down by the weight of your past mistakes.