
Keeping it boring: the incident.io technology stack

At incident.io we run a deliberately simple technology stack. Keeping things boring has allowed us to scale from a few hundred customers to several thousand with only two platform engineers. In this post I'll walk through the stack, explain some of the choices we've made, and touch on the challenges we're facing as we grow.

Secure access at the speed of incident response

Picture this: it's 2am, your pager goes off, and you're staring at a production database that's on fire. You know exactly what's wrong. You know exactly how to fix it. But you can't touch anything because you're waiting on someone to approve your access request. Meanwhile, your customers are down, your SLAs are bleeding out, and you're refreshing Slack hoping someone in security is awake to click "approve." This is the incident response tax that too many teams pay.

Response Team @ incident.io

When an incident hits, every second counts. The response team at incident.io builds the tools that make sure engineers aren't flying blind when it matters most. Sam, Tech Lead of the response team, takes us inside what it's really like to build the core of incident.io: the high technical bar, the art of prioritisation, and why there's no shortage of meaningful work to do. If you're an engineer who wants to work on something that genuinely makes other engineers' lives better, this one's for you.

AI Engineering at incident.io

Working on AI in incident management means there's no playbook. No million blogs. Just building at the forefront of what's possible with AI models. In this video, Martha, Product Engineer on our AI team, talks about what it's really like working with AI that helps engineers respond to incidents faster. She covers the shift from traditional engineering, learning the personalities of different AI models, and why you need to embrace constant change when new models drop all the time.

The post-mortem problem

Post-mortems are required, time-consuming, and widely disliked, but they’re also one of the biggest opportunities to improve reliability. In this webinar, we talked about how to run post-mortems that actually lead to learning and improvement. We covered why most post-mortems fall flat and how to structure them effectively, and walked through a real example to show what good looks like in practice. The goal: fewer wasted hours, better outcomes, and post-mortems that actually matter.

Everything you need to know about ITIL 5, AI and incident management

ITIL 5 launched in January 2026, and for the first time in the framework's 40-year history, AI governance is front and center. If you're running incident management, on-call rotations, or building operational tooling, this matters: the gap between AI adoption and AI governance is about to become a compliance and operational risk issue. I’m not usually a big ITIL fan, but this guidance has some genuinely useful framing and questions.