|
By Chris Class
In 2022, we wrote about our engineering interview process to make it more transparent and accessible to candidates. A lot has changed since then: we've grown to 80 people across London, San Francisco, and New York, and naturally, our interview process has evolved too. We thought it was time for an update!
|
By Tom Wentworth
Incidents happen. Whether it’s a service outage, degraded performance, or an unexpected spike in errors, things will go wrong. The question isn’t if incidents will occur—it’s how quickly and effectively you can respond when they do. For years, incident response has been a mostly manual process: someone gets paged, scrambles to investigate, loops in the right people, and after some firefighting, hopefully resolves the issue before too many customers notice.
|
By Chris Evans
Since its launch in 2009, PagerDuty has been the go-to tool for organizations looking for a reliable paging and on-call management system. It’s been the operational backbone for anyone running an ‘always-on’ service, and it’s done the job well. Ask anyone about the product, and you’re all but guaranteed to hear the phrase “it’s incredibly reliable.” I agree. But reliability isn’t everything.
|
By Navo Das
It's no secret that building a data-driven culture in a company is hard, but what is it exactly that makes this such a tricky endeavor? Contrary to popular belief, technology isn't the main hurdle. A recent survey reveals that only a quarter of respondents cite technological limitations as the primary obstacle to becoming data-driven.
|
By Stephen Whitworth
I want to walk you through how incident management has evolved, drawing from real data and the experiences of some of the most sophisticated tech organizations out there. I'll also introduce you to a framework we’ve developed at incident.io: the Incident Maturity Model. This framework is the result of thousands of conversations with companies and provides a clear roadmap to help your organization improve its incident management practices—no matter where you're starting from.
|
By Chris Evans
On August 28th, 2023—right in the middle of a UK public holiday—an issue with the UK’s air traffic control systems caused chaos across the country. The culprit? An entirely valid flight plan that hit an edge case in the processing software, partly because it contained a pair of duplicate airport codes.
|
By Lawrence Jones
Picture this: your alerting system needs to tell you it's broken. Sounds like a paradox, right? Yet that’s exactly the situation we face as an incident management company. We believe strongly in using our own products. After all, if we don’t trust ourselves to be there when it matters most, why should the thousands of engineers who rely on us every day? However, this poses an obvious challenge.
|
By Martha Lambert
At incident.io, we run on a monolith. This brings a whole load of benefits that we don’t want to give up any time soon. We don’t have to worry about the speed of internal network requests, complex deployments, or optimizing work that touches multiple services. This blog post isn’t about the relative benefits of monoliths, though (we’ve written more about that elsewhere, if you are interested)! Ownership in monoliths is tricky.
|
By Lambert Le Manh
As a provider of incident management software, we at incident.io manage sensitive data regarding our customers. This includes Personally Identifiable Information (PII) about their employees, such as emails, first names, and last names, as well as confidential details regarding customer incidents, such as names and summaries. Consequently, we approach the management of this data with a great deal of care.
|
By Jack Colsey
We've written several times about our data stack here at incident.io, but never about our underlying data warehouse and the design principles behind it. This blog post will run through the high-level structure of our data warehouse and then go in depth into the underlying layers.
|
By Incident.io
Keeping track of follow-up actions coming from incidents has never been easier. Export to Linear, have it routed to the right team's backlog, and keep the statuses fully in sync, all with a single click.
|
By Incident.io
Join us for a deep dive into how incident.io is leveraging AI to build an intelligent incident investigator. Our guests, Ed and Lawrence, share insights on building AI-powered investigations that help teams to leverage huge amounts of data and signals to respond faster and more effectively.
|
By Incident.io
A full walkthrough of incident.io Response, On-call and Status Pages.
|
By Incident.io
In this episode, we take a look back at 2024 at incident.io, reflecting on the year’s personal milestones, company-wide changes, and how our product has evolved along the way. Of course, no reflection would be complete without a healthy dose of "banter". Join us as we wrap up the year with insights, laughs, and a look ahead to what's coming in early 2025.
|
By Incident.io
This week, we show how you can manage large-scale incidents by breaking the work down into streams with their own Slack channels and calls.
Create, manage and resolve incidents directly in Slack. Leave the admin and reporting to us.
Improving your incident response, visibility, and ability to learn:
- Less faffing, more fixing: We take care of the admin during incidents, so you can save your brainpower for the decisions that matter.
- Divide and conquer: We make sure everyone’s role is clear, track who’s working on what, and help you escalate if you need extra help.
- Get up to speed, at speed: Get everyone on the same page from the moment they join the incident, and help stakeholders stay in the loop.
- Timelines, in no time: Constructing an incident timeline for review is important, but time-consuming. We’ll build one for you in real time, and keep it constantly up to date.
- Data and insights you can trust: You’ve already paid for your incidents. By surfacing the data you need to make decisions, we help you get your money’s worth.
Incident response for your whole organisation.