Operations | Monitoring | ITSM | DevOps | Cloud

Latest Posts

The startup guide to sensible incident management

If you’re working at an early stage startup and looking to get some good incident management foundations in place without investing excessive time and effort, this guide is quite literally for you. There’s an enormous amount of content available for organisations looking to import ‘gold standard’ incident management best practices – things like the PagerDuty Response site, the Atlassian incident management best practices, and the Google SRE book.

No capes: the perils of being a hero-engineer

When I first started out as an engineer I really leant in to the idea of what’s often called “being a hero”; I would get to the office a bit early to make sure I could fix anything that had gone wrong overnight. I loved the camaraderie of someone outside engineering bringing their laptop over with a critical process broken for me to fix (even if I’d been the one to break it!). Being a hero feels really good for a while, but over time, it loses its shine.

A modern data stack for startups

Nowadays, easy access to data is table-stakes for high-performing companies. Easy access doesn't come for free, though: it requires investment and a careful selection of tools. For young companies like us, the question is how much? And when do you make that investment? Having grown to ten people, several without engineering backgrounds but with strong data needs, we decided 2022 was going to be that time.

Using context.Context to mock API clients

We've found a pattern to mock external client libraries while keeping code simple, reducing the number of injection spots and ensuring all the code down a callstack uses the same mock client. Establishing patterns like these is what makes test suites great, and improves developer productivity when writing tests. Here's how it works.

We've successfully completed our SOC 2 audit

We're very pleased to announce that incident.io is now SOC 2 compliant, having successfully completed our Type I audit. Put simply, this means an external auditor has looked at how the company is operating, and how our software is managed and operated, and confirmed that we meet a set of high security standards.

On-call by default

Like many SaaS businesses, we have an on-call rota to enable us to provide 24x7 cover if there are problems with incident.io. We have a 'pager' which will alert the relevant person if something unexpected happens in our app, so that they can investigate and fix it if needed. Note: This was adapted from an internal document we wrote about how we think about on-call at incident.io.

Breaking down complex projects into smaller, shippable increments

Building a complex new product can be scary. What if no-one gets value from it? What if it doesn't work? What if it's hard to change? One way to mitigate these risks is to break down the product into smaller shippable increments, allowing you to capture feedback early and confirming the most important assumptions before fully committing to a solution.

Workflows: your process, automated

After many weeks of work, we're delighted to announce the latest feature of the incident.io platform: Workflows. Configure your processes once, and we'll make sure you follow them, every time ✨ A little while ago, I was asked the question: “what makes a good incident response?”. Whilst there’s infinite nuance in the answer, mine was pretty straightforward. The best incidents are founded on principles of communication, coordination, and clear roles and responsibilities.

Building safe-by-default tools in our Go web application

At incident.io, we're acutely aware that we handle incredibly sensitive data on behalf of our customers. Moving fast and breaking things is all well and good, but keeping our customer data safe isn't something we can compromise on. We run incident.io as a multi-tenant application, which means we have a single database (and a single application).

Deploying to production in <5m with our hosted container builder

Fast build times are great, which is why we aim for less than 5m between merging a PR and getting it into production. Not only is waiting on builds a waste of developer time — and an annoying concentration breaker — the speed at which you can deploy new changes has an impact on your shipping velocity. Put simply, you can ship faster and with more confidence when deploying a follow-up fix is a simple, quick change.