The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.
At incident.io, we're acutely aware that we handle incredibly sensitive data on behalf of our customers. Moving fast and breaking things is all well and good, but keeping our customer data safe isn't something we can compromise on. We run incident.io as a multi-tenant application, which means we have a single database (and a single application).
In today’s digital economy, seconds matter. For mission-driven organizations, seconds can be a matter of life and death, and service reliability can make or break access to suicide and safety hotlines, disaster relief, time-critical health care, food assistance, and more. That’s where real-time digital operations comes in.
A history of Site Reliability Engineering from its origins at Google in 2003 to the present.
Fast build times are great, which is why we aim for less than five minutes between merging a PR and getting it into production. Waiting on builds is a waste of developer time and an annoying concentration breaker, and the speed at which you can deploy new changes also affects your shipping velocity. Put simply, you can ship faster and with more confidence when deploying a follow-up fix is a simple, quick change.
Complex incidents are both exhausting and commonplace. By “complex,” I mean incidents that involve multiple, disparate notifications in your alert management platform. Perhaps these incidents are logically separated because the underlying systems or services were seen as less coupled than they turned out to be.
In this article, we’re exploring how status pages can help you deliver bad news to customers in a “good way,” starting with the psychology of news delivery and how you can use this knowledge for future incidents.