The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.
Although every company can benefit from SREs, some need SREs more than others.
There are roughly 5 stages of an incident: (1) assess impact, (2) inform customers (status page), (3) identify the issue, (4) mitigate the issue, (5) resolve the incident. Then there's follow-up and further work. It is also important to note that (2) should be ongoing as you progress: update the status page at reasonable intervals, e.g. every 15-20 minutes, unless you specify otherwise.
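As a rough illustration of the stages listed above, the sketch below models them as an ordered enum and checks whether a status page update is due. The names `Stage`, `Incident`, `status_update_due` and the 20-minute default interval are assumptions made for this example; they do not come from any particular incident tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Stage(Enum):
    """The five stages, in the order they are usually worked through."""
    ASSESS_IMPACT = 1
    INFORM_CUSTOMERS = 2
    IDENTIFY_ISSUE = 3
    MITIGATE_ISSUE = 4
    RESOLVE_INCIDENT = 5


# Assumed default: the upper end of the 15-20 minute range.
UPDATE_INTERVAL = timedelta(minutes=20)


@dataclass
class Incident:
    opened_at: datetime
    stage: Stage = Stage.ASSESS_IMPACT
    last_status_update: Optional[datetime] = None

    def advance(self) -> Stage:
        """Move to the next stage; stay put once the incident is resolved."""
        if self.stage is not Stage.RESOLVE_INCIDENT:
            self.stage = Stage(self.stage.value + 1)
        return self.stage

    def post_status_update(self, now: datetime) -> None:
        """Record that the status page was updated at `now`."""
        self.last_status_update = now

    def status_update_due(self, now: datetime,
                          interval: timedelta = UPDATE_INTERVAL) -> bool:
        """Informing customers is ongoing: once that stage is reached,
        an update is due whenever `interval` has passed since the last one."""
        if self.stage.value < Stage.INFORM_CUSTOMERS.value:
            return False
        if self.last_status_update is None:
            return True
        return now - self.last_status_update >= interval
```

A caller would run `status_update_due` on a timer during the incident and prompt whoever owns communications when it returns `True`.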
This blog post defines SRE by explaining SLOs and error budgets, highlighting the innovation vs. reliability tradeoff.
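To make the error budget idea concrete, here is a minimal sketch of the usual arithmetic: the budget is the share of a time window that the SLO allows to be unavailable. The function names and the 30-day window are assumptions for illustration and are not taken from the linked post.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window.

    `slo` is a fraction, e.g. 0.999 for "three nines".
    """
    if not 0 < slo <= 1:
        raise ValueError("slo must be a fraction in (0, 1]")
    window_minutes = window_days * 24 * 60
    return (1 - slo) * window_minutes


def budget_remaining(slo: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Budget left after the downtime already spent; negative means overspent."""
    return error_budget_minutes(slo, window_days) - downtime_minutes
```

With a 99.9% SLO over 30 days the budget comes to about 43.2 minutes. A team that still has budget left can afford riskier releases, while an exhausted budget is a signal to prioritize reliability work, which is the innovation vs. reliability tradeoff in practice.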
Our December update brings a ‘Who is on duty’ board displaying current team members on duty with contact information. In addition, we have simplified the manual sending of Signls and improved the integration with Azure Sentinel. As always, you can find all the details in this article.
After many weeks of work, we're delighted to announce the latest feature of the incident.io platform: Workflows. Configure your processes once, and we'll make sure you follow them, every time ✨ A little while ago, I was asked the question: “what makes a good incident response?”. Whilst there’s infinite nuance in the answer, mine was pretty straightforward. The best incidents are founded on principles of communication, coordination, and clear roles and responsibilities.
When we asked how technology leaders are feeling about increased pressure on digital services, they reported that, unsurprisingly, their investments in digital have grown. In fact, 72% are ramping up digital transformation efforts. Yet while the C-suite is interested in AIOps and automation to help their teams, it’s not always clear what their approach should be and how this technology can be applied to solve problems for their teams today.