Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Best Practices to implement in Incident Management

An incident has roughly five stages:

1. Assess impact
2. Inform customers (status page)
3. Identify the issue
4. Mitigate the issue
5. Resolve the incident

Then there’s follow-up and further work. Note that (2) should be ongoing as you progress: update the status page at reasonable intervals, e.g. every 15-20 minutes unless you specify otherwise.
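As a rough illustration, the stages and the status-page cadence described above could be modelled as follows. This is only a sketch: the names `Stage`, `UPDATE_INTERVAL`, `status_update_due` and `next_stage` are hypothetical, and the 20-minute default is an assumption taken from the upper end of the range mentioned in the text.

```python
from datetime import datetime, timedelta
from enum import IntEnum
from typing import Optional


class Stage(IntEnum):
    """The five incident stages, in the order listed above."""
    ASSESS_IMPACT = 1
    INFORM_CUSTOMERS = 2
    IDENTIFY_ISSUE = 3
    MITIGATE_ISSUE = 4
    RESOLVE_INCIDENT = 5


# Assumed default: the upper end of the 15-20 minute range.
UPDATE_INTERVAL = timedelta(minutes=20)


def status_update_due(last_update: datetime, now: datetime,
                      interval: timedelta = UPDATE_INTERVAL) -> bool:
    """Return True once a full interval has passed since the last status page update."""
    return now - last_update >= interval


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the following stage, or None once the incident is resolved."""
    if stage is Stage.RESOLVE_INCIDENT:
        return None
    return Stage(stage + 1)
```

Because customer communication is ongoing, a check like `status_update_due` would run throughout the incident rather than only during stage 2.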

What can SREs do to make holiday season's peak traffic less chaotic?

The holiday season's peak traffic is the most challenging period for SREs and on-call engineers. In this blog post, we highlight what SREs can do to make the holiday season less chaotic. The recently concluded Black Friday weekend may well have been the most challenging shift for on-call engineers working in the retail or e-commerce sector. Because such peak-traffic events push systems to their limits, engineering teams are under considerable pressure while preparing for them.

DevOps Workflow | A Complete Guide & Best Practices

Curious about DevOps Workflow? We explain the DevOps process, how automation relates to workflow, and best practices for workflow design. DevOps is a methodology in which Development and Operations work together throughout the development process. A workflow is the sequence in which tasks occur. DevOps workflow relies heavily on automation and involves: With DevOps, teams can increase collaboration and create more stable and manageable processes.

December 2021 Update - On-duty board, Manual Signls and Azure Sentinel update

Our December update brings a ‘Who is on duty’ board displaying current team members on duty with contact information. In addition, we have simplified the manual sending of Signls and improved the integration with Azure Sentinel. As always, you can find all the details in this article.

Workflows: your process, automated

After many weeks of work, we're delighted to announce the latest feature of the incident.io platform: Workflows. Configure your processes once, and we'll make sure you follow them, every time ✨ A little while ago, I was asked the question: “what makes a good incident response?”. Whilst there’s infinite nuance in the answer, mine was pretty straightforward. The best incidents are founded on principles of communication, coordination, and clear roles and responsibilities.

How to Reduce Noise, Resolve Faster, and Automate More Often with PagerDuty

When we asked how technology leaders are feeling about increased pressure on digital services, they reported that, unsurprisingly, their investments in digital have grown. In fact, 72% are ramping up digital transformation efforts. Yet while the C-suite is interested in AIOps and automation to help their teams, it’s not always clear what their approach should be and how this technology can be applied to solve problems for their teams today.

Observability and SaaS Providers

SaaS is exploding, and so it should; it takes commoditized work and infrastructure away from tech teams so that they can focus on differentiating features. But what happens when it goes wrong? How do SaaS platforms make sure they aren't letting their customers down and, in turn, letting their customers' customers down? Observability, bolstered with AI, gives all the partners the best chance to optimize availability and customer experience. Here's how.