Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

Three communications best practices for incident handlers

The importance of well-managed communications when handling IT and security incidents cannot be overstated. If updates are not communicated promptly and accurately, misunderstandings, misalignment, and costly errors will occur, and resolution will take longer. And if highly sensitive information reaches people who should not be privy to it, the risk of legal ramifications is high, as is the potential damage.

ServiceNow + Squadcast Integration: Automate IT Ticketing and Project Tracking

ServiceNow is a workflow automation platform that organizations use for IT ticketing and project management. Squadcast is an end-to-end incident management and SRE platform that organizations use to meet their reliability requirements.

What SREs Can Learn from Capt. Sully: When to Follow Playbooks

When are you smarter than your playbooks, and when are your playbooks smarter than you? That’s a question that engineers rarely step back to consider. The rational, disciplined parts of our minds tell us that the playbooks we are supposed to follow were carefully designed and tested, and that we should stick to them at all costs.

Incident Response Lifecycle | A Complete Explanation

Wondering about the incident response lifecycle? We explain what it is, and how each phase helps lead to effective incident resolution. What is the incident response lifecycle? The incident response lifecycle is an organization’s framework for responding to an incident that disrupts service. The incident response lifecycle contains the following phases.

Monthly Moo March 2022

What a start to 2022 it has been for us all. We are incredibly proud of the continuous innovation, velocity and delivery of new features and functionality. We’ve heard success story after success story from our brilliant customers, each unique in its own way, and we continue to collaborate with them on our roadmap. So this March update is for you, along with a massive thank you. We couldn’t do it without you, and it’s been our honor to be part of your success.

xMatters Overview - xMatters Demo

Join Stephen Walters, Solutions Architect and DevOps Institute Ambassador, and Daniel Topham, Solutions Architect, as they guide you through a high-level demo of the xMatters solution. See how xMatters sends alerts to the right users at the right time and enriches notifications with relevant data. And learn how easy it can be to use Flow Designer to integrate different tools and software and create innovative workflows with drag-and-drop capability.

Amplify Artifactory and Distribution Changes Through PagerDuty

When automated software delivery runs smoothly, it can whisper, and quietly attend to itself. But when your delivery and distribution pipeline runs into a problem, it must shout. Boosting the volume of Artifactory and Distribution change events and issues through PagerDuty can help ensure they’re heard by everyone whose job it is to monitor your software delivery pipeline.

Kubernetes Health Check Using Probes

Kubernetes is an open source container orchestration platform that significantly simplifies an application's creation and management. Distributed systems like Kubernetes can be hard to manage, as they involve many moving parts and all of them must work for the system to function. Even if a small part breaks, it needs to be detected, routed and fixed. These actions also need to be automated. Kubernetes allows us to do that with the help of readiness and liveness probes.
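
As a rough illustration of what readiness and liveness probes talk to, here is a minimal sketch of an application exposing two HTTP endpoints that probes could be pointed at. The paths `/healthz` and `/ready`, port 8080, the `ProbeHandler` class, and the `probe_response` helper are all assumptions made for this example; Kubernetes does not mandate any of these names. For an HTTP probe, Kubernetes treats a status code from 200 up to (but not including) 400 as success.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple

# Hypothetical endpoint paths chosen for this sketch.
LIVENESS_PATH = "/healthz"
READINESS_PATH = "/ready"


def probe_response(path: str, ready: bool) -> Tuple[int, str]:
    """Return the (status code, body) a probe request to `path` should get."""
    if path == LIVENESS_PATH:
        # The process is up and able to answer, so report it as alive.
        return 200, "alive"
    if path == READINESS_PATH:
        # Readiness depends on whether startup work has finished.
        return (200, "ready") if ready else (503, "not ready")
    return 404, "not found"


class ProbeHandler(BaseHTTPRequestHandler):
    # Flipped to True once the application considers itself able to serve
    # traffic (how that is decided is application-specific).
    ready = False

    def do_GET(self):
        status, body = probe_response(self.path, ProbeHandler.ready)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


if __name__ == "__main__":
    ProbeHandler.ready = True  # pretend startup has completed
    HTTPServer(("0.0.0.0", 8080), ProbeHandler).serve_forever()
```

In a pod spec, a liveness probe would point at the first path and a readiness probe at the second. A failing liveness probe leads Kubernetes to restart the container, while a failing readiness probe only stops traffic from being routed to the pod until it reports ready again.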