Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

Featured Post

Incidents are lessons, not failures

Delivering digital operations excellence - DevOps, incident management, and keeping organisations running - is a constant challenge. As customers' digital expectations rise, so does the complexity of the tech stack and its cloud service integrations. But insisting on 100% uptime and rushing through incident management without absorbing the lessons creates a poor culture that can undermine the DevOps team's effectiveness. This is not how a business builds resilient infrastructure and high-performing teams.

Leveraging AI for Efficient On-call Scheduling

Whatever the industry, creating and maintaining a well-functioning incident management process is crucial for organizations of all sizes. Generative AI has many potential applications in this process and can significantly improve the efficiency, accuracy, and speed of incident detection, analysis, and resolution. GenAI can be used across all stages of the incident management process, including preparation, response, communication, and learning.

How our data team handles incidents

Historically, data teams have not been closely involved in the incident management process (at least, not in the traditional “get woken up at 2AM by a SEV0” sense). But with the growing involvement of data (and therefore data teams) in core business processes, decision making, and user-facing products, data-related incidents are increasingly common and more important than ever.

Rootly On-Call: On-Call Shadowing Feature

Shadowing experienced responders is one of the most effective ways for folks who are new to on-call to gain the confidence and knowledge to handle incidents independently. Traditionally, shadow rotations are cumbersome to set up, involving duplicating and editing an existing schedule. For Rootly On-Call users, setting up shadow rotations couldn’t be easier with our new native Shadowing feature. Here are a few highlights.

[New] Schedule Overrides is now live for every team member!

We are excited to announce a significant enhancement to our scheduling feature based on your valuable feedback! At Zenduty, we understand the importance of flexibility and efficiency in managing on-call schedules and ensuring seamless incident response. Previously, only team managers had the capability to edit schedules and add overrides. This meant that non-manager team members had to reach out to their managers to request override coverage, potentially delaying critical adjustments.

NYSE uses AIOps to identify problems faster and focus on innovation

The New York Stock Exchange relies on AIOps to extract crucial incident insights, allowing IT teams to focus on innovation instead of manually investigating alert data. Chuck Adkins, CIO, shares how an AIOps tool helps the NYSE save time and resolve problems instead of searching through alerts to find them.

Network topology: Definition and role in observability

Network topology describes how a network’s nodes, connections, and devices are physically arranged and interconnected, as well as how they communicate. The arrangement or configuration of a network’s components plays a crucial role in ensuring smooth ITOps with minimum downtime. Any issues in the network can disrupt operations, leading to potentially dire consequences. To prevent this, you need to understand how your network functions and how it is structured.
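
As a rough illustration of why structure matters, a topology can be modelled as a graph and checked for nodes whose failure would cut the rest of the network apart. The sketch below is a minimal example under stated assumptions: the node names, the `TOPOLOGY` mapping, and both helper functions are hypothetical and not taken from any particular observability tool.

```python
from collections import deque

# Hypothetical star topology: every host connects through a single switch.
TOPOLOGY = {
    "switch": {"web", "db", "cache"},
    "web": {"switch"},
    "db": {"switch"},
    "cache": {"switch"},
}


def reachable(topology, start, failed=frozenset()):
    """Return the set of nodes reachable from start, skipping failed nodes."""
    if start in failed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in topology[node]:
            if neighbor not in seen and neighbor not in failed:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def single_points_of_failure(topology):
    """Return nodes whose failure leaves the remaining nodes disconnected."""
    critical = set()
    for candidate in topology:
        remaining = set(topology) - {candidate}
        if not remaining:
            continue
        # Start from any surviving node and see whether it can reach the rest.
        start = next(iter(remaining))
        if reachable(topology, start, failed={candidate}) != remaining:
            critical.add(candidate)
    return critical


if __name__ == "__main__":
    print(single_points_of_failure(TOPOLOGY))
```

In this assumed star layout, the switch is the only node whose loss isolates the others; a mesh or ring layout would give a different answer.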