Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

Managing IT Network Disruptions In Your Company Like A Pro

Let's face it, tech meltdowns are the worst. In today's world, a healthy computer network is like the plumbing in your office: you barely notice it when it works, but when it goes kaput, everything grinds to a halt. Emails stop flowing, files disappear, and suddenly your most productive employees are reduced to staring at useless screens. The good news? There are ways to be a hero and keep your business running smoothly even when the tech gremlins strike. This guide will show you how to be a network-disruption ninja, ready to tackle any tech trouble like a pro.

ilert Call Routing 2.0: Setting Up Your First Call Flow

We're excited to announce a major update to the Call Routing add-on! Our new call flow builder makes it easy to create custom call flows. The intuitive drag-and-drop interface simplifies the configuration process, allowing you to create command sequences and multiple scenarios for different users by adding new branches to your flow. Watch this video to learn how to set up your first sequence of commands.

How Agile Leadership Transforms IT Operations

Traditional IT operations, with their waterfall processes and lengthy release cycles, can feel sluggish in today's business environment. This constant state of "catch-up" can lead to frustration for developers, ops staff, and business leaders alike. Developers struggle to see their innovative ideas come to life quickly. Operations teams scramble to deploy code that feels outdated before it even hits production. Business leaders see their growth potential hampered by slow IT delivery.

AI-Assisted Incident Management Communication

AI has revolutionized various aspects of incident response, from preparation to resolution. Across the incident response lifecycle, AI is being leveraged to streamline processes, reduce noise, and improve overall efficiency. One critical area where AI is making a significant impact is in incident communication. Effective and efficient communication is crucial during incidents, as it ensures that stakeholders are informed and aligned with the incident status and resolution efforts.

Crisis Management for Oil and Gas Companies

Oil and gas companies operate in a high-stakes environment where the potential for catastrophic incidents, such as oil spills, explosions, and natural disasters, always exists. These risks call for robust crisis management to ensure the safety of personnel and minimize potential damage to operations and organizational reputation.

xMatters Workflow Overview - 2024

Everbridge xMatters automates workflows to eliminate business-impacting digital events, leveraging analytics, automation, and AI to improve response time and resolution. I will be walking through key features in xMatters that will keep your digital businesses running, reducing the frequency, duration, and associated cost of critical service disruptions.

A guide to Grafana OnCall SMS and call routing

Many organizations use incident response setups that enable them to page on-call personnel via calling or sending a message to a phone number. In this guide, you will learn how to configure such a system by using Grafana OnCall. For practical purposes, we’ll pair it with Twilio, though the same basic workflow should be applicable to other platforms. We will start with a basic setup that uses a phone number in Twilio to both call and send SMS messages to a webhook integration in Grafana OnCall.
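
As a rough illustration of the webhook step, here is a minimal sketch of turning an inbound SMS into a JSON alert body. It assumes the `From` and `Body` form fields that Twilio includes in incoming-message webhook requests; the function name `twilio_sms_to_alert` and the JSON keys (`title`, `message`, `source_number`) are hypothetical and are not taken from the guide or from any Grafana OnCall schema.

```python
import json


def twilio_sms_to_alert(form):
    """Map inbound-SMS form fields to a JSON alert body.

    `form` is a dict of the form-encoded fields from the incoming request.
    The output keys are illustrative assumptions, not a required schema.
    """
    sender = form.get("From", "unknown")
    body = form.get("Body", "").strip()
    return json.dumps({
        "title": f"SMS from {sender}",
        "message": body or "(empty message)",
        "source_number": sender,
    })


# Hypothetical sample request fields
sample = {"From": "+15550100", "Body": "  Database is down  "}
alert = twilio_sms_to_alert(sample)
```

The resulting string could then be sent as the body of a POST request to whatever webhook URL the integration provides.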

Pagerly now available on Microsoft Teams - Manage Oncalls, Tickets and Incidents on MS Teams

Manage on-call schedules and incidents in Microsoft Teams, with PagerDuty and Opsgenie integrations. Get on-call change notifications within Microsoft Teams, and mention the current on-call automatically in any conversation without switching applications.

What is Mean Time to Repair (MTTR)?

Mean time to repair (MTTR) is a metric used to measure the average time required to diagnose and fix a malfunctioning system or component, ensuring it returns to full operational status. In software development, downtime halts user access and disrupts operations, leading to customer dissatisfaction and financial losses. In manufacturing, it slows production, affecting supply chains and profitability. In healthcare, downtime can compromise patient care and safety.
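
To make the definition concrete, the sketch below computes MTTR as total repair time divided by the number of repairs. It assumes each incident is recorded as a (detected, restored) pair of timestamps; the function name `mean_time_to_repair` and the sample data are hypothetical.

```python
from datetime import datetime, timedelta


def mean_time_to_repair(incidents):
    """Return the average repair duration over (detected, restored) pairs."""
    if not incidents:
        raise ValueError("MTTR is undefined without at least one incident")
    # Sum the individual repair durations, starting from a zero timedelta.
    total = sum(
        (restored - detected for detected, restored in incidents),
        timedelta(),
    )
    return total / len(incidents)


# Hypothetical sample incidents: 30, 90, and 60 minutes of repair time
incidents = [
    (datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 9, 30)),
    (datetime(2024, 5, 3, 14, 0), datetime(2024, 5, 3, 15, 30)),
    (datetime(2024, 5, 7, 22, 15), datetime(2024, 5, 7, 23, 15)),
]
mttr = mean_time_to_repair(incidents)
print(mttr)  # 1:00:00
```

Here the three repairs total 180 minutes, so the MTTR is 60 minutes.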