
Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

What It Means to Be an Incident Commander

Leadership is essential in an organization. Establishing a leadership hierarchy helps teams avoid getting confused about who to turn to with questions and concerns, allowing them to focus their efforts where needed. High-quality leadership is vital to success but becomes even more important when the pressure to resolve an issue with minimal downtime is turned up.

Engineering Manager from a non-STEM background?

A hiring manager looks at a long list of requirements before hiring an Engineering Manager, because performing well in the position requires a balance of technical and leadership skills. Engineering Manager roles differ from company to company, so it is hard to list what a day in an engineering manager’s life looks like.

Uncovering the mysteries of on-call

For the vast majority of organisations, some form of round-the-clock cover is critical to successful business operations. On-call is an essential part of an effective incident response process, yet there is no commonly accepted playbook on how to most effectively structure and compensate on-callers. We ran a survey to uncover the mysteries of how on-call works in organisations of different shapes and sizes around the world.

What is Live Call Routing?

If there’s one essential thing we’ve learned from being in the business of digital operations for more than 13 years, it’s that every business has a unique approach to building resilience with its bespoke tech stacks and processes. Many PagerDuty customers around the world are starting to provide direct access to their on-call teams with Live Call Routing (LCR).

xMatters Service Intelligence Keeps Your Services Running!

Organizations spend heavily on digital services and business applications, with the expectation that they deliver reliable value streams. When an issue occurs, the fear of losing revenue, damaging customer relationships, and upsetting employees can put a tremendous amount of pressure on incident resolvers. With xMatters Service Intelligence, organizations can visualize incidents in real time, gain greater insight into their root cause, and remediate issues faster with service-centric automation.
Sponsored Post

Best Practices for Communicating with Customers During an Outage

Incidents are unavoidable when running a business. When an incident does inevitably occur, communication is critical while your teams are working to minimize the impact and expedite a solution. For technical resolvers, the first steps during an incident are to look for any leads that point to the source of the issue. Customer service and communications teams, however, must prioritize establishing effective communication with impacted users. Both teams have the right frame of mind, but they need to be aligned. This becomes more complicated when such an incident is an outage.

What is an incident, how to handle it, and tips for good incident management

Customer retention is critical. Studies show that acquiring a new customer is five to 25 times more expensive than retaining an existing one. On top of this, a marginal increase in customer retention can yield increases in revenue up to 95%. Customers spend a lot of time interacting with businesses online and their user experience can have a major impact on how they view a company. One bad user experience can send a customer into the arms of a company's competitor.

Lightstep Notebooks helps speed troubleshooting for SREs and developers

Digital business is an imperative for 21st-century companies. Increasingly, organizations are directing investments toward technologies that deliver outcomes fast and enable more resilient digital business models. In this landscape, incidents such as software bugs, power outages, or downed networks have major consequences that affect both revenue and customer loyalty.