Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Great Incident Response Requires 3 Major Components

With remote work becoming more common, and distributed teams the norm, incident response has become even trickier. Years ago, everyone would gather in a war room and sort through the issue together, boots on the ground. Now, things have shifted. Remote work is only projected to increase, and teams need to be able to adapt in order to resolve incidents quickly and efficiently, even if team members are a thousand miles away. But how can we make great incident response a reality?

How to create user groups and route alerts

“Services&Systems” category subscriptions provide a highly flexible way of routing alerts to specific user groups. This can, for instance, be used to route alerts based on responsibilities or skills. Other scenarios are possible too, as the category subscription mechanism is extremely powerful. SIGNL4 currently provides two fundamental ways of routing alerts. The first layer is the routing of alerts based on the “on duty” status.
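
The two routing layers described above can be illustrated with a small sketch. This is not SIGNL4's API: the `User` class, the `route_alert` function, and the sample names and categories are all hypothetical. It assumes the second layer is the category subscription, applied after the on-duty filter.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    # Hypothetical model of a team member; not a SIGNL4 data structure.
    name: str
    on_duty: bool
    subscriptions: set[str] = field(default_factory=set)


def route_alert(category: str, users: list[User]) -> list[str]:
    """Return the names of users who should receive an alert in `category`."""
    # Layer 1: only users who are currently on duty are considered.
    on_duty = [u for u in users if u.on_duty]
    # Layer 2 (assumed): of those, keep users subscribed to the category.
    return [u.name for u in on_duty if category in u.subscriptions]


# Illustrative team; names and categories are made up.
team = [
    User("ana", True, {"database", "network"}),
    User("ben", True, {"web"}),
    User("cho", False, {"database"}),
]
recipients = route_alert("database", team)
```

Here "cho" subscribes to the database category but is off duty, so only "ana" would be notified for a database alert.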

Tips & Tricks for Working Remotely

As COVID-19 (novel coronavirus) cases start to challenge norms around what makes a healthy and safe workplace, more and more companies are leaning toward or fully embracing remote work. At PagerDuty, over 20% of our workforce is remote, so we are well set up to go fully distributed if the time comes. Beyond the logistical aspects, we also have a strong culture of inclusivity when it comes to remote colleagues.

Unplanned Work Contributing to Increased Anxiety

Unplanned work is on the rise—and most companies are unprepared for it. That’s according to the recent “State of Unplanned Work Report 2020,” which surveyed 1,316 people across North America and the EMEA and APJ regions. The survey focused on identifying current practices and challenges of responding to customer-impacting technology issues.

"TRIBAL KNOWLEDGE" (noun): That thing you should have done, if only someone had told you.

As a former NOC engineer, I clearly remember my onboarding, and especially the deep-rooted fear I felt every time I encountered an alert that was new to me – particularly during a night shift. My only consolation was that I was never alone during training, so there was always someone I could ask that very awkward question: “I’m new here, what do we do with this…?”

How ITIL, DevOps, and SRE Work Together for your Organization

When someone asks what type of “shop” your organization is, can you answer confidently that it’s ITIL, DevOps, or SRE? Maybe some people can, but if you’re a large enterprise, the answer is likely a combination of several of these operating models, especially since SRE has become a key implementation of DevOps. ITIL can work effectively alongside DevOps and SRE principles, though at first glance they appear to be different species.

Grow your blame-free culture with these postmortem best practices

Bugs will happen from time to time. As our systems grow in complexity, new functionalities mean new risks. What makes or breaks a team is not only how it handles incidents, but also how it learns from them. This is where incident postmortems come into the picture.