Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

How to Evolve Your ITOM Strategy for Modern IT through 2024

Major changes are redefining how IT operations monitoring is done, and impacting tooling, processes and skills. But how exactly can IT Ops leaders ensure continuous service assurance of their critical digital services now and in the future? What’s the key to having the required visibility and control over these modern and complex IT environments that are increasingly hybrid, distributed, dynamic and modular?

Efficient task management for remote/work-from-home teams

As COVID-19 continues to impact communities globally with health care professionals working tirelessly to prepare for emergencies and prevent the further spread of the pandemic, technology companies are also doing their part. Twitter, Google, and Amazon have issued directives instructing employees to work from home as the companies themselves move to pull out of tech events while hosting their own events virtually.

Our Top 5 On-Call Practices

On-call: you may see it as a necessary evil. When responding to incidents quickly can make or break your reputation, designating people across the team to be ready to react at all hours of the day is a necessity, but often creates immense stress while eating into personal lives. It isn’t a surprise that many engineers have horror stories about the difficulty of carrying a pager around the clock. But does on-call have to be so dreadful? We think not.

6 Steps to a More Effective Postmortem

Detailed and specific description of impact? Check. In-depth root cause analysis? Check. Clearly defined and easy-to-follow resolution? Check. Postmortems present an incredible learning opportunity, despite the inherent cost of time and effort. They ensure that an incident is documented, that all contributing factors are understood, and that effective preventative actions have been put in place to reduce the likelihood or impact of recurrence.

Incident management for remote/WFH teams

As the world tries to battle COVID-19, most of our customers here at Zenduty have started implementing social distancing measures within their companies by asking all their employees, including the NOC, SRE, ITOps, Support, and software engineering teams, to work remotely or from home. While that may appear to be a drastic change in your day-to-day operations, it need not disrupt your reliability and support operations.

PagerDuty Is for People: Supporting Our Community During COVID-19

Yesterday, we released our earnings during an unprecedented time for society and the market. One of the things I noticed was the collective empathy we experienced as we talked to different teams and companies in preparation, and in our analyst callbacks, where everyone, to a person, kicked off their call by wishing each other good health and safety. It reminded me that when we are all in this together, not only are great things possible, but the challenges also feel less daunting and more manageable.

How SIGNL4 supports geolocation and GPS information

SIGNL4 supports geolocation information in multiple ways. When a new alert with geolocation information is displayed in the mobile app, the app renders a map to visualize the location of the incident. A double click opens the default map application on the mobile device, for example to get directions or traffic information.