
Incident Management

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

How We Use PagerDuty for Emergency Response

PagerDuty is known as the platform for driving real-time work, and with the current global spread of COVID-19, many of our customers have been asking how we leverage PagerDuty internally to intelligently coordinate a response to emergency situations (such as this) as they arise. PagerDuty customers primarily leverage our platform for coordinating an incident response process when technical issues happen, such as a bad deployment, network degradation or failed hardware.

Announcing Ticketing

Incidents come up quickly, and it can be challenging to keep track of who did what during an incident and which tasks still need to be completed, both in the moment and after the incident is resolved. In an effort to continue simplifying your incident response process, today we are happy to announce an overhaul of ticketing and task tracking on FireHydrant, along with a major overhaul of our JIRA integration.

The Incident Response Approach to Remote Work

In response to recent events, many organizations are implementing social distancing programs such as remote work. Successfully transitioning to remote work does come with challenges, but the right practices and attitudes can make it much less painful (and safer for you than heading into the office). We like to think of incidents as “unplanned investments,” and a sudden switch to remote work could be considered an unplanned investment of its own.

The Ever-Changing IT Industry

Information technology (IT) never slows to a standstill. Technological change disrupts current processes and operations, requiring organizations to adjust their IT spending. Moving from legacy technology to 21st-century advancements isn’t an option; it’s a requirement! Through automation and powerful integrations, organizations can breathe freely.

Protecting critical business systems and ensuring business continuity in the age of COVID-19

As we are all adjusting to this new reality of living and working in the time of COVID-19, the coronavirus, there is so much that we need to take into consideration. Clearly, the health and safety of our family and colleagues is priority number one – and the local authorities have provided guidance on how to maximize protection.

Lessons in Distributed Communication From Incident Response

As reported cases of novel coronavirus (COVID-19) continue to rise around the world, many companies are increasingly shifting to remote work as a way of minimizing exposure for their workforce. But even if some of these companies have been remote-friendly in the past, many organizations are currently struggling to figure out how to make their operations entirely remote.

Succeeding With Service Level Objectives

In this blog, Danny Mican, a Senior Site Reliability Engineer, outlines how to implement SLOs from scratch using the IIDARR process. He also stresses that it is crucial for your SLOs to be actionable and to always follow a feedback approach, as this will play an important role in the debate of features vs. technical debt.