
Incident Management

The latest news and information on incident management, on-call, incident response and related technologies.

Sponsored Post

Outages ITOps professionals are thankful to avoid

As we settle into the time of year when we reflect on what we're thankful for, we tend to focus on important basics such as health, family and friends. But on a professional level, IT operations (ITOps) practitioners are thankful to avoid disastrous outages that can cause confusion, frustration, lost revenue and damaged reputations. The very last thing ITOps, network operations center (NOC) or site reliability engineering (SRE) teams want while eating their turkey and enjoying time with family is to get paged about an outage. Outages can be extremely costly: $12,913 per minute, in fact, and up to $1.5 million per hour for larger organizations.
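To put the per-minute figure in perspective, here is a small sketch that scales it to an outage of a given length. The function name and the assumption that cost grows linearly with duration are ours, for illustration only; just the $12,913 figure comes from the text above.

```python
# Hypothetical helper: estimate outage cost from a per-minute rate.
# Assumes cost scales linearly with duration, which is a simplification.
def downtime_cost(minutes: float, cost_per_minute: float = 12_913.0) -> float:
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


if __name__ == "__main__":
    # One hour at the quoted per-minute rate
    print(f"${downtime_cost(60):,.0f}")  # prints $774,780
```

At that rate, an hour-long outage comes to roughly $775,000, which sits below the $1.5 million per hour quoted for larger organizations.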

How to choose an incident management software

The ITIL definition of an incident is “an unplanned interruption to or a quality reduction of an IT service”. In your IT ecosystem, an incident may be caused by a malfunctioning asset or a network failure. Common incidents include issues with printers, Wi-Fi connectivity, application locks, email service, laptops, file sharing, unresponsive servers, or even authentication errors.

Best practices for on-call scheduling and management

An on-call schedule forms the backbone of your incident response system in the event of an outage or when an issue is raised. This type of schedule keeps end-users from waiting and helps maintain the reliability and availability of your software. However, on-call management practices often induce worry and anxiety in team members. In extreme cases, they can even be a contributing factor in employee burnout.

5 tips for a more modern and efficient on-call management

On-call management is one of the most important aspects of seamless IT service. Its aim is to ensure that the right person is notified in the case of an incident, so that they can react accordingly as quickly as possible. In certain cases, many people have to be notified. To achieve this as efficiently as possible, it is vital to have an up-to-date and smoothly functioning system.
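As a rough illustration of "notifying the right person", the sketch below picks the engineer on call for a given day from a simple weekly rotation, and builds a short list of people to notify when more than one is needed. The roster names, the fixed-length rotation, and both function names are hypothetical; real on-call tools handle overrides, time zones and handoffs that are left out here.

```python
from datetime import date


def rotation_index(day: date, roster_size: int, start: date, shift_days: int = 7) -> int:
    """Return the roster position on call for `day` in a fixed-length rotation."""
    if roster_size <= 0:
        raise ValueError("roster must not be empty")
    elapsed = (day - start).days
    if elapsed < 0:
        raise ValueError("day is before the rotation start")
    return (elapsed // shift_days) % roster_size


def notify_list(day: date, roster: list[str], start: date,
                shift_days: int = 7, depth: int = 1) -> list[str]:
    """Primary on-call first, then the next people in the rotation as backups."""
    first = rotation_index(day, len(roster), start, shift_days)
    count = min(depth, len(roster))
    return [roster[(first + i) % len(roster)] for i in range(count)]


if __name__ == "__main__":
    roster = ["ana", "ben", "chi"]      # hypothetical team
    start = date(2024, 1, 1)            # first day of the first shift
    print(notify_list(date(2024, 1, 8), roster, start))            # ['ben']
    print(notify_list(date(2024, 1, 15), roster, start, depth=2))  # ['chi', 'ana']
```

Using the next people in the rotation as backups is only one possible design; many teams keep a separate secondary schedule instead.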

ITIL and CI/CD

In the world of IT, there are two main approaches to managing changes: the Information Technology Infrastructure Library (ITIL) and continuous integration and continuous delivery/deployment (CI/CD). Both have their own benefits and drawbacks, so it’s important to understand the difference between them before deciding which one is right for your organization. In this article, learn about the difference between CI/CD and ITIL, and find out which approach is best for your needs.

Toil: Still Plaguing Engineering Teams

Our industry has always had localized expressions for work that was necessary but didn’t move the company forward. The SRE movement calls this type of work “toil.” The concept of toil is a unifying force because it provides an impartial framework for identifying — then containing — the work that takes up our time, blocks people from fulfilling their engineering potential, and doesn’t move the company forward.

Cyber, incident, downtime: Three words that chill the board, and how to tame them

There are three words that every member around a boardroom table fears when they hear them strung together: "Cyber... incident... downtime". They are never the precursor to a good meeting! Technology incidents can leave the business in the dark and bring the wheels of industry grinding to a halt. A Gartner report found that, with no operational systems, companies can lose up to half a million dollars per hour from severe incidents, based on losses and remediation.