
The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

How to choose an incident management software

The ITIL definition of an incident is “an unplanned interruption to or a quality reduction of an IT service”. In your IT ecosystem, an incident may be caused by a malfunctioning asset or a network failure. Common incidents include issues with the printer, Wi-Fi connectivity, application locks, email service, laptop, file sharing, unresponsive servers, or even authentication errors.

Best practices for on-call scheduling and management

An on-call schedule forms the backbone of your incident response system in the event of an outage or when an issue is raised. This type of schedule does not keep end-users waiting and helps maintain the reliability and availability of your software. However, on-call management practices often induce worry and anxiety in team members. In extreme cases, they can even be a contributing factor in employee burnout.

5 tips for more modern and efficient on-call management

On-call management is one of the most important aspects of seamless IT service. Its aim is to ensure that the right person is notified in the case of an incident, so that they can react accordingly as quickly as possible. In certain cases, many people have to be notified. To achieve this as efficiently as possible, it is vital to have an up-to-date and smoothly functioning system.

ITIL and CI/CD

In the world of IT, there are two main approaches to managing changes: the Information Technology Infrastructure Library (ITIL) and continuous integration and continuous delivery/deployment (CI/CD). Both have their own benefits and drawbacks, so it’s important to understand the difference between them before deciding which one is right for your organization. In this article, learn about the difference between CI/CD and ITIL, and find out which approach is best for your needs.

Toil: Still Plaguing Engineering Teams

Our industry has always had localized expressions for work that was necessary but didn’t move the company forward. The SRE movement calls this type of work “toil.” The concept of toil is a unifying force because it provides an impartial framework for identifying — then containing — the work that takes up our time, blocks people from fulfilling their engineering potential, and doesn’t move the company forward.

Cyber, incident, downtime: Three words that chill the board, and how to tame them

There are three words that every member around a boardroom table fears when they hear them strung together: "Cyber... incident... downtime". They are never the precursor to a good meeting! Technology incidents can leave the business in the dark and bring the wheels of industry grinding to a halt. A Gartner report found that, with no operational systems, companies can lose up to half a million dollars per hour from severe incidents, based on losses and remediation.

DERDACK SIGNL4 for Microsoft Sentinel, Defender for Cloud and more

Doreen talks us through the value-add of SIGNL4 for MSPs and enterprise customers of Microsoft Security products and how SIGNL4 facilitates an automated and seamless 24/7 on-call management experience. Derdack SIGNL4 is a member of the Microsoft Intelligent Security Association (MISA).

How to Help Teams Create Optimal Infrastructure for Availability

Teams are locked into a cycle of suffering characterized by the feeling that they are sprinting just to stay still. This morale- and productivity-destroying state is caused by an inability to find time to save time. Our new research, The State of Availability Report 2022, discovered that teams know what they want to do (harness cloud and DevOps practices and tools to advance digital transformation), but something’s getting in the way.