
Best practices for on-call scheduling and management

An on-call schedule forms the backbone of your incident response system whenever an outage occurs or an issue is raised. It ensures that end users are not kept waiting and helps maintain the reliability and availability of your software. However, on-call management practices often cause worry and anxiety among team members, and in extreme cases they can even contribute to employee burnout.

5 tips for more modern and efficient on-call management

On-call management is one of the most important aspects of seamless IT service. Its aim is to ensure that the right person is notified when an incident occurs, so that they can respond as quickly as possible. In some cases, many people have to be notified. Doing this efficiently requires an up-to-date, smoothly functioning system.
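To make "the right person is notified" concrete, here is a minimal sketch of how a simple round-robin rotation can determine who is on call at a given moment, and in what order to escalate if that person does not respond. Everything in it is a hypothetical illustration rather than any product's actual scheduling logic: the roster names, the Monday 09:00 UTC handoff, the weekly shift length, and the functions `on_call_at` and `escalation_chain` are all assumptions.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical roster and rotation settings (assumptions for illustration only).
ROTATION = ["alice", "bob", "carol"]
ROTATION_START = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)  # assumed first handoff
SHIFT_LENGTH = timedelta(days=7)  # assumed weekly rotation


def on_call_at(when: datetime, rotation=ROTATION, start=ROTATION_START,
               shift=SHIFT_LENGTH) -> str:
    """Return the engineer on call at `when` in a simple round-robin rotation."""
    if when < start:
        raise ValueError("time precedes the start of the rotation")
    # timedelta // timedelta yields the number of whole shifts that have elapsed.
    shifts_elapsed = (when - start) // shift
    return rotation[shifts_elapsed % len(rotation)]


def escalation_chain(when: datetime, rotation=ROTATION, start=ROTATION_START,
                     shift=SHIFT_LENGTH) -> list:
    """Order in which to notify people: the primary first, then the rest of the rotation."""
    primary = on_call_at(when, rotation, start, shift)
    i = rotation.index(primary)
    return rotation[i:] + rotation[:i]


if __name__ == "__main__":
    incident_time = datetime(2024, 1, 9, 12, 0, tzinfo=timezone.utc)
    print(on_call_at(incident_time))        # bob (second week of the rotation)
    print(escalation_chain(incident_time))  # ['bob', 'carol', 'alice']
```

Real schedules usually add layers, overrides, and time-zone-aware handoffs, but a pure function of time like this keeps the "who gets paged" question easy to test and reason about.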

ITIL and CI/CD

In the world of IT, there are two main approaches to managing change: the Information Technology Infrastructure Library (ITIL) and continuous integration and continuous delivery/deployment (CI/CD). Each has its own benefits and drawbacks, so it is important to understand how they differ before deciding which one is right for your organization. This article explains the difference between ITIL and CI/CD and helps you decide which approach best fits your needs.

Toil: Still Plaguing Engineering Teams

Our industry has always had local expressions for work that is necessary but doesn't move the company forward. The SRE movement calls this type of work "toil." The concept of toil is a unifying force because it provides an impartial framework for identifying, and then containing, the work that takes up our time and keeps people from fulfilling their engineering potential.

Cyber, incident, downtime: Three words that chill the board, and how to tame them

There are three words that everyone around a boardroom table dreads hearing strung together: "Cyber... incident... downtime." They never precede a good meeting! Technology incidents can leave the business in the dark and bring the wheels of industry grinding to a halt. According to a Gartner report, when operational systems go down, severe incidents can cost companies up to half a million dollars per hour in losses and remediation.

How to Help Teams Create Optimal Infrastructure for Availability

Teams are locked into a cycle of suffering, characterized by the feeling that they are sprinting just to stay still. This morale- and productivity-destroying state is caused by an inability to find the time to save time. Our new research, The State of Availability Report 2022, found that teams know what they want to do (harness cloud and DevOps practices and tools to advance digital transformation), but something is getting in the way.

Improving Incident Management with Automation

Incident management is your organization's first line of defense. When incidents occur, internal teams must be ready to respond quickly. Because incidents can happen at any time, it is unrealistic to expect incident managers to always be prepared to perform manual root cause analysis. Manually monitoring and analyzing applications across multiple servers is extremely difficult, which is why human reaction times have traditionally limited the speed of incident management.

What's New: Updates to Incident Response, PagerDuty Process Automation Software & PagerDuty Runbook Automation, Mobile App Experience, and More!

We're excited to announce a new set of updates and enhancements to the PagerDuty Operations Cloud, in addition to the November Product Launch announcements made earlier this month. Recent updates from the product team cover Incident Response, PagerDuty® Process Automation, the PagerDuty Mobile App, and Integrations, as well as Community & Advocacy Events.

7 Incident Management Best Practices to Improve Business Efficiency

Think about the last time your IT systems had an outage: How did your team react? Were they organized, with a clear idea of how best to resolve the issue? Or was it chaotic, with people firing questions from all directions and customer service channels ablaze with requests for help? Digital technology disruptions are typical (and even expected) in the workplace, but responding to them doesn't have to be chaotic, with teams rushing around to put out the metaphorical fire.