
The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Making the Most of PagerDuty + Datadog

For your team to effectively respond to incidents, you need a shared, unambiguous incident definition so you can recognize when an incident has occurred and assign the appropriate severity. Definitions of an incident differ across teams, but whatever definition you use, identifying and monitoring key service level indicators (SLIs) can help you understand when your service is operating normally—and when its performance has degraded to the point where you need to trigger an incident.
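To make the idea of SLI-based incident triggering concrete, here is a minimal sketch. It assumes two SLIs (request availability and p99 latency) measured over one evaluation window. The names `WindowStats`, `SliObjectives`, `availability_sli`, and `incident_severity` are hypothetical, and so are the 99.9% availability and 500 ms latency targets and the severity labels. This is not how PagerDuty or Datadog evaluate incidents; it only illustrates comparing indicators against agreed targets.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WindowStats:
    """Raw measurements for one evaluation window (e.g. the last 5 minutes)."""
    total_requests: int
    failed_requests: int
    p99_latency_ms: float


@dataclass
class SliObjectives:
    """Hypothetical targets; real values come from your own incident definition."""
    min_availability: float = 0.999
    max_p99_latency_ms: float = 500.0


def availability_sli(stats: WindowStats) -> float:
    """Fraction of requests that succeeded; an idle window counts as healthy."""
    if stats.total_requests == 0:
        return 1.0
    return 1.0 - stats.failed_requests / stats.total_requests


def incident_severity(stats: WindowStats, objectives: SliObjectives) -> Optional[str]:
    """Return None when every SLI meets its target, else an illustrative severity label."""
    availability_ok = availability_sli(stats) >= objectives.min_availability
    latency_ok = stats.p99_latency_ms <= objectives.max_p99_latency_ms
    if availability_ok and latency_ok:
        return None  # operating normally: no incident
    if not availability_ok:
        return "SEV-1"  # assumed policy: users are seeing errors
    return "SEV-2"  # assumed policy: requests succeed, but too slowly


if __name__ == "__main__":
    objectives = SliObjectives()
    print(incident_severity(WindowStats(10_000, 3, 220.0), objectives))   # None
    print(incident_severity(WindowStats(10_000, 50, 220.0), objectives))  # SEV-1
    print(incident_severity(WindowStats(10_000, 3, 900.0), objectives))   # SEV-2
```

Ranking an availability breach above a latency breach is just one possible policy; whatever mapping your team chooses should be written down as part of its shared incident definition.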

A single person on-call "rotation" is a critical vulnerability

One of the most common complaints we hear from operations and site reliability engineers concerns the impact that on-call responsibilities have on their quality of life, and the stress that results. Most of us already know that a proper on-call rotation is critical to our engineering organization’s health, both for immediate incident response and for long-term, sustainable growth.

OnPage Mentioned in Two 2019 Gartner Hype Cycle Reports

Gartner’s Hype Cycles for Business Continuity and for IT Performance Analysis are trusted reports that identify solutions that enhance and solidify an organization’s business continuity. The OnPage team is pleased to announce that we’ve been included in two of Gartner’s Hype Cycle reports, which list OnPage’s incident alert management solution as a trusted tool for today’s support teams.

Vodafone Utilizes PagerDuty to Better Understand Their Real-Time Operations

Vodafone is a telecommunications company providing 4G network coverage for 18 million customers and 99% of the United Kingdom’s population. Ben Connolly, Head of Digital Engineering at Vodafone, details the challenges that his engineering teams were facing and why PagerDuty was the perfect fix. PagerDuty helps Vodafone deliver a better customer experience by allowing their teams to see the impact that they're having in real time.

The results of our 2019 "Future of Monitoring and AIOps" survey are in

IT operations is at a crossroads. The increasing complexity of IT infrastructure and software is challenging IT teams and the business. So this year we decided to focus our survey on what IT Ops execs, managers and practitioners think about the current state of their operations, the future of their systems and the role automation and AIOps might play in their transformation.

How Adopting OnPage can Transform Your Organization

OnPage provides a reliable incident alerting solution, built for today’s healthcare providers and IT professionals, that ensures important notifications reach the right individuals at the right time, every time. Adopting OnPage as a pager service or IT alerting solution means HIPAA-compliant message exchanges, free of human error and complications.

Summit Day Two: New Integrations and Developer Platform to Bring Real-Time Work to More People

Yesterday, we kicked off PagerDuty Summit by launching new features that support the themes of Visibility and Intelligence. If you missed the keynotes or want to know more, check out this blog post. Today, we are making several announcements around two other themes that our CEO, Jennifer Tejada, touched on during her keynote yesterday: Platform and People. In fact, these themes are so closely related that we treat them as one idea: PagerDuty is a platform for people to do real-time work.

CIO Dive Playbook: AIOps Brings Calm to Overwhelmed IT Ops Teams

Much has been said about how Artificial Intelligence (AI) is already proving its ability to transform business, as well as the way most people live. In fact, according to Accenture’s “ExplAIned: A Guide for Executives,” AI is on par with such life-changing innovations as electricity and the internal combustion engine, and is no longer science fiction.