The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

From plan to practice to prevail: my conversation with Chris Johnson, host of the MSSP 1337 podcast

In cybersecurity, prevention often gets most of the attention. But no matter how strong your defenses are, incidents will happen. And how you respond in that moment of truth defines resilience. That’s why I really connected with a framework Chris Johnson shared with me on the MSSP 1337 podcast, the 3 P’s – plan, practice, prevail.

PagerDuty Joins Glean's AI Ecosystem: Unlocking More Seamless Incident Management

Today, we announced that PagerDuty is now officially part of the Glean MCP Directory! This partnership brings together two leaders in AI-powered productivity and operations, making it easier than ever for organizations to connect PagerDuty’s incident data directly to any AI tool or agent in their stack through the standardized Model Context Protocol (MCP). PagerDuty is the first (and currently only) incident management partner that is available via Glean’s AI ecosystem.

Introducing the BigPanda observability and monitoring tool rationalization framework

When enterprises run dozens of monitoring and observability tools, performance gaps almost always emerge. By applying the BigPanda Observability Scorecard, our customers consistently see their tool portfolio fall into three groups. In some cases, removing bottom-tier tools can reduce portfolio complexity by double digits while cutting operational noise by as much as 35-40%. This simplification reduces costs while creating a leaner, more reliable monitoring environment that strengthens service availability and operational efficiency.

How to analyze observability and monitoring tools for actionability

Choosing the right observability tools is critical to ensure your teams get actionable insights. In this video, we explore how to evaluate observability platforms based on their ability to detect anomalies, link causes, and trigger effective responses.

Physician On Call Schedule: How to Create an Effective, Fair & Reliable Call System

Providing continuous, high-quality care takes more than clinical expertise—it depends on well-designed physician on call schedules that balance patient safety, physician wellness, and operational efficiency. Whether you manage a residency program or a multi-specialty group, creating an effective physician call schedule—or a broader provider on call schedule—is critical for 24/7 coverage and clinician well-being.

You don't need a real outage to find your weak spots.

Modern digital services rely on complex systems, and chaos can strike at any layer. But the most effective teams don’t wait for failure to learn. They simulate it. By introducing controlled performance degradations, you can stress your systems, test your dependencies, and uncover hidden risks without touching production. In our latest webinar, Catchpoint experts walk through how teams are building resilience through proactive, safe failure testing, and why it’s become a cornerstone of digital reliability.

Goodbye Email-to-Text: Why Modern Mobile Alerting with SIGNL4 Is the Smarter Choice

Over the past year, major U.S. mobile carriers have shut down their free email-to-SMS and email-to-text services – once common ways to send a text message directly from an email account. AT&T terminated its SMS gateway service in mid-2025, Verizon discontinued its SMS gateway domain in late 2024, and T-Mobile retired its gateway domain in December 2024.

Agentic AI Becomes Essential: Why Adoption Is Accelerating and What Comes Next

The cautious optimism business leaders held towards AI agents has evolved into more widespread enthusiasm. In our last survey from April 2025, just over half (51%) of companies had deployed AI agents in their organization. Six months later, 75% of companies are deploying more than one agent, according to PagerDuty’s latest research.

Automate or Elevate? 5 Steps to Build an AI-Powered Incident Playbook

Modern development tools, CI/CD infrastructure, and AI have accelerated the pace at which companies release software. This speed supports innovation, but it also increases complexity and the chance of something breaking in ways that aren’t immediately obvious. Teams now deal with more operational data, complex failure patterns, and systems where a small configuration change can ripple across dozens of microservices.