The latest news and information on Incident Management, On-Call, Incident Response, and related technologies.

AI Impact on software engineering (as I see it)

When I first started using AI (Cursor, to be more specific) for coding, I was very impressed by the quality of the code it could generate, and I understand why it's now one of the most widely used tools for software engineers. As I continued to use AI coding tools more regularly, I realized they are far from perfect. Their effectiveness depends heavily on how they are used and the context in which they are applied.

Verizon outage - January 14

When a major carrier like Verizon goes down, the impact is immediate and widespread. On January 14, 2026, thousands of users across the United States found themselves without cellular service, unable to make calls, send texts, or access data. While social media erupted with reports of “SOS mode” on iPhones, official acknowledgment from the provider lagged behind for hours.

What We Built in 2025, and Why It Matters Going Into 2026

As we move further into 2026, we wanted to pause for a moment and reflect on what the past year looked like for OnPage, not just in terms of features shipped, but in how the platform evolved to better support the way teams actually work in high-stakes environments. 2025 was a foundational year for us.

Why agentic AI is the future of IT change management

Every enterprise depends on continuous changes to its IT environment. New code releases, infrastructure updates, configuration changes, and security patches are all crucial to support continuous innovation. These same changes are also a leading source of operational risk and one of the most common causes of failures at the network, infrastructure, and software layers, resulting in outages.

Getting started with on-call

Setting up on-call is simpler than it seems. It comes down to a few clear decisions about your team and what your service actually needs. This guide walks you through those decisions. You’ll learn who to add to your rotation, how long shifts should last, when to hand off, and what coverage makes sense for your service. By the end, you’ll know exactly how to set up your first schedule and move from ad-hoc firefighting to organized incident response.

How to Monitor SaaS Status in 2026: A Complete Guide

This is an updated and expanded version of the older guide. According to the 2025 State of SaaS report, organizations use an average of 106 SaaS apps. Staying on top of your SaaS vendors' status is as important as monitoring your own services. The Cloudflare, AWS, Azure, and Google Cloud outages in 2025 were strong reminders of this fact.

Democratizing Reliability: Giving Non-Engineers Real Operational Power with Dileshni Jayasinghe

Many companies don’t invest in incident management until something goes wrong. commonsku took a different path. In this episode of Humans of Reliability, Sylvain sits down with Dileshni Jayasinghe, VP of Technology at commonsku, to talk about what it really takes to introduce incident management in a mature, profitable SaaS company that had never formalized it. From rolling out observability and incident tooling to practicing internal status updates before going public, Dileshni shares how her team built the right muscles before they were forced to.

Why AI-driven automation in incident response is viable now

This article explains why AI-driven automation in incident response is feasible now. Teams can now safely delegate repetitive and time-critical response tasks to AI agents, which operate with contextual awareness and human oversight. The result is faster response, higher service uptime, and less alert noise – without losing control.

PagerDuty Appoints Chris Ferro as Chief Legal Officer

PagerDuty, Inc. announces that Chris Ferro has joined the company as Chief Legal Officer. Ferro will oversee all legal functions at PagerDuty, including corporate, compliance, employment and product matters, with a focus on advancing business objectives while mitigating legal and regulatory risk.