The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

The Incident You Never Had: Deterministic Simulations w/ Will Wilson (Antithesis CEO)

Most reliability engineering happens after something breaks. Will Wilson thinks that's the wrong place to be. As co-founder and CEO of Antithesis, the autonomous testing platform that just raised $105M in a Series A led by Jane Street, Will has spent years building the infrastructure to catch failure modes before they ever reach production. His starting point is uncomfortable: the testing practices most teams rely on are structurally incapable of finding the bugs that cause real incidents.

Incident Response Reimagined: Accelerating Resolution with AI Agents

Learn how PagerDuty is leveraging Agentic AI to transform the incident lifecycle from reactive firefighting to proactive prevention. Manuel Reis, Software Developer at PagerDuty, demonstrates how new tools like the SRE Agent and Scribe Agent assist engineers during high-pressure outages by autonomously triaging alerts, querying logs in tools like Grafana, and transcribing context directly into incident channels.

8 Video Workflows That Optimize IT Operations

It wasn't that long ago that Agile revolutionized IT workflows, introducing a feedback-forward process that ensured each project task was perfected and approved before the team moved on to the next. To execute a task with high precision, an assigned team needs a reliable arsenal of tools, including video. Project managers also need updated tool stacks to lead complex projects to completion.

The Hidden Cost of AI Productivity: When Efficiency Turns Into "Brain Fry"

A new HBR study reveals that the race to build and manage AI agents may be pushing knowledge workers toward a new form of cognitive overload. If you spend any time on LinkedIn these days, you’ve probably seen the same type of post over and over. Someone proudly announces they built an AI agent that now writes their emails, analyzes data, drafts presentations, and maybe even ships code.

The Path to Autonomous Operations: PagerDuty Spring 26 Release

Shipping velocity has never been higher, but reliability can’t be the trade-off. For engineering leaders, deploying AI for operations is no longer optional. The question is whether you’ll lead the transformation or fall behind. The hard truth? Organizations can’t keep relying on humans as the first line of defense. Not when the pace of shipping keeps accelerating. It’s simply not scalable.

On-call compensation for IT engineers in 2026

Imagine it’s 2 AM and a critical system flatlines without warning. A bleary-eyed on-call engineer scrambles to restore service, shielding customers from a major outage that could torpedo your next Service Level Objective (SLO) review. Yet when daylight returns, debates over fair on-call compensation start all over again: What’s “just” pay for sleepless nights, unpredictable pings, and rapid-fire incident responses?

Do Veterinarians Go Oncall? And How Does It Work?

Veterinary clinics typically operate during standard 9–5 business hours. But emergencies don’t follow a schedule. Having the option to reach an on-call veterinarian through a dedicated after-hours emergency line provides peace of mind not only for pet owners, but, believe it or not, for veterinarians as well. So how does ONCALL work for veterinary clinics? Find out more through our Doggy Explain video.

Turning team knowledge into Alert Routing rules

Over time, on-call teams build up a quiet layer of knowledge about their systems. Someone learns that a specific error code always means phone calls are failing. Someone else figures out that a particular background job fires a warning every night and has never once needed attention. That knowledge shapes how your team responds to incidents every day. But when it only lives in people’s heads, your response depends entirely on the right person being available at the right time.
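
As a rough illustration of what writing that knowledge down could look like, the two examples above can be expressed as explicit routing rules. This is a minimal sketch and not any vendor's rule format: the `Alert` and `Rule` types, the `route` function, the error code `TEL-503`, and the source name `nightly-report-job` are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Alert:
    """Hypothetical alert shape; real alert payloads will differ."""
    source: str
    code: str
    severity: str


@dataclass(frozen=True)
class Rule:
    """One piece of team knowledge: a condition and what to do about it."""
    name: str
    matches: Callable[[Alert], bool]
    action: str  # one of "page", "suppress", "notify"


# Assumed values for illustration only.
RULES = [
    Rule(
        name="this error code always means phone calls are failing",
        matches=lambda a: a.code == "TEL-503",
        action="page",
    ),
    Rule(
        name="nightly job warning has never needed attention",
        matches=lambda a: a.source == "nightly-report-job"
        and a.severity == "warning",
        action="suppress",
    ),
]


def route(alert: Alert, rules: Sequence[Rule] = RULES, default: str = "notify") -> str:
    """Return the action of the first matching rule, or the default."""
    for rule in rules:
        if rule.matches(alert):
            return rule.action
    return default
```

Because the first matching rule wins, the order of the list matters. Keeping a human-readable `name` on each rule records why the rule exists, so the reasoning no longer depends on the person who originally worked it out.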