The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

AWS re:Invent 2025 - From Alert to Action: AWS + PagerDuty Agentic Ops

Hear how AWS and PagerDuty are transforming incident management with agentic & generative AI. Learn how agents within AWS Quick Suite and PagerDuty work together to detect, diagnose, and resolve incidents with less toil and less swivel-chair switching between tools. This session explores how AI collaboration is reshaping resilience across cloud environments.

How agentic IT operations transform IT Service Management (ITSM)

Enterprise ITOps leaders are realizing that legacy incident management processes are collapsing under the weight of today’s sprawling, hybrid-cloud enterprise environments. The fastest path from reactive firefighting to proactive, automated control is an agentic AI-powered incident assistant that can understand context, coordinate people, and take intelligent action at machine speed. Enterprise IT doesn’t look anything like it did even five years ago.

AWS re:Invent 2025 - Smarter Incident Response with Logz.io and PagerDuty

In this session, Jacky Leybman from PagerDuty and David Lotan Bolotnikoff from Logz.io showcase how PagerDuty and Logz.io combine generative AI with rich historical context to automate root cause analysis and accelerate incident response. By correlating real-time telemetry with prior incidents and runbooks, teams reduce manual toil and MTTR while maintaining human-in-the-loop oversight and transparent reasoning.

AWS re:Invent 2025 - AI-First Incident Management in Slack

Jacky Leybman from PagerDuty and Kaninie Knight from Slack share how their integration streamlines incident response and real-time collaboration. This session highlights practical workflows and measurable gains – such as faster triage and lower MTTR – achieved by connecting on-call operations directly in Slack.

From Ticket Creation to Human Acknowledgment: Closing the Incident Response Gap

Freshservice has become a trusted system of record for IT teams managing incidents, service requests, and operational issues at scale. Tickets are logged, categorized, prioritized, and tracked with discipline. SLAs are defined. Dashboards provide visibility. On paper, everything looks covered. Yet many teams still experience missed or delayed responses when incidents truly matter, especially after hours. The gap isn’t in ticket creation. It’s in what happens next.

A Recap of 2025

In the past, our yearly recaps were mostly about numbers: what we shipped, how much Spike grew, and a long list of stats. See past recaps: 2023, 2024. But 2025 felt different to me. It had many moments that shaped how Spike, as a product and as a company, looks today. Some of them were exciting, some were uncomfortable, and all of them changed how I think about building Spike. We’re still bootstrapped and operating lean, with a team of fewer than ten people.

How to Send Critical Freshservice Tickets to On-Call Staff Instantly (OnPage Integration)

This video demonstrates how the OnPage + Freshservice integration helps IT and support teams respond faster to urgent incidents and critical tickets—without changing their existing Freshservice workflows. Freshservice is often the system of record for incidents and service requests, but dashboards and email alerts aren’t always reliable when something requires immediate, human acknowledgment, especially after hours. That’s where OnPage comes in.

OnPage 2025 Product Updates: Clinical Communication, On-Call Management & Incident Alerting

In this video, Ritika from OnPage's Product Marketing team walks through the key OnPage product enhancements released in 2025 across clinical communication & collaboration (CC&C), on-call management, and critical incident alerting. The updates shown here are designed to help on-call teams communicate clearly, reduce alert fatigue, and respond faster during high-priority events.