Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Shorten your MTTR with Checkly Traces

We all know that Checkly is a ‘secret weapon’ for engineering teams who want to shorten their mean time to detection (MTTD). With Checkly, you can know within minutes if your service is unavailable for users, or acting unexpectedly. In this article we’ll talk about how Checkly traces can help you expand on the benefits of Checkly, adding insights that will help you diagnose root causes, and further reduce your mean time to resolution (MTTR) for outages and other incidents.

Your New Retrospective Experience: More Collaborative, Customizable, and Powerful

Run smarter, more effective retros. Customize retros, collaborate in real time, and surface key insights faster with AI. The new experience empowers you to spend less time documenting and more time working together as a team to uncover the insights that lead to real improvements in your process, roles, and technology.

Weaving AI into SIGNL4

Over the past two years, artificial intelligence (AI) has experienced remarkable growth, significantly influencing various sectors and daily life. In 2023, the release of advanced large language models (LLMs), such as OpenAI’s GPT-4 and Google DeepMind’s Gemini, marked a pivotal shift by enabling AI systems to process and generate diverse data types, including text, images, and audio.

PagerDuty Operations Cloud Spring 25 Release: Reimagining Operations in the Age of AI and Automation

Operational excellence isn’t just a goal—it’s critical for survival for all companies. And, when powered by AI and automation, it’s a strategic competitive differentiator. With over a decade of AI and ML experience in our platform, PagerDuty pioneered the Incident Response space. And now, PagerDuty is redefining what modern operations can look like in the era of AI and automation.

Microsoft Entra ID Outage: How Vantage DX Detected the Issue Before Microsoft Acknowledges the Issue

On February 25, 2025, at 11:32 AM EST, Martello’s Vantage DX monitoring began alerting on an issue affecting Microsoft Entra ID (Azure AD SSO). While Microsoft had not yet acknowledged the incident, online reddit forums had noted the issue and our Vantage DX proactive monitoring detected disruptions impacting authentication across multiple workloads. See here the critical warning for Exchange in Vantage DX Monitoring. Here is the critical warning for OneDrive and SharePoint in Vantage DX.

Operational excellence in the age of AI and Automation

The future of operations is here with PagerDuty's groundbreaking AI and automation innovations. Learn how PagerDuty AI agents, powered by PagerDuty Advance, and new use cases like security incident management and LLMOps can help your organization achieve operational excellence to reduce cost, mitigate the risk of outages, and accelerate innovation.

Feature Spotlight - Post-Incident Reports

The Post-Incident Report builder is available to Advanced plan customers to help document the incident post-mortem process. This allows users to share key information and understanding about why an incident occurred, how resolvers responded, and what preventive actions can be taken to ensure it doesn't happen again. After creating a Post-Incident Report, you can share it with other colleagues or stakeholders to keep them informed about the steps you’re taking to mitigate and prevent potential recurrences.