Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Jira Service Management (JSM) Review for On-Call Management (2025)

OpsGenie is shutting down. And Atlassian recommends migrating to Jira Service Management (JSM). But if you’re not sure JSM is the right fit for your team’s on-call management needs, this review will help you decide. I signed up for JSM and put it through real-world testing. I created on-call schedules, rotations, and overrides. Then, I reviewed JSM’s on-call management across 4 key criteria. For each criterion, I shared what I liked and what I didn’t.

How Rootly works with Slack | An end-to-end demo.

Rootly is the AI-native on-call and incident management platform that helps you resolve incidents faster, improve system resilience, and streamline on-call operations. It’s your always-on SRE copilot that automates root cause analysis and identifies patterns that drive continuous improvement—trusted by thousands of companies like LinkedIn, NVIDIA, Replit, Elastic, Canva, Clay, Tripadvisor, and Grammarly.
Sponsored Post

Preparing for cloud failures: Monitoring strategies for distributed hybrid infrastructure

When AWS experienced its recent outage, the ripple effect was immediate. Critical workloads slowed, dashboards went blank, and many teams realized multi-cloud isn't automatically resilient. Cloud-level failures are inevitable due to the interdependent components and complex IT architecture. The recent AWS disruption reminded many teams that the cloud isn't a magic uptime guarantee. Even the most mature providers can-and do-experience large-scale service interruptions.

Event Flows: Deep dive into feature

Managing alert routing in complex environments is hard. When events occur, alerts must reach the right people at the right time, but traditional alert sources struggle with sophisticated, context-aware routing. Event Flows is ilert’s node-based workflow system at the heart of our alerting infrastructure. It enables intelligent event processing, time- and context-based routing, and safe automation, so teams reduce alert fatigue and accelerate incident response. ‍

Service Observability, Service Operations and Service Orchestration: Unifying Visibility and Action Across the Enterprise

For large enterprises, the health and resilience of Business Services define customer experience and business reputation. Yet as technology estates grow in complexity, fragmented toolsets and siloed teams make it difficult to maintain service availability and prevent incidents before they impact the business and ultimately, customers.

When AI Thinks and Humans Act: The Future of Operational Resilience

Artificial Intelligence has become the sharpest tool in the digital arsenal – detecting anomalies, predicting failures, and uncovering risks before they unfold. Yet even the smartest system can’t roll up its sleeves and fix what’s broken. AI can see the problem. But only people can solve it. That’s the critical gap in today’s automation revolution: turning AI’s insight into human action.

Reliability lessons from the 2025 AWS DynamoDB outage

On October 19th and 20th, 2025, the AWS region US-EAST-1 suffered a massive outage. What started with a 3-hour Amazon DynamoDB outage from a DNS issue led to an Amazon EC2 outage that lasted an additional 12 hours before normal service was restored. Over the course of the outage, there were over 17 million outage reports as companies like Snapchat, Roblox, Amazon, Reddit, Venmo, and more were impacted.

Unlock Faster Incident Resolution with PagerDuty + Logz.io

Join us live as we demo how PagerDuty and Logz.io work together to supercharge your Root Cause Analysis. See how real-time observability and enriched incident context can help your team detect, triage, and resolve issues in minutes—not hours. Don’t miss this chance to see the integration in action, ask questions, and learn how to keep your teams in sync while driving continuous improvement. Perfect for anyone looking to level up their incident response!

Top 10 Hospital Messaging Systems (2025): Comparing Communication Tools for Modern Care Teams

Secure and seamless communication is at the heart of effective patient care. Whether coordinating handoffs, requesting consults, activating code teams, or managing after-hours coverage, clinicians rely on messaging systems that are reliable, fast, and built to protect patient data.