
The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Why Small Business IT Disasters Are Almost Always Preventable

A server goes down on a Tuesday morning. Ransomware starts encrypting documents at 2 a.m. A key employee clicks a link in what looked like a vendor invoice, and by the time anyone notices, credentials have been sitting in the wrong hands for six hours.

Tap-to-Call | OnPage New Feature Release

Introducing Tap-to-Phone Call in OnPage. When critical incidents require more than messaging, teams need a fast way to connect. With Tap-to-Phone Call, users can call group members directly from within an OnPage conversation. By tapping the phone icon, responders move from secure messaging to live voice coordination over their mobile carrier network, helping teams communicate faster when every second counts.

Round-Robin Alert Distribution in OnPage | Incident Management Application

Introducing Round-Robin Alert Distribution in OnPage. When every alert goes to the same first responder, critical issues can pile up fast and put too much pressure on a few on-call team members. With Round-Robin Alert Distribution, OnPage can route alerts sequentially across responders, helping teams distribute urgent work more evenly, reduce workload concentration, and support a more balanced on-call experience.
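The general idea of round-robin routing can be sketched in a few lines. This is a minimal illustration, not OnPage's implementation: the `RoundRobinRouter` class, its `route` method, and the responder names are hypothetical, and the sketch assumes a fixed responder list with no availability checks or escalation.

```python
from itertools import cycle


class RoundRobinRouter:
    """Hypothetical router: hands each new alert to the next responder in a fixed rotation."""

    def __init__(self, responders: list[str]) -> None:
        if not responders:
            raise ValueError("at least one responder is required")
        # cycle() repeats the responder list endlessly in order.
        self._rotation = cycle(responders)

    def route(self, alert_id: str) -> tuple[str, str]:
        # Each call advances the rotation by one, so consecutive alerts start
        # with different responders instead of always the first one.
        return alert_id, next(self._rotation)


if __name__ == "__main__":
    router = RoundRobinRouter(["alice", "bob", "carol"])
    for n in range(1, 7):
        alert_id, responder = router.route(f"ALERT-{n}")
        print(alert_id, "->", responder)
```

Six alerts across three responders land two per person. A production system would also need to skip responders who are off shift or unreachable, which this sketch deliberately leaves out.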

MTTR - Mean Time to Repair: Definition and the Hidden Costs of Downtime

When a critical system goes down, the clock starts ticking, and every minute matters. Whether it’s a cloud platform, manufacturing operation, logistics center, airport infrastructure, or business-critical software, downtime creates more than technical issues; it often leads to significant financial losses. That’s where MTTR comes in. MTTR measures how long it takes an organization, on average, to restore normal operations after an incident.
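Because MTTR is an average, it is simply total repair time divided by the number of incidents. The sketch below assumes each incident is recorded as a hypothetical `(started, restored)` pair of timestamps and measures from the start of the outage; some teams measure from detection instead, which yields a different figure. The sample incidents are illustrative only.

```python
from datetime import datetime, timedelta

# Assumed record format: (started, restored). "started" is taken to be the
# moment the outage began; adjust if your team measures from detection.
Incident = tuple[datetime, datetime]


def mean_time_to_repair(incidents: list[Incident]) -> timedelta:
    """Return the average time from incident start to restored service."""
    if not incidents:
        raise ValueError("MTTR is undefined without at least one incident")
    total_downtime = timedelta()
    for started, restored in incidents:
        if restored < started:
            raise ValueError("restored time must not precede start time")
        total_downtime += restored - started
    # Dividing a timedelta by an int yields the mean duration as a timedelta.
    return total_downtime / len(incidents)


if __name__ == "__main__":
    day = datetime(2026, 6, 1)
    incidents = [
        (day.replace(hour=9), day.replace(hour=9, minute=30)),    # 30 min
        (day.replace(hour=13), day.replace(hour=13, minute=45)),  # 45 min
        (day.replace(hour=20), day.replace(hour=21, minute=30)),  # 90 min
    ]
    print(mean_time_to_repair(incidents))  # 0:55:00
```

Here 30 + 45 + 90 = 165 minutes of downtime across three incidents gives an MTTR of 55 minutes.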

Incident Prevention & Incident Assistant Demo - The best incident is one that never happens

The best incident is one that never happens. The BigPanda team recorded a live demo of its AI Incident Prevention and AI Incident Assistant as part of ITSM Week, hosted by the Service Desk Institute. ITSM teams are measured by how effectively they prevent disruption, yet many still spend too much time reacting to noisy, low-context incidents after impact has already begun. Watch this on-demand session to learn how leading organizations are moving beyond manual firefighting to autonomous operations with agentic AI.

11 Incident Management Best Practices Every IT Team Should Follow

A well-defined incident management process can mean the difference between a minor disruption and a major business outage. When critical services fail, every minute of downtime matters. Yet many IT teams still face challenges such as unclear ownership, poor prioritization, communication gaps, alert fatigue, and manual processes that delay resolution. The result is longer outages, missed SLAs, and frustrated users.

Shopify outage affects stores, admin panels, and APIs on June 3, 2026

On June 3, 2026, Shopify experienced a widespread service disruption that affected merchants and customers across multiple regions. Users reported storefront failures, admin dashboard issues, API connectivity problems, and authentication errors that disrupted ecommerce operations for several hours. While the outage did not affect every Shopify customer, reports quickly began arriving from around the world, indicating a significant platform issue.

Behind the Scenes: Shift-Based Schedules

The PagerDuty team takes a look under the hood of the newly rolled-out Shift-Based Schedules. This session explains how PagerDuty is moving away from a layer-based architecture to a flexible system that scales natively with modern engineering teams and fits naturally into their workflows. Speakers: Ken Choate (Software Engineer), Kelsey Yocum (Sr. Product Designer), MJ (Sr. Engineering Manager), and Todd Murphy (Principal Product Manager).

Top IT Ticketing & SOAR Tools for Automated Workflows

For IT and SecOps teams, the challenge is not a lack of alerts. It is the sheer volume of noise coming from monitoring tools, security systems, and support channels. Trying to manage this volume manually is not just slow; it’s a recipe for mistakes, team burnout, and critical system failures.

Pager Replacement: Modern Alternatives to Physical Pagers

While physical pagers were once the undisputed gold standard for urgent communication, their technological limitations now create dangerous bottlenecks for modern healthcare and IT teams. Carrying multiple devices is not only inconvenient but increasingly inefficient, prompting a widespread shift away from legacy hardware. As of May 2026, the obsolescence of traditional pagers is undeniable.