Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

Microsoft 365 Outage, MO821132: Users may be unable to access various Microsoft 365 apps and services

On Thursday evening, Microsoft identified a global outage affecting users accessing various Microsoft 365 applications and services. Impacted users experienced login issues, unavailable Azure-hosted virtual machines, and endless loading screens in Microsoft 365 services, to name just some of the issues.

Beyond the Headlines: The Unsung Art of Software Outage Management

Today, the entire world is feeling the pain of a major software outage. While we know a lot about these occurrences—our entire business is built on helping companies manage incidents and outages effectively—we’re not here to share our opinion on it. Instead, we’d like to help those unfamiliar with the incident lifecycle understand what happens when an outage like this occurs, who is responsible for what, and what companies ultimately do to get things working again.

Learning Moment: Effective Customer Communication During Incidents - Enhance Visibility & Response with Uptime.com

The recent global outage caused by an operating system update reminded me of how vulnerable we are today and, most importantly, how close we always are to global-scale incidents, given millions of interconnected dependencies. When the base of the house collapses, everything built on top is impacted. Those of us in IT Operations, Monitoring, Observability (insert the current acronym), etc., know this risk firsthand; we face it every day.

A tough day for incident responders: lessons from the CrowdStrike update

Today marks a particularly challenging day for incident responders across the globe. As many of you may have noticed, a recent update from CrowdStrike has triggered widespread disruptions, causing chaos in various sectors. The ripple effects have been far-reaching and severe. While the technical specifics of the issue might not be the focus here (and indeed, there are experts better suited to dissect the cause), what's crucial is understanding the impact on those who manage such crises.

OpenTelemetry, AI, and the Future of Observability with Andreas Grabner

Shubham Srivastava from our team had the pleasure of meeting Andreas Grabner at KubeCon + CloudNativeCon Europe earlier this year. Andreas wears many hats in his daily work, primarily serving as a DevOps Activist at Dynatrace, where he has dedicated over 16 years to shaping the Observability solutions we see today. He is also a Developer Advocate at Keptn, helping teams automate and orchestrate their deployments end to end, and plays an active role as an Ambassador in the CNCF community.

Nexthink Stops MS Outage From Hurting a Leading Consumer Goods Company

While individual blue screen errors are frustrating, the recent global system crashes caused by a CrowdStrike update incompatible with Microsoft Windows have wreaked havoc across entire industries since early Friday morning. Companies in the airline, media, and banking industries have faced significant disruptions, with thousands of customer-facing devices showing blue screens, causing widespread travel delays and chaos.

UptimeRobot Alerts Spike 5x Due to Microsoft/CrowdStrike Global Issues

Given recent global events, UptimeRobot is experiencing an increased number of downtime notifications. We are currently sending out five times more notifications than usual due to a widespread outage impacting several critical services worldwide. Here’s a brief overview of the situation and how it affects our monitoring services.

The IT Scramble is On with a Microsoft Outage: Incident MO821132 - July 18, 2024

On July 18, 2024, at 6:38 pm ET, Vantage DX, Martello’s Microsoft 365 and Teams performance management solution, started to see indicators of a likely Microsoft outage impacting users’ ability to access various Microsoft 365 apps and services. Just over an hour later, at 7:41 pm ET, Microsoft issued a statement on X.