Operations | Monitoring | ITSM | DevOps | Cloud

Latest News

Enhancing Incident Collaboration: Jira Notes Now Integrated with Squadcast

We're excited to share a significant improvement to our Jira integration aimed at enhancing your incident management workflow. With our latest update, you can now seamlessly sync notes between Jira tickets and Squadcast incidents. This bidirectional sync ensures that any comment added in one platform automatically appears in the other.

What's happening with ITSM in 2024?

The lines between IT service management (ITSM) and AIOps are blurring. The Gartner Hype Cycle for ITSM, 20241 discusses this exciting convergence. Traditionally, ITSM has focused on structured processes and best practices. AIOps brings valuable new capabilities to service management, including automation, correlation, machine learning, and real-time insights. This convergence augments established ITSM frameworks and processes rather than replace them.

BYO Payload: Custom event sources for Signals have landed

Automated event payloads come in many shapes and sizes. These infinitely different event structures pose a problem for users who want to send them all to the same place to page on-call staff. Unless that on-call solution supports the schema directly, you’re out of luck. While we’re proud of the number of integrations we support today for event sources into on-call, we also think the best number that we should support is infinity.

Evaluating PagerDuty Alternatives in 2024 (Updated)

We live in times of instant gratification, where customers expect same-day delivery, round-the-clock tech support, and seamless browsing experiences. Disruptive technologies and continuous innovation have raised expectations for faster and uninterrupted delivery of services. This shift is compelling organizations to adapt their operations to meet these new demands and stay competitive.

Learning from Major Incidents: The Opportunities We're Missing

While they are untimely, stressful and likely to highlight communication breakdowns within an organization; incidents can be a powerful tool for learning and growth in organizations. When an incident occurs with a large impact, which it feels like we read about this happening in the news on a weekly basis, oftentimes the focus is on two things: stabilizing the situation, and controlling the narrative. Organizations often miss the opportunity incidents present: learning.

The Microsoft-CrowdStrike Outage: An In-Depth Analysis

On July 19, 2024, a significant outage impacted globally, causing widespread disruptions across various industries. This outage was primarily linked to a faulty update from CrowdStrike’s Falcon Sensor, which led to severe issues on Windows systems. CrowdStrike is a leading cybersecurity company that specializes in protecting businesses from online threats.

Microsoft 365 Outage, MO821132: Users may be unable to access various Microsoft 365 apps and services

Thursday evening, Microsoft 365 identified a global outage affecting users accessing various Microsoft 365 applications and services. Impacted users suffered from login issues, Azure hosted virtual machines not being available, and constant loading screens in Microsoft 365 services, just to name some of the issues.

Beyond the Headlines: The Unsung Art of Software Outage Management

Today, the entire world is feeling the pain of a major software outage. While we know a lot about these occurrences—our entire business is built on helping companies manage incidents and outages effectively—we’re not here to share our opinion on it. Instead, we’d like to help those unfamiliar with the incident lifecycle understand what happens when an outage like this occurs, who is responsible for what, and what companies ultimately do to get things working again.

Learning Moment: Effective Customer Communication During Incidents - Enhance Visibility & Response with Uptime.com

The recent global outage caused by an operating system update reminded me of how vulnerable we are today and most importantly, how close we are always teetering on global scale incidents with millions of interconnected dependencies. When the base of the house collapses, everything built on top is impacted. Those of us in IT Operations, Monitoring, Observability (insert the current acronym), etc., know firsthand this risk; we face it every day.

A tough day for incident responders: lessons from the CrowdStrike update

Today marks a particularly challenging day for incident responders across the globe. As many of you may have noticed, a recent update from CrowdStrike has triggered widespread disruptions, causing chaos in various sectors. The ripple effects have been far-reaching and severe: While the technical specifics of the issue might not be the focus here—and indeed, there are experts better suited to dissect the cause—what's crucial is understanding the impact on those who manage such crises.