
Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

Major IT Outage 2021 Recap

2021 showed that no one is immune from major IT outages, not even tech giants like Google, Facebook, and Amazon Web Services (AWS). The following is a recap of some of the major IT outages with widespread impact in 2021. AWS's historic outage occurred on December 7, 2021, and lasted roughly six and a half hours. Because of Amazon's breadth and reach, the outage stopped not only its own warehouse and delivery operations.

Slack outage

Slack, a popular enterprise communications platform, experienced a roughly 5-hour outage on February 22, 2022, from 9:25 AM to 2:24 PM EST. Affected Slack services included messaging, search, link previews, apps/integrations/APIs, posts/files, workspace/org administration, login/SSO, notifications, connections, and calls. AlertOps was NOT affected by this outage.

Cloud Incident Management Guide

Companies looking to grow in the digital age can advance that goal by adopting the cloud. With the right intent and implementation strategy, cloud adoption acts as a powerful force multiplier, giving businesses a modern IT foundation that helps them grow and innovate faster. But organizations that adopt a cloud-first strategy must also safeguard themselves from critical, service-disrupting incidents.

PagerDuty Receives Financial Services Competency From AWS

We are excited to announce that PagerDuty is now an approved AWS Financial Services Competency Partner. We look forward to expanding our global reach and helping financial services organizations accelerate their cloud migration and digital transformation journeys. This will allow us to further streamline and automate financial services companies' digital operations while helping them reduce risk and manage compliance requirements.

Episode 3: Mooving to... Stability: The Role of Catastrophic Failure in Software Design

In this episode of Mooving to… Stability: The Role of Catastrophic Failure in Software Design, we had the opportunity to chat with Jeff Atwood (yes, that Jeff Atwood) of Coding Horror, Stack Overflow, and Discourse, where he is Chief Happiness Officer. Jeff started out writing 911 software for a small company in Boulder, Colorado, which was a crash course in writing code with real consequences. With this unique and deep perspective, B.J.

Starting projects at incident.io

We're a small startup (10 people at the time of writing) with big ambitions, particularly when it comes to our product. With so many things we want to do, it's important for us to be structured in how we approach our work without becoming so process-driven that we lose the benefits of being small and nimble. Because we're still new and the team is growing all the time, very little is set in stone.

Everything you need to know about Squadcast and Microsoft Teams Integration

Microsoft Teams is one of the most versatile collaboration and chat tools used by enterprises. We at Squadcast understand how important Microsoft Teams can be for your organization, so in this blog we explain how the Squadcast-Microsoft Teams integration can help improve incident management, collaboration, and more.