Latest News

Slack outage

Slack, a popular enterprise communications platform, suffered a 5-hour system outage yesterday, February 22, 2022, from 9:25 AM to 2:24 PM EST. Affected Slack services included messaging, search, link previews, apps/integrations/APIs, posts/files, workspace/org administration, login/SSO, notifications, connections, and calls. AlertOps was NOT affected by this outage.

Cloud Incident Management Guide

Companies looking to grow in the digital age can further that goal by adopting the cloud. When pursued with the right intent and implementation strategy, cloud adoption acts as a powerful force multiplier, giving businesses a cutting-edge IT foundation and helping them grow and innovate at an accelerated pace. Organizations that adopt a cloud-first strategy must also safeguard themselves from critical, service-disrupting incidents.

PagerDuty Receives Financial Services Competency From AWS

We are excited to announce that PagerDuty is now an approved AWS Financial Services Competency Partner. We’re looking forward to expanding our global reach and helping financial services organizations accelerate their cloud migration and digital transformation journeys. This will allow us to further streamline and automate financial services companies’ digital operations while helping them reduce risk and manage compliance requirements.

Episode 3: Mooving to... Stability: The Role of Catastrophic Failure in Software Design

In this episode of Mooving to… Stability: The Role of Catastrophic Failure in Software Design, we had the opportunity to chat with Jeff Atwood (yes, that Jeff Atwood) of Coding Horror, Stack Overflow, and Discourse (Chief Happiness Officer). Jeff started writing 911 software in Boulder, Colorado for a small company, which was a crash course in writing software that has real consequences. With this unique and deep perspective, B.J.

Starting projects at incident.io

We’re a small startup (10 people at time of writing) with big ambitions, particularly when it comes to our product. With so many things we want to do, it’s important for us to be structured in the way we approach our work, without being so process-driven that we lose all the benefits of being small and nimble. As we’re still new, and the team is growing all the time, very little is set in stone.

Everything you need to know about Squadcast and Microsoft Teams Integration

Microsoft Teams is one of the most versatile collaboration and chat tools used by enterprises. We at Squadcast understand how important Microsoft Teams can be for your organization. Hence, we bring you this blog on the Squadcast-Microsoft Teams integration, which explains how it can improve incident management, support effective collaboration, and a lot more.

Sprint planning - How to prioritize urgent production issues?

Members of small engineering teams wear a lot of hats while working on a product. It becomes hard to prioritize and deal with issues that arise in production when a sprint is already planned and in place. This not only makes sprints harder to plan but also reduces accountability. How do you tackle this problem and make sure your engineering team does not burn out at the same time? Let’s list a couple of characteristics of such engineering teams that are quite common across the board.

Designing your incident severity levels

We wrote this article in response to a question asked in our Slack Community. Click here to join hundreds of technology leaders discussing best practices for incident response! ✨ We know a thing or two about incident response. As such, we're often asked to advise when companies are designing their incident response processes. A common question is "How do you design your incident severity levels?". It's a great question given how central they are to incident response!