Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Putting the "Action" in Actionable Intelligence

AIOps combines machine learning and people to deliver technical outcomes in IT operations. The promise of this capability continues to drive new contenders to the market. AIOps has become a core messaging component for all the major event management players. Many have just rebranded their products to specifically highlight AIOps features. Emerging event management players have arrived and tried to also claim the AIOps space.

Can Endpoint Protection Keep up With Modern Threats?

Endpoint protection is a security approach that focuses on monitoring and securing endpoints, such as desktops, mobile devices, laptops, and tablets. It involves deploying security solutions on endpoints to monitor and protect these devices against cyber threats. The goal is to establish protection regardless of the endpoint’s location, inside or outside the network.

Incident severity and priority 101

Severity and priority can be challenging for a company to nail. When an incident is declared, it's essential to have a system to define the impact and how urgently it should be handled. Incident severity and priority are the two knobs teams can leverage to define scope and urgency, and eventually, the appropriate process to take action. But how should we define them, and what are the differences?

Sponsored Post

What Is a DevOps Toolchain and How Does It Work?

Picture yourself trying to resolve a code error when you notice an additional issue outside your realm of expertise that's making matters worse. Your instinct is to get in touch with the right contact as quickly as possible to resolve the issue so that there's no further impact on the system's uptime. But what if you can't get in touch with them immediately, or don't know who to contact? Instead of trying to solve the problem without support, a DevOps toolchain could have mitigated this chain reaction from the start.

Major IT Outage 2021 Recap

We saw that no one is immune from major IT outages in 2021, not even mega titans like Google, Facebook, and Amazon AWS. The following is a recap of some of the major IT outages with widespread impact for 2021. Amazon Web Services’ (AWS) historic outage occurred on December 7, 2021 and lasted roughly 6 and a half hours. The breadth of Amazon and its reach caused not only their warehouse and delivery operations to stop.

Slack outage

Slack, a popular enterprise communications platform, faced a 5-hour system outage yesterday between 9:25 AM – 2:24 PM EST on February 22, 2022. Slack services affected included: messaging, search, link previews, apps/integrations/APIs, posts/files, workspace/org administration, login/SSO, notifications, connections, and calls. AlertOps was NOT affected by this outage.

Cloud Incident Management Guide

It is a well-established fact that companies looking to grow in the digital age can facilitate this mission by adopting the cloud. When pursued with the right intent and implementation strategy, cloud adoption acts as a powerful force multiplier, yielding a cutting-edge IT powerhouse for businesses and helping them grow and innovate at an accelerated pace. Organizations that adopt a cloud-first strategy must safeguard themselves from critical, service-disrupting incidents.