The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

How AIOps turns anomaly detection into faster incident resolution

Quickly finding and resolving monitoring anomalies can make all the difference between service issues – and service excellence. But it’s far from easy, whether you’re trying to sift through countless alerts, understand the context behind anomalies, or swiftly pinpoint their root causes. If you’re an ITOps practitioner or enterprise architect looking to fine-tune your anomaly detection and resolution skills, you’ve come to the right place.

How Squadcast Helps With Flapping Alerts

Often we receive a series of alerts that get auto-resolved within a short period of time. Such alerts are called flapping or transient alerts. In this blog, we'll explore the Auto Pause Transient Alert (APTA) feature, which detects flapping alerts and temporarily pauses incident notifications, thereby reducing alert fatigue.
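To make the idea concrete, here is a minimal sketch of how flapping detection could work in principle. It is not Squadcast's APTA implementation: the class name `FlapDetector`, its thresholds, and the `record` method are all hypothetical, chosen only to illustrate "alerts that auto-resolve within a short period" leading to a temporary notification pause.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlapDetector:
    """Flags an alert as flapping when it auto-resolves quickly and repeatedly.

    All thresholds below are illustrative assumptions, not vendor defaults.
    """
    max_open_seconds: float = 60.0   # an alert resolved within this time counts as short-lived
    window_seconds: float = 600.0    # look-back window for counting short-lived alerts
    min_flaps: int = 3               # short-lived alerts needed before pausing notifications
    _flaps: dict = field(default_factory=dict)

    def record(self, alert_key: str, triggered_at: float, resolved_at: float) -> bool:
        """Record one trigger/resolve cycle; return True if notifications should be paused."""
        flaps = self._flaps.setdefault(alert_key, deque())
        # Only cycles that resolved quickly count towards flapping.
        if resolved_at - triggered_at <= self.max_open_seconds:
            flaps.append(resolved_at)
        # Drop cycles that have fallen out of the look-back window.
        while flaps and resolved_at - flaps[0] > self.window_seconds:
            flaps.popleft()
        return len(flaps) >= self.min_flaps


detector = FlapDetector()
# Three short-lived cycles of the same alert, as (triggered_at, resolved_at) in seconds.
cycles = [(0, 20), (100, 130), (200, 215)]
decisions = [detector.record("cpu-high", t, r) for t, r in cycles]
print(decisions)
```

With these assumed thresholds, the third quick auto-resolve within the window is what tips the alert into the paused state. A real system would also need a rule for resuming notifications, which this sketch leaves out.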

Top 5 AIOps predictions for 2024

AI exploded onto the global main stage in 2023, and it could seem hard to read an announcement or article that didn’t mention AI once, if not a dozen times. Amidst all this hype, BigPanda CEO Assaf Resnick identified a real tipping point for AI adoption: lowered skepticism. “Over the last two or three years, AI has come into the public domain,” he explained.

Discover the Sweet Spot: Offering Five Levels of Component Depth

Indulge in our video "Have Your Cake and Eat it Too: Offering Five Levels of Component Depth." Explore how StatusCast delivers a delectable experience by providing five levels of component depth, allowing you to have complete control over your monitoring and incident management. Discover the sweet spot where efficiency meets customization and learn how StatusCast is revolutionizing the way you handle incidents. Watch now and savor the taste of seamless component management!

Did you know anyone can be affected by IT Downtime?

Discover the hidden risks of IT downtime that affect everyone! Whether you're a tech enthusiast, business owner, or just curious about the digital world, this video is a must-watch. IT downtime is more than just a technical glitch – it's a phenomenon that can impact individuals and businesses alike.

Simplifying Service Dependency With Squadcast's Service Graph

Microservices are fantastic for agility and innovation, but the trade-off is complex service management and ownership. With hundreds of interconnected services, troubleshooting and Incident Response can become a potential blocker. The traditional siloed approach to service ownership and the growing number of deployments make service management even more complex.

Safeguarding Operations: A Comprehensive Guide to Disaster Recovery and Business Continuity for Data Center Managers

In the dynamic world of data center operations, preparedness is key. This blog serves as a comprehensive guide for data center operations managers, exploring the critical aspects of disaster recovery (DR) and business continuity (BC) planning. Learn how to fortify your data center against unforeseen events and ensure seamless operations even in the face of adversity.

The Debrief: Building AI-Related Incidents

Recently we went live with one of our biggest product launches to date: AI. This product was unique in that it was broken up into four smaller projects. So naturally, most folks might be wondering: what were the biggest differences between these projects, and what went into actually building out each of these features? In this episode, you'll hear from Rob and Isaac, both Product Engineers who played a critical role in building out related incidents, to get a peek behind the curtain.