
Incident Management

The latest news and information on incident management, on-call, incident response, and related technologies.

Use ilert mobile app to take someone else's on-call shift

Use the ilert mobile app to receive push notifications about alerts and gain access to essential incident management features so that you can take immediate action from anywhere. The app also allows you to quickly take over your colleague's on-call shift while on the go. Check out the video to learn more about this feature.

The Show Must Go On - Incidentally Reliable with Piyush Verma (CTO at Last9)

Catch Piyush Verma, Co-Founder and CTO at Last9, in conversation with Ankur Rawal, Co-Founder and CTO at Zenduty, discussing what reliability means to the modern consumer, why SREs make excellent decision-makers, and the current state of observability. Exclusively on The Incidentally Reliable podcast, made by SREs for SREs and hosted by Zenduty. Zenduty is an advanced incident management platform that gives you greater control and automation over the incident management lifecycle.

Resolving a Critical Incident in Core Banking: A Deep Dive into Application Patch Malfunction

In the dynamic environment of core banking systems, maintaining seamless operations is crucial. However, unforeseen complications can arise, leading to critical incidents that demand immediate and effective resolution. A recent incident involving an application patch malfunction presents a compelling study of the intricacies of managing and resolving system anomalies in real time.

Becoming the Office IT Hero: Put An End To "Are You Down?" Chaos

Downtime is an inevitable reality in the fast-paced world of Information Technology. When systems go offline, the pressure mounts, and colleagues begin to bombard IT professionals with the dreaded question: "Are you down?" The good news is that there's a way to transform this frustrating situation into an opportunity to shine. By implementing a Private Status Page from StatusCast, you can not only proactively communicate issues to affected employees, but also position yourself as the office hero.

Your Practical Guide to Reducing MTTR

Let’s face it. Incidents will always happen. We simply can’t prevent them. But we can strive to mitigate the impact incidents have on our product and customers. Ensuring high reliability depends on quickly and effectively finding and fixing problems. This is where the metric MTTR, standing for “mean time to restore” or “mean time to resolve,” becomes valuable for organizations.

Automating On-Call Scheduling With Squadcast: Simplify Managing Schedules

Navigating an extensive Excel sheet to determine On-Call schedules and vacation plans can be daunting. The struggle of maintaining On-Call schedules manually is real. But we've got a solution that can help. This blog addresses the challenges associated with manual On-Call scheduling processes.

Understanding IT discovery for ITSM and modern IT stacks

IT discovery is the process of systematically identifying all existing IT components within a tech stack. It involves discovering hardware and software, understanding their configurations, and mapping their interdependencies. Much like your annual doctor visit can proactively identify potential health issues, your IT discovery process can also flag problems and deliver insights to ensure improved operational well-being.