
The latest news and information on incident management, on-call, incident response, and related technologies.

Scaling Up to Keep Costs Down: Automation for Web Application Incident Management

Any organization that’s keeping up with today’s sharp rise in business demands (or better yet, getting ahead of the game) is doing so by innovating and jumping at the chance to do things differently. Rather than relying on old ways and their existing toolbox, these organizations are looking to the newest technologies to add efficiency to as many day-to-day functions as possible.

incident.io: A scalable incident management solution built for enterprises

For enterprise businesses, a lot is riding on the efficiency of their incident response. These organizations have large customer bases, complex products, and many incidents. They also have loads of incident responders across various roles, making it difficult to coordinate internally.

Unveiling Squadcast's Enhanced Status Pages

Meet Kevin and Mai (again): Navigating the Troublesome Waters of Platform Downtime. Kevin is a Site Reliability Engineer (SRE), constantly on the lookout for potential downtime that could impact their platform, kryptobro.com. Mai is his adept partner, ever-ready to troubleshoot. In their journey, the previous version of Squadcast Status Pages served as a helpful tool, but they soon found room for improvement.

Kubernetes Incident Management Best Practices

Creating just any infrastructure on Kubernetes is not enough. Many basic configurations will get your application's infrastructure running, and they might work just fine for the time being, but incident response built on them won’t always remain 100% reliable. You will run into new potholes, and that’s okay.

Discover what's driving the recognition behind BigPanda's AIOps innovations

Every day, BigPanda is transforming the way our customers operate. Our advanced AIOps technology redefines incident management, prevents service disruptions, and elevates customer satisfaction – and I couldn’t be more thrilled to see industry experts take notice. I’m particularly proud to see BigPanda mentioned in nine of the highly esteemed 2023 Gartner Hype Cycle reports.

Demo Roundup: PagerDuty Operations Cloud for Kubernetes

In this demo, Corbin Mills shows how to use the PagerDuty Operations Cloud to streamline and automate the resolution of a node failure. You’ll see how he uses event orchestration (in PagerDuty AIOps) to enrich an alert with pod names and automatically run a job that checks the Kube API status, so that a responder has instant context. AIOps also groups and suppresses related alerts. Then you’ll see how the responder can run more health status checks without needing to SSH into the environment or interrupt a co-worker for access.
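
To make the enrichment and grouping ideas from the demo description concrete, here is a minimal, vendor-neutral sketch in Python. It does not use PagerDuty's API or reflect how the demo is implemented. The `Alert` class, the `enrich_with_pods` and `group_by_node` functions, and the sample node and pod names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    """Hypothetical alert record; not a PagerDuty data type."""
    node: str
    summary: str
    details: dict = field(default_factory=dict)


def enrich_with_pods(alert, pods_by_node):
    """Attach the (sorted) names of pods on the alert's node as extra context."""
    alert.details["pods"] = sorted(pods_by_node.get(alert.node, []))
    return alert


def group_by_node(alerts):
    """Group alerts that share a node, so a responder sees one group per node."""
    groups = {}
    for alert in alerts:
        groups.setdefault(alert.node, []).append(alert)
    return groups


# Sample data (assumed): which pods run on which node.
pods_by_node = {"node-a": ["web-2", "web-1"], "node-b": ["db-1"]}

alerts = [
    Alert("node-a", "NodeNotReady"),
    Alert("node-a", "KubeletDown"),
    Alert("node-b", "DiskPressure"),
]

enriched = [enrich_with_pods(a, pods_by_node) for a in alerts]
groups = group_by_node(enriched)

for node, members in groups.items():
    lead, suppressed = members[0], members[1:]
    # Surface only the first alert per node; the rest are counted as suppressed.
    print(f"{node}: {lead.summary} pods={lead.details['pods']} "
          f"(+{len(suppressed)} suppressed)")
```

In this sketch the first alert per node is surfaced and the rest are only counted. A real system would typically apply time windows and richer correlation rules before suppressing anything.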