The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.
The average organization can have ten or more monitoring or observability tools in its IT stack. These tools generate an overwhelming amount of noise. IT Ops, NOC, and DevOps teams drown in this noise and can’t focus on real incidents until it’s too late. Your organization’s alerts don’t have to turn into an untameable tsunami with no end in sight; there’s a better way forward.
Breaking down cloud management platforms and hybrid/multicloud management
In our recent Whiskey and Wisdom session, we discussed how ITOps teams are coping with the evolution of cloud management. Whiskey and Wisdom is a monthly executive-only forum where IT operations leaders can network independently and discuss high-level AI operations and ITOps strategies with their industry peers.
Incidents provide an unparalleled opportunity to learn about your people, processes, and products under pressure. In this post, we’ll tell you how to ensure your team isn’t letting these opportunities for learning go to waste.
Budgets in IT departments are tight these days, so proving a return on investment is essential for justifying or expanding a project. The good news is that automation saves money by reducing the amount of human effort required. It is similar to investing in a robot vacuum cleaner: despite the upfront cost, you save time (and money) by not having humans do the vacuuming. Reporting the value delivered by an automation program can still be challenging, however, since the value depends heavily on what is being automated.
We’ve all been in the situation before: it’s Friday at 5 PM and the only on-call engineer available to handle incidents is about to hit the slopes. Unfortunately, at that very moment, a customer reports to support that they are unable to access the company’s ecommerce website to complete a purchase. Internal monitoring systems seem quiet and services appear available on internal health dashboards.
You're probably aware that downtime is expensive, but do you know how expensive? The short answer: very. According to the Ponemon Institute, outages cost organizations an average of $9,000 per minute (or $540,000 per hour). That's why companies of all sizes are investing in incident management tools to reduce their downtime and improve the customer experience.
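To make the per-minute figure concrete, here is a minimal Python sketch that scales the quoted $9,000-per-minute average to an outage of any length. The function name `outage_cost` and the linear cost model are illustrative assumptions, not something taken from the Ponemon report; real costs vary widely by organization.

```python
# Average outage cost quoted in the text (USD per minute).
COST_PER_MINUTE_USD = 9_000


def outage_cost(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Estimate the cost in USD of an outage, assuming cost grows linearly with time."""
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return minutes * cost_per_minute


if __name__ == "__main__":
    # One hour of downtime at the quoted average rate.
    print(f"${outage_cost(60):,.0f}")  # $540,000
```

Swapping in your own per-minute estimate via `cost_per_minute` gives a rough figure for your environment.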