
The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Measure Customer Value with Self-Service Observability

DevOps practices, and the teams that implement them, are becoming increasingly critical to the value that any company provides to its customers. This was the key message throughout a recent fireside chat between DevOps Institute Chief Ambassador Helen Beal and Moogsoft VP of Product and Design Adam Frank. A great paradox of the digital era is that, once written, software is invisible to those who write it.

Automation: The Key to Modern IT

Automation is everywhere in our day-to-day IT practices. Many of the processes that have been created for managing hardware and software components were designed, or at least initiated, in a time when managing only a few instances of an application was the norm. When we look at the work required to create, deploy, and maintain applications at a modern scale, the shortcomings of these processes become apparent.

What is IT Operations Management (and should you prioritize it)?

IT operations management (ITOM) involves the administration of technology applications and components across an enterprise. To manage your IT operations effectively, you must prioritize capacity management, security, availability, and cost control across all IT infrastructure and assets. Yet doing so can put a strain on your enterprise. At AlertOps, we offer a major incident management and response platform designed to help your enterprise manage its IT operations.

Is your online gaming platform "Chaos Monkey"-proof?

Try to imagine a bunch of monkeys running around your data center, pulling cables, trashing routers and wreaking havoc on your applications and infrastructure. In these days of heated competition between online gaming operators, player experience is ever more crucial. Continuity of operations comes above all else, and avoiding churn due to service disruption is the organizational mantra.

Zen Your Life With IT Event Noise Reduction

IT incident responders have been inundated with alerts since the start of the COVID-19 pandemic. These engineers must dig through their messages to find and respond to the real alerts for genuinely critical events. This process wastes time and prolongs incident response. The goal of IT event noise reduction is to help responders recognize and resolve real incidents promptly.

Incident Management in Mattermost: Creating an Incident Playbook

The idea behind Incident Management is to be ready. Not ready for anything, as that can be an unrealistic expectation, but ready to respond when the unexpected inevitably happens. DevOps teams often create incident playbooks in order to ensure they are as ready as possible to handle situations as they arise. Luckily, there is some amazing documentation on how to do just that from our friends at PagerDuty.