The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

How to Take Business Continuity Tests To The Next Level

The importance of effective business continuity planning (BCP) cannot be overstated. Being able to avoid and mitigate the risks and damages associated with a disruption to operations is critical to the health of any business. The two main pillars upon which a robust BCP program rests are, of course, the plan and the testing program.

Introducing The PagerDuty Postmortem Guide

Your team had been fighting this major incident for hours, but your investigation was hitting one dead end after another. Finally, you managed to isolate the problem and your graphs started to improve. When all systems went back to normal, everyone let out a collective sigh of relief, shut down the response call, and went back to bed, never to think of this incident again. Or so you thought.

Announcing Cloud Data Encryption for Opsgenie

Opsgenie Edge Encryption is a new feature that makes it easy to secure sensitive data and meet compliance requirements while using Opsgenie for alerting and incident management. Edge Encryption secures data before it leaves your environment, you manage the encryption keys, and the experience is seamless for users. Atlassian has no access to the encrypted data and neither do potential attackers.

Introducing the OpsRamp Winter Release, January 2019

OpsRamp helps digital operations teams drive resilient and responsive IT services by discovering topological relationships between resources at multiple levels in the increasingly hybrid and multi-cloud IT stack. In this webinar you’ll get an overview of the Winter Release, including demonstrations of features to drive greater efficiency within modern IT operational environments.

Video AMA: Ana Medina

Ana is currently working as a Chaos Engineer at Gremlin, helping companies avoid outages by running proactive chaos engineering experiments. She last worked at Uber, where she was an engineer on the SRE and Infrastructure teams, specifically focusing on chaos engineering and cloud computing. Catch her tweeting at @Ana_M_Medina, mostly about traveling, diversity in tech, and mental health.

Automate Tasks with AWS Systems Manager and Opsgenie Actions: A Use Case

Opsgenie Actions enable you to automate manual, repetitive tasks so that your resources are freed up to concentrate on higher-value work. This blog post is the first in a series of use cases in which we discuss how Opsgenie works with various third-party automation platforms to automate these traditionally manual tasks, right from the Opsgenie console or mobile app, to reduce interruptions for your on-call responders and ultimately help your bottom line.

Improving MSP Incident Alert Management

As the big game approaches this Sunday, I’ve been thinking about the NFL’s introduction of instant replay and how it makes the league much more enjoyable! Whether you’re rooting for the Patriots led by Tom Brady … or the Rams, you can’t deny that instant replay makes every Super Bowl much more efficient and adds more clarity to the game.