
The latest News and Information on Incident Management, On-Call, Incident Response and related technologies.

Keep Your Business Stakeholders Updated While You Save the Day

Imagine this: an airline suffers a major IT incident in a data center that affects its ticketing system. Behind the scenes, technical responders are scrambling to diagnose and fix the issue. However, because today's systems are so complex, the issue is taking longer than expected to resolve, and hours have passed since the system went down. Meanwhile, stranded passengers are taking their anger out on customer service agents and sharing their frustrations on social media.

Integrating Opsgenie and Amazon Security Hub

A brief demo of how to integrate Opsgenie with Amazon Security Hub. The Opsgenie Amazon Security Hub integration forwards Security Hub findings to Opsgenie. Once a finding arrives, Opsgenie determines the right people to notify based on on-call schedules and alerts them via email, text message (SMS), phone call, and iOS and Android push notifications.
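To make "based on on-call schedules" concrete, here is a minimal sketch of the simplest kind of schedule lookup: a fixed-length rotation that cycles through a list of responders. This is an illustration only, not Opsgenie's implementation or API. The function name `on_call`, the participant names, and the weekly rotation length are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def on_call(participants: list[str], rotation_start: datetime,
            rotation_length: timedelta, at: datetime) -> str:
    """Return who is on call at `at` in a simple fixed-length rotation.

    Hypothetical helper for illustration; real schedulers also handle
    overrides, multiple layers, and escalations.
    """
    if not participants:
        raise ValueError("rotation needs at least one participant")
    if at < rotation_start:
        raise ValueError("time precedes the start of the rotation")
    # Count the full rotation periods elapsed since the rotation began
    # (timedelta // timedelta yields an int in Python 3).
    periods = (at - rotation_start) // rotation_length
    # Cycle through the participants, wrapping around at the end of the list.
    return participants[periods % len(participants)]


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
    team = ["alice", "bob", "carol"]  # hypothetical responders
    finding_time = start + timedelta(days=8)
    print(on_call(team, start, timedelta(weeks=1), finding_time))
```

With a weekly rotation starting on 1 January, a finding that arrives eight days later falls in the second week, so the second participant would be notified.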

Intent-based Capacity Planning and Autoscaling with Kubernetes

Intent-based Capacity Planning is Google's approach of declaring the reliability intent for a service and then dynamically solving for the most efficient resource allocation plan. Learn how you can start using this approach to manage the reliability of the services running on your Kubernetes cluster.
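On the autoscaling side, Kubernetes' Horizontal Pod Autoscaler (HPA) turns a declared target, such as 60% average CPU utilization, into a replica count with the rule documented for the HPA: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipping any change when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below reimplements only that rule for illustration. It is not the controller's actual code and not Google's intent-based planner. The function name and the default `min_replicas`/`max_replicas` values are assumptions.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Replica count suggested by the HPA scaling rule (illustrative sketch).

    desired = ceil(current_replicas * current_metric / target_metric),
    left unchanged when the ratio is within `tolerance` of 1.0 and
    clamped to [min_replicas, max_replicas].
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # Close enough to the target: do not scale.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Respect the bounds declared on the autoscaler.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target -> scale out.
    print(desired_replicas(4, 90, 60))
```

For example, 4 pods averaging 90% CPU against a 60% target give a ratio of 1.5, so the rule asks for ceil(4 × 1.5) = 6 pods, while 63% against the same target stays inside the tolerance and leaves the count at 4.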

Reducing MTTR in the Field: 10 Simple Steps Using Retrace

The last decade has ushered in a golden era of software engineering. The rise of cloud computing freed companies from managing their own data centers and provided on-demand scaling. Cloud services let teams provision servers on the fly using configuration and code. Treating that task as just another type of software development led to the advent of DevOps.

6 Best Practices For Outstanding Critical Incident Management

"Businesses need to face the inevitability of being hacked at some point. It's not a question of if, but when — and that's why being proactive to minimize the risk is essential," says Robert Egan. When a critical incident hits, what happens to an organization without an efficient incident management plan? Essentially, all stakeholders are left "fighting fires," trying to recover their systems and get the business back up and running.