You have identified a data breach, now what? Your Incident Response Playbook is up to date. You have drilled for this, you know who the key players on your team are and you have their home phone numbers, mobile phone numbers, and email addresses, so you get to work. It is seven o’clock in the evening so you are sure everyone is available and ready to respond, you begin typing “that” email and making phone calls, one at a time.
Kubernetes makes it easier in certain ways to manage reliability. But incident response teams and SREs must also be prepared to handle the unique reliability challenges that K8s creates.
Automated incident management ensures that critical events are detected, addressed and resolved in a fast, efficient manner. Automation allows incident management tools to integrate with each other and fosters instant communication across the systems. Automation tears down barriers across IT operations (ITOps) teams and ensures all departments are on the same page. Teams gain full visibility into incident status to verify that incidents are addressed by the relevant groups.