How to Improve Downtime Response: Error Budgeting and Unplanned Downtime
Every one of us reading this blog has seen a fire spring up and quietly walked away from the impending chaos. And everyone one of us has managed to live this long because we understand when to react to a fire. A real fire affects our Service Level Objectives (SLO), and affects the user base. You need to figure out where it is, what started it, and what your team will do about it, and you need to do that now.