Tanzu Tuesdays 62 - Monitoring Avail. w/Error Budget Burn Rate on Tanzu Observability w/Amber Salome
Starting in April 2020, my team was tasked with managing Tanzu Application Service on multiple foundations for a client. Early on, it was a priority to establish a strong SRE practice around managing the platform. This talk covers how we defined key metrics for monitoring availability, our custom solutions for populating availability data into an observability platform (Tanzu Observability by Wavefront), dashboard creation, and alerting practices. We discuss in depth the benefits of using a burn rate when monitoring availability error budget consumption, and how this strategy allows for more sensitive alerting while limiting error budget consumption. This presentation will demonstrate how cultivating availability charts and error budget burn rate alerting creates an environment where the data starts working for your team. We emphasize the intentional use of availability error budgeting for backlog prioritization and for embracing risk when managing a platform.
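For readers new to the term, the sketch below shows one common way to compute an error budget burn rate: the observed error ratio divided by the error budget (1 minus the SLO target). It is not taken from the talk. The function names, the 99.9% target, the 30-day period, and the sample numbers are all illustrative assumptions, and the talk's actual Wavefront queries may differ.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how fast the error budget is being spent.

    A value of 1.0 means the budget would last exactly the SLO period;
    higher values mean it runs out sooner.
    """
    budget = 1.0 - slo_target  # allowed fraction of failed checks
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def budget_consumed(rate: float, window_hours: float,
                    period_hours: float = 30 * 24) -> float:
    """Fraction of the period's budget spent if `rate` holds for the window."""
    return rate * window_hours / period_hours


# Hypothetical numbers: 99.9% availability target, 1.44% of checks failing.
rate = burn_rate(error_ratio=0.0144, slo_target=0.999)
spent = budget_consumed(rate, window_hours=1)
print(f"burn rate: {rate:.1f}, budget spent in 1h: {spent:.1%}")
```

Alerting on the burn rate rather than on raw availability lets a short, severe outage trigger an alert quickly, while a slow leak is caught by a lower threshold evaluated over a longer window.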
Amber Salome is a Senior Solutions Architect for VMware Tanzu Labs Platform Services, where she started in 2019, originally at Pivotal. She is passionate about working with customers on Tanzu platform enablement and has a strong SRE focus in her work. She is based in Chicago and spends her free time riding her bike along the lakeshore and learning photography.