Every software organization faces challenges in keeping applications available and running reliably. At Google, we’ve developed and practiced a discipline known as Site Reliability Engineering (SRE). Following SRE practices lets us build and operate services reliably for our billions of users. Google has about 2,500 Site Reliability Engineers who support both internal and external services.
At Google, we believe strongly in an open cloud. We’re continually working to bring you tools for understanding how your applications are performing, whether they run in different projects, organizations, clouds, or even on prem. Monitoring tools like Stackdriver Kubernetes Monitoring, OpenCensus, and Stackdriver APM are designed to help you get visibility into your workloads wherever they run—on Google Cloud Platform (GCP), on-premises or on another cloud platform.