How available is your website, service, or platform? What must you monitor and measure to ensure availability? How do you translate uptime into availability? This chart has numbers that every Site Reliability Engineer (SRE) should know. Below the chart, you will find answers to commonly asked questions about SRE and associated metrics.
Network traffic analysis is one of the core ways an organization can understand how workloads are performing, optimize network behavior and costs, and conduct troubleshooting—a must when running mission-critical applications in production. VPC Flow Logs is one such enterprise-grade network traffic analysis tool, providing information about TCP and UDP traffic flow to and from VM instances on Google Cloud, including the instances used as Google Kubernetes Engine (GKE) nodes.
In most cases, when users start to access and use a new application or a new release, app performs pretty well. As the user base grows and usage increases, the app can outgrow its infrastructure. Users can start experiencing a dip in performance. Latency increases, bandwidth and memory get exhausted quickly, and some code architectures start to fail because they do not scale well with the increased amount of users.