Latest Videos

Self Service Monitoring at Planet Scale: Waze Case Study (Cloud Next '18)

You’ve built a successful app that serves millions of users - great! Now how do you manage hundreds of microservices that run in multiple clouds and are owned by different teams across the org? In this session, we'll share the Waze team’s stories from their transition to zero-config, self-service monitoring for their dev teams.

Release with Confidence: Testing, Debugging, and Monitoring in a Serverless World (Cloud Next '18)

Identifying the cause of a bug in a serverless system can sometimes be difficult. We'll show you how to tame your bugs with testing, and how to diagnose and mitigate problems in production.

Centralized Logging Solution for Google Cloud Platform (Cloud Next '18)

In this session, we’ll give practical guidance on consolidating and managing your logs, share tips on both what to log and what not to log, discuss logging agents and their potential pitfalls, and show you how to extract value from your log entries for reporting and alerting on logs.

Visualizing Network Topologies and Traffic (Cloud Next '18)

In this session, we will look at which network monitoring and management use cases are relevant in a cloud environment and which data Google Cloud Platform provides for gaining insights. We will then demo how to visualize traffic flows and topologies using a mix of Google and open source tools.

Optimizing and Troubleshooting Your Application, the Google Way (Cloud Next '18)

In this session, you’ll learn about the value of these kinds of tools and how you can automatically extract telemetry from your app with OpenCensus, and you’ll see a demonstration of how to solve customer issues in a multi-cloud deployment with Stackdriver APM and other tools supported by OpenCensus.

Improving Reliability with Error Budgets, Metrics, and Tracing in Stackdriver (Cloud Next '18)

Members of the Stackdriver and Customer Reliability Engineering teams will demonstrate how Stackdriver tooling, inspired by the needs of SREs at Google, lets you run services more reliably and with fewer false-positive signals by tracking and alerting on error budgets, and by debugging with the exemplar technique during an outage.