Operations | Monitoring | ITSM | DevOps | Cloud

Latest Posts

Extended retention for custom and Prometheus metrics in Cloud Monitoring

Metrics help you understand how your business and applications are performing. Longer metric retention enables quarter-over-quarter or year-over-year analysis and reporting, forecasting seasonal trends, retention for compliance, and much more. We recently announced the general availability (GA) of extended metric retention for custom and Prometheus metrics in Cloud Monitoring, increasing retention from 6 weeks to 24 months. Extended retention for custom and Prometheus metrics is enabled by default.

High-resolution user-defined metrics in Cloud Monitoring

Higher-resolution metrics are critical for monitoring dynamically changing environments and rapidly changing applications. Examples include high-volume e-commerce, live streaming, and autoscaling of bursty workloads on Kubernetes clusters. Higher-resolution custom, Prometheus, and agent metrics are now generally available and can be written at a granularity of 10 seconds. Previously, these metric types could be written only once every 60 seconds.
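To make the 10-second granularity concrete, here is a minimal client-side sketch that spaces out writes per time series. It is an illustration only: the `WriteThrottle` class and the series keys are hypothetical, not part of any Cloud Monitoring client library, and it assumes the 10-second figure applies per time series.

```python
import time
from typing import Dict, Optional

# Granularity quoted in the post; the older limit was 60 seconds.
MIN_WRITE_INTERVAL_SECONDS = 10.0


class WriteThrottle:
    """Hypothetical helper: allows at most one write per series per interval."""

    def __init__(self, min_interval: float = MIN_WRITE_INTERVAL_SECONDS) -> None:
        self.min_interval = min_interval
        self._last_write: Dict[str, float] = {}

    def should_write(self, series_key: str, now: Optional[float] = None) -> bool:
        """Return True and record the write if enough time has passed."""
        now = time.monotonic() if now is None else now
        last = self._last_write.get(series_key)
        if last is not None and now - last < self.min_interval:
            return False  # too soon after the previous point for this series
        self._last_write[series_key] = now
        return True
```

A caller would check `should_write(key)` before sending each point and skip or buffer the point when it returns `False`. Passing `min_interval=60.0` models the earlier once-per-minute behavior.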

Bucket list: Better log storage and management for Cloud Logging

As more organizations move to the cloud, the volume of machine-generated data has grown exponentially and is increasingly important for many teams. Software engineers and SREs rely on logs to develop new applications and troubleshoot existing apps to meet reliability targets. Security operators depend on logs to find and address threats and meet compliance needs. And well-structured logs provide invaluable insight that can fuel business growth.

21 new ways we're improving observability with Cloud Ops

We’ve heard from customers about how important it is to be able to reliably operate your applications and infrastructure running on Google Cloud. In particular, observability is critical to reliable operations. To help you quickly gain insight into your Google Cloud environment, we’ve added 21 new features to Cloud Operations, the observability suite we launched earlier this year, which gives you access to all our operations capabilities directly from the Google Cloud Console.

Detecting and responding to Cloud Logging events in real-time

Logging is a critical component of your cloud infrastructure and provides valuable insight into the performance of your systems and applications. On Google Cloud, Cloud Logging is a service that allows you to store, search, monitor, and alert on log data and events from your Google Cloud Platform (GCP) infrastructure services and your applications. You can view and analyze log data in real time via Logs Viewer, command line or Cloud SDK.

Introducing Pub/Sub as a new notification channel in Cloud Monitoring

Around the world, operations teams are working to automate their monitoring and alerting workflows, looking to reduce the time they spend on rote operational work (what we call “toil”), so they can spend more time on valuable work. For instance, Google’s Site Reliability Engineering organization aims to keep toil below 50% of an SRE’s time, freeing them up to work on more impactful engineering projects.

New ways to manage custom Cloud Monitoring dashboards

Earlier this year, we added a Dashboard API to Cloud Monitoring, allowing you to manage custom dashboards and charts programmatically, in addition to managing them with the Google Cloud Console. Since then, you’ve asked us to provide more sample dashboard templates that target specific Google Cloud services. Many of you have also asked us to provide a Terraform module to help you set up an automated deployment process.

Using Recommenders to keep your cloud running optimally

As a cloud project owner, you want your environment to run smoothly and efficiently. At Google Cloud, one of the ways we help you do that is through a family of tools we call Recommenders, which leverage analytics and machine learning to automatically detect issues and present you with optimizations that you can act on.

How to find and use your GKE logs with Cloud Logging

Logs are an important part of troubleshooting and it’s critical to have them when you need them. When it comes to logging, Google Kubernetes Engine (GKE) is integrated with Google Cloud’s Logging service. But perhaps you’ve never investigated your GKE logs, or Cloud Logging? Here’s an overview of how logging works in GKE, and how to configure, find, and interact effectively with the GKE logs stored in Cloud Logging.
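As a rough sketch of what finding GKE logs can look like, the helper below assembles a Cloud Logging query string for container logs. The function name and the example cluster and namespace values are hypothetical; the sketch assumes the `k8s_container` resource type with its `cluster_name` and `namespace_name` labels, and it only builds the filter text without calling any API.

```python
from typing import Optional


def gke_container_log_filter(
    cluster_name: str,
    namespace_name: Optional[str] = None,
    min_severity: Optional[str] = None,
) -> str:
    """Build a Cloud Logging filter for container logs from one GKE cluster."""
    clauses = [
        'resource.type="k8s_container"',
        f'resource.labels.cluster_name="{cluster_name}"',
    ]
    if namespace_name:
        # Narrow the results to a single Kubernetes namespace.
        clauses.append(f'resource.labels.namespace_name="{namespace_name}"')
    if min_severity:
        # Keep only entries at or above the given severity, e.g. ERROR.
        clauses.append(f"severity>={min_severity}")
    return " AND ".join(clauses)


# Hypothetical cluster and namespace names, for illustration only.
print(gke_container_log_filter("my-cluster", "default", "ERROR"))
```

The resulting string can be pasted into the logs query box or passed as the filter argument of whichever logging client you use.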

Tools for debugging apps on Google Kubernetes Engine

Editor’s note: This is a follow-up to a recent post on how to use Cloud Logging with containerized applications running in Google Kubernetes Engine. In this post, we’ll focus on how DevOps teams can use Cloud Monitoring and Logging to find issues quickly. Running containerized apps on Google Kubernetes Engine (GKE) lets a DevOps team focus on developing apps rather than on the operational tasks required to run a secure, scalable, and highly available Kubernetes cluster.