3 Signals From KubeCon Atlanta On Where Kubernetes Is Heading Next

KubeCon Atlanta 2025 felt different, and CloudZero had a full team on the ground to capture it. Engineers, product leaders, sales reps, and CTO Erik Peterson spent three days embedded across the show floor. Between them, they covered complementary vantage points: the outbound conversations, the inbound questions, the demos, the technical deep-dives, and the quieter moments between sessions. Five perspectives stood out.

AI API Aggregation: Managing Costs And Complexity Across Multiple LLMs

Running multiple LLMs without aggregation can feel like managing five different clouds with no dashboard. Sure, you can make it work, but you won’t like the bill. Most SaaS teams didn’t start with a multi-LLM strategy; it just happened. You added one model for reasoning, another for summarization, and maybe a fine-tuned version for customer support. Fast-forward six months, and your AI stack looks like a tangle of APIs, each charging for tokens on its own terms.
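The following is a minimal sketch of the aggregation idea. It assumes each provider reports input and output token counts per call, normalizes every call into one usage record, and rolls costs up in a single place. The model names and per-million-token prices are hypothetical placeholders, not real provider rates.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical prices in USD per 1M tokens; real rates differ by provider and change often.
PRICES_PER_MILLION = {
    "reasoning-model": {"input": 3.00, "output": 15.00},
    "summarizer-model": {"input": 0.50, "output": 1.50},
    "support-finetune": {"input": 1.00, "output": 4.00},
}


@dataclass
class UsageRecord:
    """One LLM call, normalized to a provider-agnostic shape."""
    model: str
    input_tokens: int
    output_tokens: int


def call_cost(record: UsageRecord) -> float:
    """Cost of a single call, priced separately for input and output tokens."""
    price = PRICES_PER_MILLION[record.model]
    return (
        record.input_tokens * price["input"] + record.output_tokens * price["output"]
    ) / 1_000_000


def cost_by_model(records: list[UsageRecord]) -> dict[str, float]:
    """Aggregate spend per model so every LLM shows up on one 'dashboard'."""
    totals: dict[str, float] = {}
    for r in records:
        totals[r.model] = totals.get(r.model, 0.0) + call_cost(r)
    return totals
```

Keeping prices in one table means a rate change touches a single line rather than every integration.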

Top 9 Web Application Performance Monitoring Tools for 2025

You know that uneasy pause before opening your monitoring dashboard? The one where you're hoping nothing's broken—but a part of you knows something probably is. Performance issues often start quietly: a few slow endpoints, a checkout that takes longer than usual, a graph that looks a little off. Before long, those small signals turn into alerts and support tickets.
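One way to catch those quiet signals before they turn into tickets is a per-endpoint tail-latency check. The sketch below assumes you already collect (endpoint, latency in ms) samples. The function names and the budget value are illustrative assumptions and do not come from any particular tool. It uses the nearest-rank p95, so with a reasonable number of samples a lone outlier does not trip the check, while a sustained slowdown does.

```python
import math
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def slow_endpoints(samples, p95_budget_ms):
    """Return {endpoint: p95} for endpoints over budget, slowest first."""
    by_endpoint = defaultdict(list)
    for endpoint, latency_ms in samples:
        by_endpoint[endpoint].append(latency_ms)

    flagged = {}
    for endpoint, latencies in by_endpoint.items():
        p95 = percentile(latencies, 95)
        if p95 > p95_budget_ms:
            flagged[endpoint] = p95

    # Sort so the worst offender is investigated first.
    return dict(sorted(flagged.items(), key=lambda kv: kv[1], reverse=True))
```

Alerting on a percentile rather than the average keeps a slow checkout from hiding behind many fast requests.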

Catchpoint Peak Performance Summit 2025: Redefining Observability for the Outcome Economy

We recently hosted our first-ever Peak Performance Summit in Bangalore, India, a one-day event focused on how value-based observability drives digital business outcomes. The summit brought together customers, partners, and technology leaders to share real-world experiences, live demos, and forward-looking ideas. The message running through every session was clear: performance isn’t just about speed. It’s about measurable business results.

Grafana Play updates: A redesigned homepage to celebrate our community

Grafana Play is a free, publicly accessible sandbox environment where anyone can explore and learn about Grafana, no setup or sign-in required. It comes preloaded with sample dashboards demonstrating how to connect to data sources, build visualizations, and experiment with Grafana’s advanced features. Hosted on Grafana Cloud, Grafana Play has grown significantly over the years. With thousands of public dashboards, it’s now a go-to destination for Grafana learning and exploration.

A tale of two incident responses: How our AI assistant found the root cause 3.5x faster

About two months ago, an incident at Grafana Labs began in typical fashion: a series of alerts fired, our on-call engineer acknowledged it in Slack, and the rest of the team quickly began hypothesizing about the likely culprit. The way the incident was resolved, however, was anything but typical. Yes, our internal team followed best practices to resolve the incident as quickly as possible.

What Is a Data Pipeline

In today’s tech world, IT and security technologies are the functional equivalent of Pokémon. To gain the insights you need, you “gotta catch ’em all” by ingesting, correlating, and analyzing as much security data as possible. Data pipelines organize chaotic information flows into structured streams, ensuring that data is reliable, processed, and ready for use.
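The following minimal sketch models a pipeline as chained stages. It parses raw JSON lines, maps differently named fields from different sources onto one schema, and keeps only the events worth correlating. The field names (`ts`/`timestamp`, `host`/`hostname`, `msg`/`message`) and the severity filter are hypothetical assumptions chosen for illustration. A production pipeline would also need buffering, retries, and delivery to a destination.

```python
import json
from datetime import datetime, timezone
from typing import Iterable, Iterator


def parse(raw_lines: Iterable[str]) -> Iterator[dict]:
    """Decode JSON lines, skipping malformed input instead of failing the whole stream."""
    for line in raw_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(event, dict):
            yield event


def normalize(events: Iterable[dict]) -> Iterator[dict]:
    """Map source-specific (hypothetical) field names onto one shared schema."""
    for e in events:
        ts = e.get("timestamp", e.get("ts"))
        host = e.get("host", e.get("hostname"))
        if ts is None or host is None:
            continue  # records without time and host cannot be correlated
        try:
            when = datetime.fromtimestamp(float(ts), tz=timezone.utc)
        except (TypeError, ValueError, OverflowError, OSError):
            continue
        yield {
            "time": when.isoformat(),
            "host": str(host).lower(),
            "severity": str(e.get("severity", "info")).lower(),
            "message": e.get("msg") or e.get("message", ""),
        }


def keep_relevant(events: Iterable[dict], levels=frozenset({"warning", "error", "critical"})):
    """Filter stage: pass through only the severities we want to analyze."""
    for e in events:
        if e["severity"] in levels:
            yield e


def run_pipeline(raw_lines: Iterable[str]) -> list:
    return list(keep_relevant(normalize(parse(raw_lines))))
```

Because each stage is a generator, records stream through one at a time. You can add or swap a stage without touching the others.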

Mezmo's AI-powered Site Reliability Engineering (SRE) agent for Root Cause Analysis (RCA)

We are thrilled to announce the availability of Mezmo’s AI-powered Site Reliability Engineering (SRE) agent for Root Cause Analysis (RCA), included in your existing subscription at no additional charge. It is a truly transformative leap forward for engineering and operations teams, paving the way for a new era of observability: moving beyond passive, reactive monitoring to proactive, AI-driven observability.

Automating Chaos Engineering with Terraform

Automating chaos engineering with Terraform eliminates manual setup across environments by letting you version-control your entire chaos infrastructure, from service discovery to security governance policies. The Harness Terraform provider supports end-to-end automation, including Kubernetes infrastructure setup, custom image registries, Git-based ChaosHub management, and granular security controls that ensure experiments execute safely in production.