
The Dawn of the 10x Team

Previously, I wrote about how debugging, whether done by humans or AI-powered tools, depends on context. Without it, even the most capable systems can tell you only what code is broken, not why it broke. Now that AI can access the same depth of context developers rely on (stack traces, traces, logs, commits, and code), the way we build and operate software is changing. We’re moving from an era of monitoring to one of reasoning.

Reliability lessons from the 2025 Microsoft Azure Front Door outage

On October 29th, 2025, Azure Front Door suffered an outage that affected Microsoft services globally, including Microsoft 365, Outlook, Xbox Live, Copilot, and more. It also affected Microsoft Azure, meaning companies like Costco, Starbucks, and Alaska Airlines ran into issues with both customer-facing and internal systems. The root of the issue was a misconfiguration in the data plane for Azure Front Door and the Azure Content Delivery Network.

Introducing the New Cloud Dedicated Admin UI

InfluxDB Cloud Dedicated provides hosted and managed InfluxDB Cloud clusters in a single-tenant environment and is optimized to handle high write and query loads. Today, InfluxData is releasing a visual overhaul and new features for its Admin UI. Among the recent updates are live observability for customer clusters, overhauled site navigation, and improved visibility into table schemas.

Manual Call Forwarding vs. Schedule-Based Call Routing: What's the Better Way to Handle On-Call Support?

When your team shares one support number, someone has to decide who gets the calls when customers need help after hours. And if your team rotates on-call responsibilities weekly, which is common in IT (SRE, DevOps, ITOps, etc.), clinical, and field engineering teams, you’ve probably relied on manual call forwarding at some point. On paper, it seems straightforward: update the forwarding number each week to point to the person who’s on call. In practice? It often turns into a scramble.

Sentry has a bold new look

As you may have noticed, Sentry just got a major glow-up. For too long, our product looked like boring enterprise software while our brand screamed bold and irreverent. No more. From now on, our product matches the vibe you’ve come to expect from us. The result is something that’s more vibrant, more tactile, and more Sentry. Welcome to the S.C.R.A.P.S.

3 Signals From KubeCon Atlanta On Where Kubernetes Is Heading Next

KubeCon Atlanta 2025 felt different this year, and CloudZero had a full team on the ground to capture it. Engineers, product leaders, sales reps, and CTO Erik Peterson spent three days embedded across the show floor. Their vantage points were complementary: the outbound conversations, the inbound questions, the demos, the technical deep-dives, and the quieter moments between sessions. Five perspectives stood out.

AI API Aggregation: Managing Costs And Complexity Across Multiple LLMs

Running multiple LLMs without aggregation can feel like managing five different clouds with no dashboard. Sure, you can make it work, but you won’t like the bill. And most SaaS teams didn’t start with a multi-LLM strategy. It just happened. You added one model for reasoning, another for summarization, or maybe a fine-tuned version for customer support. Fast-forward six months, and your AI stack looks like a tangle of APIs, each billing for tokens on its own terms.

Top 9 Web Application Performance Monitoring Tools for 2025

You know that uneasy pause before opening your monitoring dashboard? The one where you're hoping nothing's broken, but a part of you knows something probably is. Performance issues often start quietly: a few slow endpoints, a checkout that takes longer than usual, a graph that looks a little off. Before long, those small signals turn into alerts and support tickets.

Catchpoint Peak Performance Summit 2025: Redefining Observability for the Outcome Economy

We recently hosted our first-ever Peak Performance Summit in Bangalore, India, a one-day event focused on how value-based observability drives digital business outcomes. The summit brought together customers, partners, and technology leaders to share real-world experiences, live demos, and forward-looking ideas. The message running through every session was clear: performance isn’t just about speed. It’s about measurable business results.

Grafana Play updates: A redesigned homepage to celebrate our community

Grafana Play is a free, publicly accessible sandbox environment where anyone can explore and learn about Grafana, no setup or sign-in required. It comes preloaded with sample dashboards demonstrating how to connect to data sources, build visualizations, and experiment with Grafana’s advanced features. Hosted on Grafana Cloud, Grafana Play has grown significantly over the years. With thousands of public dashboards, it’s now a go-to destination for Grafana learning and exploration.