
Observability for GenAI Applications (Grafana OpenTelemetry Community Call)

In this episode, we’re diving into observability for Generative AI apps. AI helps us write code and monitor applications in production - but how do we observe the AI itself? And how do we make sense of complex, non-deterministic AI systems? We’re joined by two great guests: Ishan Jain, working on GenAI observability, and Luccas Quadros, working on Grafana Assistant. Together, they bring both platform-level insights and real-world perspectives.

Green dashboards, red flags

A VP of Engineering (from a company I’m not allowed to name) told me recently: "You helped us find and fix real user-facing issues. Now we need to convince our CTO why that matters more than the standard SLOs and systems." Here's the thing: your CTO is not wrong to measure the systems and basic uptime. That’s the baseline, though. They’re all trying to watch everything, but they’re seeing nothing as it relates to users.

Introducing The First Graylog Helm Chart Beta V1.0.0

Running Graylog on Kubernetes has been possible for a while, but let’s be honest: it usually involved a fair amount of DIY. Custom manifests, duct-taped values files, and more than one late-night kubectl describe pod. That changes today. We’re releasing the first-ever Graylog Helm chart for Kubernetes — now available in beta.

Fleet Management and Terraform: Use cases and best practices for managing collectors in Grafana Cloud

Earlier this year we launched Grafana Cloud Fleet Management to address the pain that comes with managing scores of telemetry collectors across departments and environments. We've been excited to see how organizations are using it to manage collectors at scale, but we've also heard from users who aren't sure how Fleet Management fits with their existing infrastructure-as-code tooling. The good news is Fleet Management is designed specifically to complement—not replace—tools like Terraform.

OpenTelemetry and Grafana Labs: What's new and what's next in 2026

For many teams, 2024 was the year of asking, “can OpenTelemetry do this?” In 2025, the community answered with a resounding “yes,” moving beyond experimentation to focus on what matters most in practice: stability, ease of use, and cross-project compatibility. That momentum now sets the stage for what’s to come for OpenTelemetry in 2026.

Breaking the Iron Triangle: How AI-powered investigations change the economics of uptime

In engineering, there's a concept known as the Iron Triangle. With three sides (cost, quality, time), it's a framework intended to help you prioritize different aspects of project management. Want fast, high-quality features? It'll cost you. Need to keep costs down while maintaining quality? That'll take time. And if you're trying to move fast and cheap? Well, good luck with quality. For years, this has been the brutal reality of running services on the web.

Getting the Right Signals: Mobile Observability with Embrace and SquaredUp

More than half of all connections to web services now originate from mobile devices. Mobile apps are no longer peripheral - they are central to how businesses engage customers, deliver services, and generate revenue. Despite this shift, many organizations still rely on observability tools that are fundamentally server-centric. These platforms are adept at monitoring backend health, but they often fail to capture what’s happening at the edge - on the mobile device itself.

A better way to prioritize feature backlogs: the CERB scoring method

When you're on a software team, planning for the weeks and months to come is always a challenge. You have to balance deep feature backlogs, business and leadership requests, customer requests, and operational interruptions. Effective planning requires a way to prioritize the backlog, set realistic roadmap goals, and justify decisions.

New Year, New Telemetry: Resolve to Stop Breaking Dashboards

It's 2026. Your New Year's resolution was to finally migrate to OpenTelemetry. But you're staring at dozens of dashboards that depend on your current data format, and that migration deadline is looming... Sound familiar? If you're an SRE or Platform Engineer facing a top-down OTel mandate, you're not alone. The challenge isn't just about adopting a new standard—it's about doing so without disrupting the observability systems your team depends on every day.