Kubernetes Monitoring backend 2.2: better cluster observability through new alert and recording rules

We’re excited to announce that version 2.2.0 of the backend for our Kubernetes Monitoring solution in Grafana Cloud is now available. The app’s backend is supported by kubernetes-mixin, an open source Prometheus Monitoring Mixin, and this latest version features significant improvements to alert rules and recording rules that will enhance your cluster observability and monitoring experience. There’s a lot to tell you about, so let’s dive in.

A look back at DASH 2025

DASH 2025 brought the Datadog community together like never before. During our biggest event yet, thousands of attendees gathered at the North Javits Center in New York City for two and a half days of content, learning, and community, where they deepened their knowledge and connected with peers. Here's a quick look back at some of the highlights from this year's DASH.

Proactively troubleshoot with synthetic testing and distributed tracing

As your application grows in complexity, identifying the root cause of issues becomes increasingly difficult. Many monitoring strategies make this even harder by siloing frontend and backend data. To effectively troubleshoot problems that spread across your app, you need visibility not just into each part of your stack, but also into how these parts interact.

Monitor agents built on Amazon Bedrock with Datadog LLM Observability

As large language models (LLMs) grow more powerful, organizations are deploying agentic AI applications to tackle complex, multi-step tasks. With Amazon Bedrock Agents, developers can orchestrate these agents to manage tasks such as triggering serverless functions, calling APIs, accessing knowledge bases, and maintaining contextual conversations—all while breaking down complex user requests or tasks into manageable steps.

Smarter Workflows, Faster Insights: How InfluxDB 3 Unlocks the Power of Python at the Source

Businesses across industries rely on time-stamped data to track system health, monitor performance, and improve operations. Whether it’s sensors on a factory floor or usage logs from a SaaS platform, time series data reveals how things change. As businesses digitize operations and add connected devices, sensors produce growing streams of time-based data. This opens the door to faster analytics and smarter automation. But legacy approaches can’t keep up.

The Rise of Tech Events in India: A New Era for Cloud-Native Computing

As India emerges as a significant player in the global public cloud landscape, with its public cloud services market projected to reach $25.5 billion by 2028 (a CAGR of 24.3% for 2023-28), the country is witnessing a surge in tech events. The live events market mirrors this growth, expanding 15% year over year, which is fostering a stronger community and facilitating the exchange of ideas and innovation in the public cloud sector.
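To put the projection in context, the short sketch below works backward from the quoted figures to the market size they imply for the starting year. It assumes five annual compounding periods with 2023 as the base year, which is one reading of "2023-28" and is not stated in the article; the function name is illustrative only.

```python
def implied_base(final_value: float, cagr: float, years: int) -> float:
    """Back out the starting value implied by a final value and a CAGR.

    CAGR satisfies: final_value = base * (1 + cagr) ** years
    """
    return final_value / (1.0 + cagr) ** years


# Figures quoted in the teaser: $25.5 billion by 2028 at a 24.3% CAGR.
# ASSUMPTION: five compounding periods (2023 -> 2028).
base_2023 = implied_base(final_value=25.5, cagr=0.243, years=5)

if __name__ == "__main__":
    print(f"Implied 2023 market size: ${base_2023:.2f} billion")
```

Under that assumption the implied starting point is roughly $8.6 billion, so the projection amounts to nearly a tripling of the market over the period.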

FinOps For AI: How Crawl, Walk, Run Works For Managing AI Costs

“It started as an experiment.” That’s how it begins at most companies. A small team spins up a few GPU instances to train a proof-of-concept model. Maybe it’s a fraud detection algorithm. Maybe it’s GenAI for support tickets. Either way, it’s just a test. Then the results come in, and they’re promising. Suddenly, that model is powering new features. Teams are fine-tuning LLMs in parallel.

Cloudflare's Resolver Outage: More Than Just DNS

“It’s always DNS.” That’s the running joke in IT. When websites won’t load and apps grind to a halt, DNS—the internet’s address book—is often the first to get blamed. That’s because DNS translates human-friendly names like google.com into IP addresses that computers use to route traffic.
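As a minimal illustration of the name-to-address translation described above, the sketch below asks the operating system's resolver for the addresses behind a hostname using Python's standard `socket.getaddrinfo`. The helper name `resolve` is hypothetical, and the result depends on the local resolver configuration, so no particular output is implied.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the unique IP addresses the system resolver reports for hostname."""
    # getaddrinfo consults the OS resolver (hosts file, DNS, etc.).
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    addresses: list[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]  # first element of the socket address is the IP string
        if ip not in addresses:
            addresses.append(ip)
    return addresses


if __name__ == "__main__":
    # Raises socket.gaierror if the name cannot be resolved,
    # which is the failure an application sees during a resolver outage.
    print(resolve("localhost"))
```

When a resolver is unreachable, a call like this fails before any connection is attempted, which is why DNS problems can make otherwise healthy services look down.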

Atatus APM: Full-Stack Visibility for Modern Engineering Teams 2025

APM stands for Application Performance Monitoring or Application Performance Management. It helps engineering teams track key metrics, detect slowdowns, and improve the overall performance of their applications. With Atatus APM, you get complete visibility into your application, from backend code and databases to external services and frontend performance.

How to Strengthen Your Security Operations with Incident Response Software

When our organization, a mid-sized, fast-scaling technology company that specializes in enterprise service management solutions and serves clients in regulated industries like finance and healthcare, faced its first serious cybersecurity breach in early 2024, we realized our incident response management approach wasn’t just outdated: it was putting the business at risk. Back then, we had alerts. We had logs.