This Month in Datadog - December 2025

For our last episode of 2025, we’re focusing on Datadog releases announced at AWS re:Invent. Join Jeremy to see how you can manage logs at petabyte scale in your infrastructure, eliminate unneeded costs in Amazon S3 buckets, build agentic workflows, and detect credential leaks. Later in the episode, Scott spotlights how you can connect your AI agents to Datadog tools and context with our MCP Server.

Highlights from AWS re:Invent 2025: Making sense of applied AI, trust, and going faster

After four days navigating AWS re:Invent, the latest installment of this 13-year-old cloud pilgrimage and a 65,000-step marathon with 60,000 attendees spread across five Las Vegas campuses, we’re all a little dehydrated but significantly wiser. The volume of announcements felt less like a single flood and more like a river branching into three powerful currents. Making sense of this massive technological convergence requires zooming out.

How Datadog Manages 50,000 Apache Iceberg Tables at Scale

Think managing a few database tables is hard? Try 50,000 production Iceberg tables storing petabytes of data with 8 million scans per day. In this clip, Datadog's platform team reveals the architecture choices behind their managed Iceberg implementation that serves hundreds of internal engineering teams.

Datadog at AWS re:Invent, Bits AI SRE, MCP Server, CloudPrem, and more | This Month in Datadog

Get a closer look at features we announced at AWS re:Invent in the latest episode of This Month in Datadog. Tune in for spotlights of Bits AI SRE, now generally available, and Datadog’s MCP Server, which connects AI agents to our platform by ingesting prompts and mapping them to Datadog resources and data. This Month in Datadog brings you the latest updates on our newest product features, announcements, resources, and events.

Datadog on Apache Iceberg

Historically, Datadog has relied on technologies like Snowflake and Apache Spark on raw Parquet files (lacking consistent table structure) to power internal analytics and data science at scale. As usage grew across product teams, as more features came to depend on data science teams, and as our datasets grew to include more telemetry data, these systems became complex to manage and govern, both technically and financially. The need for a more flexible and scalable solution led Datadog to adopt Apache Iceberg, an open source table format for data lakes that brings reliability and performance while remaining SQL-friendly.

Keep service ownership up to date with Datadog Teams' GitHub integration

Engineering organizations depend on clear team ownership to maintain reliable services and move quickly. But as codebases expand and teams shift, answering basic questions—Who owns this service? Who should be paged in an incident? Are teams meeting operational standards?—becomes harder.

Automate infrastructure operations with Datadog Infrastructure Management

Many organizations struggle to track how their cloud infrastructure changes over time. Modern environments span tens of thousands of resources across hundreds of accounts and multiple clouds. Application teams add new services and regions at a rapid pace, increasing the number and variety of resources that need to be managed. These shifts can cause infrastructure configurations to drift from a well-architected state, increasing the risk of service reliability issues and unexpected cloud spend.

Observability in the AI age: Datadog's approach

Ten years ago, Datadog was a single-product company focused on breaking down the silos between dev and ops. As the shift towards the cloud accelerated and organizations transitioned to the new DevOps model, we set out to develop an observability platform that would enable these teams to safely scale faster and answer the essential questions about their services: are they available, secure, compliant, performant, and cost-efficient?

Optimize Kubernetes cluster cost with Datadog Cluster Autoscaler

Running Kubernetes at scale almost always means paying for more compute than you need. To protect reliability, platform and application teams typically overprovision nodes early in development and keep scaling up as they add features and workloads. They are often reluctant to move to smaller or different instance types without a clear picture of how those changes will affect performance or availability. The result is a fleet of underutilized nodes that silently inflate your cloud bill.