Day 2 with Cilium: Small configurations that keep large clusters boring

Dec 18, 2025 By Candace Shamieh In Datadog

Operating Cilium at a small scale is straightforward. You install the Helm chart, choose a routing mode, and apply a few network policies. Day 1 is about getting packets to flow. Day 2 is about keeping them boring. At Datadog, we run Cilium across hundreds of Kubernetes clusters, tens of thousands of nodes, and hundreds of thousands of pods in multiple clouds. When operating at this scale, small configuration choices stop being minor details and start becoming risk multipliers.

Python memory profiling: Common pitfalls and how to avoid them

Dec 18, 2025 By Bowen Chen In Datadog

Continuous profiling has established itself as a core observability practice, so much so that we’ve referred to it as the fourth pillar of observability. But despite its capabilities and growing adoption, continuous profiling can still be confusing for newcomers to approach and to apply correctly to different troubleshooting scenarios.

Centrally set up and scale monitoring of your infrastructure and apps with Datadog Fleet Automation

Dec 18, 2025 By Ethan Debnath In Datadog

Setting up and scaling observability across large, distributed environments often requires platform and SRE teams to coordinate access to infrastructure hosts and switch between configuration management tools and product-specific documentation. These tasks increase setup time and create delays in establishing visibility of critical services in Datadog. As teams expand their infrastructure, they need to coordinate Datadog configuration changes in a consistent and auditable way.

Datadog re:Invent recap 2025

Dec 17, 2025 By Datadog In Datadog

The Hidden Costs and Concerns of Iceberg Maintenance

Dec 17, 2025 By Datadog In Datadog

Everyone talks about how great Apache Iceberg is, but nobody warns you about this: without proper maintenance, your tables will bloat, queries will slow down, and your catalog will run out of memory. Here are the 4 critical operations you MUST run regularly. Expiring snapshots prevents metadata bloat (Datadog learned this the hard way with catalog memory pressure). Deleting orphan files cleans up failed writes. Compacting data files keeps streaming workloads fast. Compacting manifests optimizes query planning.
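
As a rough illustration of the four operations listed above, here is a minimal sketch that builds the corresponding Spark SQL procedure calls. It assumes the tables are maintained through Iceberg's Spark procedures (`expire_snapshots`, `remove_orphan_files`, `rewrite_data_files`, `rewrite_manifests`); the video summary does not say which engine Datadog uses, and the catalog name, table name, and 7-day retention window are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def maintenance_statements(catalog: str, table: str, retain_days: int, now: datetime) -> list[str]:
    """Build Spark SQL CALL statements for the four Iceberg maintenance operations.

    Assumption: maintenance runs through Iceberg's Spark procedures.
    The catalog, table, and retention values are placeholders.
    """
    cutoff = (now - timedelta(days=retain_days)).strftime("%Y-%m-%d %H:%M:%S")
    return [
        # 1. Expire old snapshots so table metadata does not grow without bound.
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', older_than => TIMESTAMP '{cutoff}')",
        # 2. Delete files that no snapshot references (for example, from failed writes).
        f"CALL {catalog}.system.remove_orphan_files("
        f"table => '{table}', older_than => TIMESTAMP '{cutoff}')",
        # 3. Compact small data files, which streaming writers tend to produce.
        f"CALL {catalog}.system.rewrite_data_files(table => '{table}')",
        # 4. Compact manifests to keep query planning fast.
        f"CALL {catalog}.system.rewrite_manifests(table => '{table}')",
    ]


if __name__ == "__main__":
    # Hypothetical catalog and table names, printed rather than executed.
    for stmt in maintenance_statements("my_catalog", "db.events", 7, datetime.now(timezone.utc)):
        print(stmt)
```

Each statement could then be passed to `spark.sql(...)` on a schedule. Expiring snapshots before removing orphan files is a common ordering, but appropriate retention windows depend on the workload.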

Improve log utilization with Datadog log exclusion filters | Datadog Tips & Tricks

Dec 17, 2025 By Datadog In Datadog

Want to make your logs easier to work with? Excluding unneeded logs from indexing reduces noise and may reduce log management costs. In this video, you’ll see how to improve log utilization with Datadog Log Patterns and log exclusion filters, and then how to set up an alert to track ingestion spikes.

Training Foundation Models on a Trillion Data Points with Apache Iceberg

Dec 16, 2025 By Datadog In Datadog

Training an AI foundation model on over a trillion data points sounds impossible without hitting your production systems. Here's how Datadog did it with Apache Iceberg for their time series forecasting model TOTO. The key challenge: extracting massive historical observability data (metrics spanning years) and running incremental preprocessing pipelines without overwhelming production services. Iceberg solved this by providing schema governance, consistency guarantees, and seamless integration with ML tools like Ray and PyTorch.

Monitor your Kubernetes operators to keep applications running smoothly

Dec 15, 2025 By David Lentz In Datadog

The performance of your Kubernetes operators often influences the behavior of the applications they manage. Operators automate the day-to-day management of your applications by executing critical activities, which may include scaling replicas, performing upgrades, and recovering from failures. For example, a PostgreSQL operator can ensure that standby servers are always deployed, that the database’s failover is correctly configured, and that data is backed up on schedule.

From performance to impact: Bridging frontend teams through shared context

Dec 15, 2025 By Addie Beach In Datadog

Connecting day-to-day development work to real user outcomes can be challenging. As a result, engineers and product teams often struggle to effectively prioritize projects together. While the goal of improving user experience (UX) is the same, each team relies heavily on different—and often siloed—forms of monitoring to understand their app, creating a disconnect in metrics and visualizations that can be hard to communicate.

How to Track Cloud Costs in Real-Time Instead of Waiting Days

Dec 15, 2025 By Datadog In Datadog

Tired of waiting days to see your AWS bill spike? Datadog solved this problem by using Apache Iceberg to deliver real-time cloud cost visibility, updating every 15 minutes instead of waiting for billing data. Here's how it works: they sync real-time resource inventory (EC2 instances, Kubernetes pods) into Iceberg tables, then use Trino to join those snapshots with unit pricing data. The result? FinOps teams can catch cost anomalies before they become budget disasters.
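
The join described above can be sketched in a few lines of plain Python. This only illustrates the idea (inventory snapshot joined with unit prices); it is not Datadog's Trino query, and the resource IDs, instance types, and hourly prices are made-up placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One row of a hypothetical inventory snapshot."""
    resource_id: str
    instance_type: str
    hours_running: float  # hours observed in the snapshot window (0.25 = 15 minutes)


def estimate_costs(inventory: list[Resource], unit_prices: dict[str, float]) -> dict[str, float]:
    """Join inventory rows with hourly unit prices and return estimated cost per resource.

    Rows whose instance type has no known price are skipped, which mirrors
    the behavior of an inner join.
    """
    costs: dict[str, float] = {}
    for resource in inventory:
        price = unit_prices.get(resource.instance_type)
        if price is None:
            continue
        costs[resource.resource_id] = round(resource.hours_running * price, 4)
    return costs


if __name__ == "__main__":
    # Hypothetical snapshot covering one 15-minute window.
    snapshot = [
        Resource("i-aaa", "m5.large", 0.25),
        Resource("i-bbb", "c5.xlarge", 0.25),
        Resource("i-ccc", "unknown.type", 0.25),
    ]
    prices = {"m5.large": 0.40, "c5.xlarge": 0.80}  # placeholder prices, not real rates
    print(estimate_costs(snapshot, prices))
```

Running the same estimate on every snapshot gives a cost series that can be checked for anomalies well before the billing data arrives.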
