

Training Foundation Models on a Trillion Data Points with Apache Iceberg

Training an AI foundation model on over a trillion data points sounds impossible to do without hammering your production systems. Here's how Datadog did it with Apache Iceberg for their time series forecasting model, TOTO. The key challenge: extracting massive historical observability data (metrics spanning years) and running incremental preprocessing pipelines without overwhelming production services. Iceberg solved this by providing schema governance, consistency guarantees, and seamless integration with ML tools like Ray and PyTorch.

OpenTelemetry Metrics with 5 Practical Examples

Picture this: your observability tool already nails the basics, like request rates, latency, and memory usage, but you need more insight. Think user churn rates, engagement spikes, or even how many carts get abandoned mid-checkout. That’s where OpenTelemetry steps in, providing a way to track those critical custom metrics with ease.

How Inkeep Monitors Their AI Agent Framework with SigNoz

AI agents are fundamentally different beasts to monitor than traditional applications. A single user request can trigger a cascade of 10+ internal operations: sub-agent transfers, tool executions, LLM calls, and API requests, each with unpredictable latency and failure modes. When something goes wrong (and with LLMs, things go wrong in creative ways), you need to see the entire execution flow to debug effectively.

Overcoming ClickHouse's JSON constraints to build a high-performance JSON log store

Customer log data is always messy. Being (and building!) an observability platform, we get to see all the beautiful, creative ways it can be messy, every single day. And yet our customers expect, quite fairly, I might add, perfect query results and peak performance. SigNoz is an open-source observability platform that can be your one-stop solution for logs, metrics, and traces.
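To make the messiness concrete, here is a toy illustration (pure Python, not SigNoz's implementation) of the core problem a JSON log store has to solve: log lines carry the "same" field with different types and shapes, yet queries need one consistently typed column.

```python
# Toy normalization of messy JSON logs into one typed schema.
# The field names and level codes are made-up examples.
import json

raw_logs = [
    '{"level": "error", "duration_ms": 125}',
    '{"level": "ERROR", "duration_ms": "125"}',  # number sent as a string
    '{"level": 3, "duration": {"ms": 125}}',     # different shape entirely
]

LEVEL_CODES = {3: "error"}  # hypothetical numeric severity mapping

def normalize(line: str) -> dict:
    rec = json.loads(line)
    level = rec.get("level")
    if isinstance(level, int):
        level = LEVEL_CODES.get(level, "unknown")
    duration = rec.get("duration_ms")
    if duration is None and isinstance(rec.get("duration"), dict):
        duration = rec["duration"].get("ms")
    # Every row comes out with the same column names and types.
    return {"level": str(level).lower(), "duration_ms": int(duration)}

rows = [normalize(line) for line in raw_logs]
```

A columnar store like ClickHouse has to make the same decisions, but at ingest time and at far larger scale, which is where the JSON-type constraints the article discusses come in.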

How to Track Cloud Costs in Real-Time Instead of Waiting Days

Tired of waiting days to see your AWS bill spike? Datadog solved this problem using Apache Iceberg to deliver real-time cloud cost visibility, updating every 15 minutes instead of waiting for billing data. Here's how it works: they sync a real-time resource inventory (EC2 instances, Kubernetes pods) into Iceberg tables, then use Trino to join those snapshots with unit pricing data. The result? FinOps teams can catch cost anomalies before they become budget disasters.
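The shape of that computation is simple enough to sketch: join the latest inventory snapshot to a unit-price table and aggregate. The instance types, prices, and team names below are made-up examples, and Datadog runs this as Trino SQL over Iceberg tables rather than in-process Python.

```python
# Toy version of the inventory-snapshot-to-unit-price join the article
# describes. All values are illustrative, not real AWS prices.

# 15-minute inventory snapshot: what is running right now.
inventory = [
    {"resource": "i-0abc", "type": "m5.xlarge", "team": "search"},
    {"resource": "i-0def", "type": "m5.xlarge", "team": "search"},
    {"resource": "i-0ghi", "type": "r5.2xlarge", "team": "ml"},
]

# Unit prices in dollars per hour (illustrative).
unit_price = {"m5.xlarge": 0.192, "r5.2xlarge": 0.504}

def cost_per_hour_by_team(snapshot, prices):
    # Join each running resource to its unit price, aggregate by team.
    totals: dict[str, float] = {}
    for res in snapshot:
        totals[res["team"]] = totals.get(res["team"], 0.0) + prices[res["type"]]
    return totals

totals = cost_per_hour_by_team(inventory, unit_price)
```

Re-running this against each fresh snapshot gives a run-rate estimate every 15 minutes, which is what lets anomalies surface days before the bill does.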

How Datadog Manages 50,000 Apache Iceberg Tables at Scale

Think managing a few database tables is hard? Try 50,000 production Iceberg tables storing petabytes of data with 8 million scans per day. In this clip, Datadog's platform team reveals the architecture choices behind their managed Iceberg implementation that serves hundreds of internal engineering teams.

Datadog at AWS re:Invent, Bits AI SRE, MCP Server, CloudPrem, and more | This Month in Datadog

Get a closer look at features we announced at AWS re:Invent in the latest episode of This Month in Datadog. Tune in for spotlights of Bits AI SRE, now generally available, and Datadog’s MCP Server, which connects AI agents to our platform by ingesting prompts and mapping them to Datadog resources and data. This Month in Datadog brings you the latest updates on our newest product features, announcements, resources, and events.

Datadog on Apache Iceberg

Historically, Datadog has relied on technologies like Snowflake and Apache Spark over raw Parquet files (lacking a consistent table structure) to power internal analytics and data science at scale. As usage grew across product teams, more features came to depend on data science teams, and our datasets expanded to include more telemetry data, these systems became complex to manage and govern, both technically and financially. The need for a more flexible and scalable solution led Datadog to adopt Apache Iceberg, an open source table format for data lakes that brings reliability and performance while remaining SQL-friendly.
Sponsored Post

Adding a CDN to a load balancer (for a much faster website)

Here at Raygun, we like to go fast. Really fast. That's what we do! When we see something that isn't zooming, we try to figure out how to make it go faster. So today, we're answering a simple (and relevant) question: how do we make our public site, raygun.com, much, much faster? The answer, at first glance, is simple: we serve it from a Content Delivery Network (CDN). But what if you have a load balancer serving your website, and you don't want to rebuild everything to serve from a CDN? Well, that's more complicated. Let's start by describing the issue.