
The latest News and Information on Application Performance Monitoring and related technologies.

Datadog on Apache Iceberg

Historically, Datadog has relied on technologies like Snowflake and Apache Spark on raw Parquet files (lacking consistent table structure) to power internal analytics and data science at scale. As usage grew across product teams, more features depended on data science teams, and our datasets grew to include more telemetry data, these systems became complex to manage and govern, both technically and financially. The need for a more flexible and scalable solution led Datadog to adopt Apache Iceberg, an open source table format for data lakes that brings reliability and performance while remaining SQL-friendly.
Sponsored Post

Adding a CDN to a load balancer (for a much faster website)

Here at Raygun, we like to go fast. Really fast. That's what we do! When we see something that isn't zooming, we try to figure out how to make it go faster. So today, we're answering a simple (and relevant) question: how do we make our public site, raygun.com, much, much faster? The answer, at first glance, is simple: we build it into a Content Delivery Network (CDN). But what if you have a load balancer serving your website, and you don't want to rebuild everything to serve from a CDN? Well, that's more complicated. Let's start by describing the issue.

Optimize Your Oracle Cloud (OCI) Spend with Datadog Cloud Cost Management

Support for Oracle Cloud Infrastructure (OCI) is now live in Datadog Cloud Cost Management. In this short demo, you'll learn how to get granular visibility into OCI cost and usage by service, compartment, tag, and resource tier; uncover savings opportunities by combining cost data with observability metrics like CPU, memory, and storage utilization; and set up anomaly monitors and budgets to avoid cost overruns, especially for high-risk workloads like AI and GPU training.

Datadog Bits AI SRE: Your new teammate for on-call shifts

Bits AI SRE is an always-on SRE agent built to handle complex troubleshooting and late-night alerts. Developed against thousands of real-world incidents and powered by Datadog’s platform, Bits AI SRE analyzes your entire stack, tests hypotheses, and identifies root causes in minutes. Resolve faster, get back to sleep sooner, and give your on-call team the confidence and capacity they need.

Patterns for Deploying OpenTelemetry Collector at Scale

So, you've embraced OpenTelemetry, and it's been great. Pat, Pat. That single, vendor-neutral pipeline for your traces, metrics, and logs felt like the future. But now, the future is getting bigger. That simple OTel Collector configuration that worked perfectly for a few services is starting to show its limits as you scale. The data volume is climbing, reliability is becoming a concern, and you're wondering if that single collector instance is now a bottleneck waiting to happen.

Amazon AppStream 2.0 Multi-session Service Monitoring

In late 2023, Amazon introduced the ability to deliver AppStream 2.0 using Microsoft Windows Server OS rather than the desktop version of the OS. This feature enables IT admins to host multiple end-user sessions on a single AppStream 2.0 instance, helping to make better use of instance resources.