How we cut Spark compute costs by 44% with agentic AI and Datadog Jobs Monitoring

Jun 1, 2026 By Charles Yu In Datadog

Spark jobs only get more expensive and harder to debug as they scale. It’s a problem we’ve run into ourselves. Our Referential Data Platform team builds and maintains the knowledge graph that maps relationships between customers’ observability entities. ServiceQueryEdge is at the center of that graph, mapping service entities to their associated metric and log queries.

A deep dive into AWS data perimeter misconfigurations

Jun 1, 2026 By Mallory Mooney In Datadog

In AWS environments, a data perimeter is a set of preventative controls that help ensure that your trusted cloud identities (principals or AWS services acting on your behalf) are accessing trusted resources from authorized networks. You can apply these controls at various levels of your infrastructure, such as per resource or across all resources in your AWS account.

Monitor LLM routing with the Kubernetes Inference Extension

May 29, 2026 By David Lentz In Datadog

If you serve LLMs on Kubernetes without inference-aware routing, your load balancer is likely wasting inference capacity. Generic HTTP traffic management routes requests blindly, assuming the backends in your cluster are interchangeable. But your model-serving backends are stateful and unevenly prepared to handle any given request. As a result, requests are often routed to a backend that is not the one best suited to respond.

How a unified data model improves feature flag rollout decisions

May 29, 2026 By Bridgitte Kwong In Datadog

Consolidation is reshaping the experimentation and feature management landscape. Tools are merging, and partnerships are being repackaged as platforms. But marketing a unified experience is not the same as building one. Right now, engineering leaders and product managers are reassessing whether the tools they depend on are built for the long term. It’s irrelevant which vendor has the most products.

Monitor JavaScript framework routing with Datadog RUM

May 28, 2026 By Datadog In Datadog

Modern web applications rely on frameworks like Next.js, Vue, and Angular to handle routing and rendering. In these architectures, navigation happens within the application rather than through full page loads, which makes it difficult for traditional browser instrumentation to capture what users actually experience. As a result, teams often see misleading view names, missing navigations, and errors that are either misattributed or not captured at all, especially during hydration or lazy loading.

Monitor Azure Managed Redis with Datadog

May 28, 2026 By Michael Cronk In Datadog

Azure Managed Redis is Microsoft’s fully managed, enterprise-tier in-memory data store. It is designed for the low-latency caching, session storage, and real-time data needs of modern applications, including AI workloads that depend on fast vector and embedding lookups. Because user-facing applications often query Redis directly, even small regressions in latency, hit rate, or memory pressure can degrade the user experience.

Deploy Datadog Kubernetes Autoscaling at scale

May 28, 2026 By Danny Driscoll In Datadog

Every Kubernetes environment accumulates waste over time. Teams overprovision CPU and memory requests to avoid performance risk, run idle replicas to preserve headroom, and leave Horizontal Pod Autoscalers (HPAs) untouched long after workload behavior has changed. Some of this waste can be addressed at the node level, where Datadog Cluster Autoscaling helps teams rightsize capacity.

Unified observability for Alibaba Cloud with Datadog

May 28, 2026 By Ellie Cohen In Datadog

Alibaba Cloud is a major cloud provider in APAC, offering industry-leading foundational AI models in addition to compute, managed databases, object storage, and Kubernetes through its Container Service for Kubernetes (ACK). Teams choose Alibaba Cloud for its infrastructure availability across Asia Pacific and its managed services. For SREs and platform engineers, that often means running Alibaba Cloud alongside AWS, Google Cloud, or Microsoft Azure.

Instrument LangGraph agents with Datadog: a practical guide

May 28, 2026 By Datadog In Datadog

AI agents tend to function as black boxes, and it can be difficult to trace and understand agent workflows end to end in order to characterize their performance. By tracing full agent runs with LLM Observability, Datadog AI Agent Monitoring enables you to visualize workflows with flame graphs and quickly spot sources of failures and latency.

Investigate funnel drop-offs with Product Analytics

May 27, 2026 By Datadog In Datadog

For most product teams, funnels are a staple of the analytics toolkit despite a frustrating limitation. You can see which step users are dropping off at, but understanding why requires hours of manual slicing across segments, separate comparison views, and a lot of trial and error before you land on a useful hypothesis. And even when you find something meaningful, taking action typically means jumping to another tool, building a new segment, or filing a request with a data team.
