Protect agentic AI applications with Datadog AI Guard

Organizations are increasingly using agentic AI applications powered by large language models (LLMs) to automate analysis, decision-making, and operational workflows. As these AI agents take on more responsibility, they gain access to internal tools and services and can interact with them in unintended ways.

How Okta keeps 99.99 percent uptime with Datadog

How do you maintain 99.99 percent uptime across thousands of Kubernetes hosts and multiple cloud providers? Okta engineers explain why observability is critical to keeping authentication and authorization services running at scale. Watch how Okta uses Datadog to bring metrics, logs, and traces into a single view, speed up root cause analysis, and reduce time to mitigation while controlling costs.

Debug PostgreSQL query latency faster with EXPLAIN ANALYZE in Datadog Database Monitoring

In PostgreSQL, the EXPLAIN ANALYZE statement gives you a detailed report of what actually happens when you execute a query. This kind of information is important for troubleshooting slow queries, but using EXPLAIN ANALYZE to collect this data is often challenging in a production environment. Datadog Database Monitoring now supports automatic collection of EXPLAIN ANALYZE plans for PostgreSQL, enabling you to easily capture execution details that help you troubleshoot slow queries.
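As background, a statement such as `EXPLAIN (ANALYZE, FORMAT JSON) SELECT ...` returns the plan as JSON. Because EXPLAIN ANALYZE actually executes the statement, running it by hand in production needs care. The Python sketch below walks such a JSON plan and ranks nodes by approximate exclusive time. It is an illustration only, not how Database Monitoring processes plans. The helper `self_times` is a hypothetical name and the sample plan is made-up data. PostgreSQL reports "Actual Total Time" as a per-loop average that includes child nodes, so the sketch multiplies by "Actual Loops" and subtracts the children's totals. Parallel workers, CTEs, and InitPlans are ignored.

```python
import json
from typing import Any


def self_times(plan: dict[str, Any]) -> list[tuple[str, float]]:
    """Return (node type, approximate exclusive ms) for each plan node, slowest first."""
    results: list[tuple[str, float]] = []

    def visit(node: dict[str, Any]) -> float:
        # Inclusive time across all loops of this node (children included).
        inclusive = node.get("Actual Total Time", 0.0) * node.get("Actual Loops", 1)
        # Recurse first so child totals can be subtracted from this node.
        child_total = sum(visit(child) for child in node.get("Plans", []))
        results.append((node["Node Type"], max(inclusive - child_total, 0.0)))
        return inclusive

    visit(plan)
    return sorted(results, key=lambda item: item[1], reverse=True)


# Illustrative output shape of EXPLAIN (ANALYZE, FORMAT JSON); the numbers are invented.
sample_output = json.loads("""
[{"Plan": {"Node Type": "Hash Join", "Actual Total Time": 120.0, "Actual Loops": 1,
  "Plans": [
    {"Node Type": "Seq Scan", "Actual Total Time": 95.0, "Actual Loops": 1},
    {"Node Type": "Hash", "Actual Total Time": 10.0, "Actual Loops": 1,
     "Plans": [{"Node Type": "Index Scan", "Actual Total Time": 8.0, "Actual Loops": 1}]}
  ]},
  "Execution Time": 121.5}]
""")

for node_type, ms in self_times(sample_output[0]["Plan"]):
    print(f"{node_type}: {ms:.1f} ms")
```

In this made-up plan, the sequential scan dominates the runtime, which is the kind of signal that points toward a missing index.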

Datadog acquires Propolis

Generative AI enables teams to write and ship code faster than ever. But current methods for testing and quality assurance have not evolved to match the new pace and scale of deployments. Manual and deterministic testing paths quickly become obsolete when new features are released, and they fundamentally can't test AI outputs, leaving a massive untested surface area. To keep up, teams need new testing methods that can define users' goals and verify that the outcomes match them.

Unify and correlate frontend and backend data with retention filters

Teams can use Datadog Real User Monitoring (RUM) and RUM without Limits to get full visibility into the frontend health of their applications while retaining only the sessions that contain critical problems that affect the end-user experience. But application errors or slowness often result from backend issues, such as database bottlenecks. To diagnose these issues, you need to correlate the frontend data from RUM with the backend data from Datadog Application Performance Monitoring (APM).
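This kind of correlation generally relies on a shared identifier, such as a trace ID propagated from a browser request to the backend services that handle it. The Python sketch below shows only the join itself, using hypothetical in-memory records. `FrontendResource`, `BackendSpan`, and `correlate` are made-up names, not Datadog APIs or event schemas.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrontendResource:
    """Hypothetical frontend (RUM-style) resource event."""
    session_id: str
    url: str
    duration_ms: float
    trace_id: Optional[str]  # None when the request carried no trace context


@dataclass
class BackendSpan:
    """Hypothetical backend (APM-style) span."""
    trace_id: str
    service: str
    resource: str
    duration_ms: float


def correlate(
    resources: list[FrontendResource], spans: list[BackendSpan]
) -> list[tuple[FrontendResource, list[BackendSpan]]]:
    """Pair each traced frontend resource with the backend spans sharing its trace ID."""
    # Index backend spans by trace ID so each lookup is a dictionary hit.
    spans_by_trace: dict[str, list[BackendSpan]] = defaultdict(list)
    for span in spans:
        spans_by_trace[span.trace_id].append(span)

    pairs = []
    for res in resources:
        if res.trace_id is None:
            continue  # Static assets or untraced calls have nothing to join.
        matched = spans_by_trace.get(res.trace_id, [])
        # Longest backend spans first, to surface likely bottlenecks.
        pairs.append((res, sorted(matched, key=lambda s: s.duration_ms, reverse=True)))
    return pairs


resources = [
    FrontendResource("s1", "/api/checkout", 2300.0, "t1"),
    FrontendResource("s1", "/static/app.js", 40.0, None),
]
spans = [
    BackendSpan("t1", "checkout-api", "POST /checkout", 2250.0),
    BackendSpan("t1", "postgres", "SELECT orders", 2100.0),
    BackendSpan("t2", "search-api", "GET /search", 80.0),
]
for res, matched in correlate(resources, spans):
    print(res.url, [(s.service, s.duration_ms) for s in matched])
```

Here the slow checkout request seen on the frontend lines up with a long database span on the backend. Note that a root span's duration includes its children, so the top entry is usually the service entry point rather than the deepest bottleneck.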

Easily map logs to OCSF with Datadog Observability Pipelines

Normalizing security logs into the Open Cybersecurity Schema Framework (OCSF) is often complex, manual, and time-consuming. With Datadog Observability Pipelines, you can transform logs into OCSF format in your own environment before routing them to destinations like Splunk, CrowdStrike, and AWS Security Lake. This video shows how security teams can use Observability Pipelines to collect, process, and transform logs into OCSF format automatically.
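To illustrate what normalizing into OCSF involves, the Python sketch below maps one hypothetical raw login record to an event in the OCSF Authentication class. This is not Observability Pipelines configuration or its built-in mapping. The raw log fields (`ts`, `username`, `client_ip`, `result`), the product metadata, the schema version string, and the helper name `to_ocsf_authentication` are all assumptions. The class and enum IDs reflect our reading of the OCSF schema and should be checked against the version you target.

```python
from datetime import datetime

# OCSF Authentication class constants (verify against your target schema version).
AUTH_CLASS_UID = 3002    # Authentication
IAM_CATEGORY_UID = 3     # Identity & Access Management
ACTIVITY_LOGON = 1       # Logon
STATUS_IDS = {"success": 1, "failure": 2}  # anything else maps to 0 (Unknown)


def to_ocsf_authentication(raw: dict) -> dict:
    """Map a hypothetical raw login log record to an OCSF Authentication event."""
    # OCSF timestamps are epoch milliseconds; the raw log uses ISO 8601 with a Z suffix.
    ts = datetime.fromisoformat(raw["ts"].replace("Z", "+00:00"))
    return {
        "class_uid": AUTH_CLASS_UID,
        "category_uid": IAM_CATEGORY_UID,
        "activity_id": ACTIVITY_LOGON,
        # type_uid is defined as class_uid * 100 + activity_id.
        "type_uid": AUTH_CLASS_UID * 100 + ACTIVITY_LOGON,
        "status_id": STATUS_IDS.get(raw["result"].lower(), 0),
        "severity_id": 1,  # Informational
        "time": int(ts.timestamp() * 1000),
        "user": {"name": raw["username"]},
        "src_endpoint": {"ip": raw["client_ip"]},
        # Hypothetical product metadata and schema version.
        "metadata": {
            "version": "1.1.0",
            "product": {"name": "example-auth-service", "vendor_name": "Example"},
        },
    }


raw_log = {
    "ts": "2024-05-01T12:00:00Z",
    "event": "login",
    "username": "alice",
    "client_ip": "203.0.113.7",
    "result": "failure",
}
print(to_ocsf_authentication(raw_log))
```

Mapping every source by hand like this is what makes OCSF normalization tedious at scale, which is the problem the video addresses.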

Monitor Arista VeloCloud SD-WAN performance with Datadog

As organizations grow their cloud environments and branch office networks, maintaining reliable connectivity and application performance becomes more complex. VeloCloud SD-WAN provides dynamic, policy-based routing to help ensure that your connectivity is dependable and cost-efficient, and that your applications perform consistently.

Building reliable dashboard agents with Datadog LLM Observability

This article is part of our series on how Datadog’s engineering teams use LLM Observability to iterate, evaluate, and ship AI-powered agents. In this first story, the Graphing AI team shares how they instrumented their widget- and dashboard-generation agents with LLM Observability to detect regressions and debug failures faster. Visibility into how large language model (LLM) applications behave in real time is essential for building reliable AI-driven systems at Datadog.