Optimize PostgreSQL performance with Datadog Database Monitoring

PostgreSQL is a widely used open source relational database that many organizations operate as a core part of their infrastructure stack. Because databases are mission-critical, database-related issues can have outsize downstream impacts on user experience, service performance, and data retention, making it vital to identify and address problems quickly.

Create Golden Paths for your development teams with Datadog App Builder and Workflow Automation

Improving the developer experience is a chief concern for many organizations that must maintain highly complex software architectures and platforms supported by an intricate web of internal processes. Platform engineering for Golden Paths seeks to address this by providing self-service tools, capabilities, and processes that help engineers start new projects in a more standardized, less error-prone way.

DASH 2024: Guide to Datadog's newest announcements

At this year’s DASH, we announced new products and features that enable your team to observe your environment, secure your infrastructure and workloads, and act to remediate problems before they affect customers. LLM Observability, which enables you to get deep visibility into your generative AI applications, is now generally available. The Datadog Agent now includes an embedded OTel Collector to provide native support for OpenTelemetry.

Unify your OpenTelemetry and Datadog experience with the embedded OTel Collector in the Agent

OpenTelemetry (OTel) is an open source, vendor-neutral observability solution that consists of a suite of components—including APIs, SDKs, and the OTel Collector—that allow teams to monitor their applications and services in a standardized format. OTel defines this data via the OpenTelemetry Protocol (OTLP), a standard for the encoding and transfer of telemetry data that organizations can use to collect, process, and export telemetry and route it to observability backends, such as Datadog.

Monitor, troubleshoot, improve, and secure your LLM applications with Datadog LLM Observability

Organizations across all industries are racing to adopt LLMs and integrate generative AI into their offerings. LLMs have been demonstrably useful for intelligent assistants, AIOps, and natural language query interfaces, among many other use cases. However, running them in production at enterprise scale presents many challenges.

Track the status of all your SLOs in Datadog

Service level objectives, or SLOs, are a key part of the site reliability engineering toolkit. SLOs provide a framework for defining clear targets around application performance, which ultimately help teams provide a consistent customer experience, balance feature development with platform stability, and improve communication with internal and external users.

Best practices for managing your SLOs with Datadog

Collaboration and communication are critical to the successful implementation of service level objectives. Development and operations teams need to evaluate the impact of their work against established service reliability targets in order to improve the end-user experience. Datadog simplifies cross-team collaboration by enabling everyone in your organization to track, manage, and monitor the status of all of their SLOs and error budgets in one place.
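
To make the idea of an error budget concrete, here is a minimal Python sketch of the arithmetic behind a request-based SLO. The function name and the sample numbers are hypothetical and chosen only for illustration; this is not Datadog's API or its exact calculation, just the general definition of an error budget as the share of allowed failures that has not yet been spent.

```python
def error_budget_remaining(target: float, good_events: int, total_events: int) -> float:
    """Return the unspent fraction of the error budget.

    target       -- SLO target as a fraction, e.g. 0.99 for 99% (assumed input format)
    good_events  -- number of events that met the objective
    total_events -- number of events observed in the SLO window

    A result of 1.0 means nothing has been spent; 0.0 means the budget is
    exhausted; a negative value means the SLO has been breached.
    """
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    if not 0 <= good_events <= total_events:
        raise ValueError("good_events must be between 0 and total_events")
    if total_events == 0:
        # No traffic in the window, so no budget has been consumed.
        return 1.0

    allowed_bad = (1 - target) * total_events  # failures the SLO tolerates
    actual_bad = total_events - good_events    # failures actually observed
    return 1 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical window: 10,000 requests, 50 failures, 99% target.
    remaining = error_budget_remaining(0.99, good_events=9_950, total_events=10_000)
    print(f"Error budget remaining: {remaining:.0%}")
```

With a 99% target, 10,000 requests allow about 100 failures, so 50 failures leave roughly half of the budget available for the rest of the window.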

SLOs 101: How to establish and define service level objectives

In recent years, organizations have increasingly adopted service level objectives, or SLOs, as a fundamental part of their site reliability engineering (SRE) practice. Best practices around SLOs have been pioneered by Google—the Google SRE book and a webinar that we jointly hosted with Google both provide great introductions to this concept. In essence, SLOs are rooted in the idea that service reliability and user happiness go hand in hand.

Troubleshoot infrastructure faster with Recent Changes

Infrastructure changes often trigger incidents, but troubleshooting these incidents is challenging when responders have to navigate through multiple tools to correlate telemetry with configuration changes. This lack of unified observability leads to longer mean time to resolution (MTTR), greater operational stress, and ultimately, negative business outcomes.