Latest Videos

Datadog on Building an Event Storage System

When Datadog introduced its Log Management product, it required a new event data storage platform, as storing logs and events is a completely different problem from storing metrics, which was the first Datadog product. Over time, Datadog introduced more and more products that needed to store and index multi-kilobyte timeseries “events”, re-using the Event Platform infrastructure from Log Management. The increased use of the Event Platform and the new feature requirements coming from new products started exposing the limitations of the legacy system and the need for a new approach.

How OpenTelemetry Powers Observability @ Canva

Canva is an online design platform with a mission to empower everyone in the world to design anything and publish anywhere. To guarantee our customers have the best experience using our products, Canva engineers rely on the tools and products provided by the Observability team to measure and quantify critical application health and performance metrics. Canva’s Observability team uses OpenTelemetry components to collect, transform and export standardised telemetry data from our applications and platforms. Canva has been an early adopter of OTel, using the OTel SDK for tracing and the collector gateway to process and export telemetry to various tools.

Watchdog: AI Across the Datadog Platform

Watchdog is Datadog’s AI engine, providing you with automated alerts, insights, and root cause analyses that draw from observability data across the entire Datadog platform. Watchdog continuously monitors your infrastructure and surfaces the signals that matter most, helping you quickly detect, troubleshoot, and resolve issues. Plus, all Watchdog features come built in—no setup required.

Container Monitoring Demo

Datadog Container Monitoring gives you real-time, end-to-end visibility into your containerized environments. In this demo, we show you how Container Monitoring helps you correlate container metrics with logs, traces, and network data to quickly detect and investigate anomalies across every layer of your Kubernetes clusters. We also walk you through setting up AI-enhanced monitors to receive automatic alerts for future issues.

Architecting for Reliability

As modern systems become increasingly complex, the risk of incidents and outages increases. Old approaches to reliability can sometimes be adapted to novel system designs, but other times new methods need to be invented. In this panel session moderated by Datadog’s Jason Yee, you’ll hear from SRE leaders and systems architects across the industry about how they’re designing and operating systems to achieve greater reliability.

Democratizing Observability

DevOps principles have helped many organizations improve cross-team collaboration, which has in turn led to increased reliability and velocity in the development lifecycle. In this session moderated by Jason Yee, we hear from panelists who have applied these same DevOps principles to observability, helping them unlock data-based insights and empower teams to make smarter, more informed decisions.