Operations | Monitoring | ITSM | DevOps | Cloud

How OpenTelemetry Powers Observability @ Canva

Canva is an online design platform with a mission to empower everyone in the world to design anything and publish anywhere. To guarantee our customers have the best experience using our products, Canva engineers rely on the tools and products provided by the Observability team to measure and quantify critical application health and performance metrics. Canva’s Observability team uses OpenTelemetry components to collect, transform and export standardised telemetry data from our applications and platforms. Canva has been an early adopter of OTel using OTel SDK for tracing and the collector gateway to process and export telemetry to various tools.

Watchdog: AI Across the Datadog Platform

Watchdog is Datadog’s AI engine, providing you with automated alerts, insights, and root cause analyses that draw from observability data across the entire Datadog platform. Watchdog continuously monitors your infrastructure and surfaces the signals that matter most, helping you quickly detect, troubleshoot, and resolve issues. Plus, all Watchdog features come built in—no setup required.

Container Monitoring Demo

Datadog Container Monitoring gives you real-time, end-to-end visibility into your containerized environments. In this demo, we show you how Container Monitoring helps you correlate container metrics with logs, traces, and network data to quickly detect and investigate anomalies across every layer of your Kubernetes clusters. We also walk you through setting up AI-enhanced monitors to receive automatic alerts for future issues.

Architecting for Reliability

As modern systems become increasingly more complex, the risk of incidents and outages increases. Old approaches to reliability can sometimes be adapted to novel system designs, but other times new methods need to be invented. In this panel session moderated by Datadog’s Jason Yee, you’ll hear from SRE leaders and systems architects across the industry about how they’re designing and operating systems to achieve greater reliability.

Democratizing Observability

DevOps principles have helped many organizations improve cross-team collaboration, which has in turn led to increased reliability and velocity in the development lifecycle. In this session moderated by Jason Yee, we hear from panelists who have applied these same DevOps principles to observability, helping them unlock data-based insights and empower teams to make smarter, more informed decisions.

Dash Panel Discussion: What Users Really Want

Measuring user experience is typically done by tracking metrics like latency and purchase frequency. But these metrics can often obscure real user sentiment. In this panel session moderated by Miranda Kapin, you can learn about better ways to uncover how users are truly experiencing your application and methods for improving their engagement.

Building a Multi-Tenant Insurance Platform

In 2020, CoverWallet—a multi-tenant insurance platform—was acquired by Aon, which led to a rapid expansion in both the size and global presence of its engineering organization. In his talk, CoverWallet’s Hylke Alons walks through the changes that were necessary to meet their platform's new expectations, including improving growth and scalability while ensuring reliability, automating security, and reducing maintenance. He also discusses some best practices for scaling up engineering and product teams to handle demand in a complex and highly regulated industry like insurance.