How to Build A Unified Dashboard | Datadog Tips & Tricks

In this video, you’ll learn how to create unified dashboards that give your teams valuable information and performance visualizations from across the Datadog platform. Dashboards let your teams see all of that data side by side, providing holistic visibility and breaking down silos between Dev and Ops teams. The video walks through creating a Screenboard that showcases frontend system latency, backend system latency, and Service Level Objectives all in one place.

Identifying Environment Right Sizing Opportunities for Cost Efficiency | Datadog Tips & Tricks

In this video, you’ll learn how to use the host map to identify opportunities to rightsize your environment and make it more cost-efficient. MoneySuperMarket Group cut its cloud infrastructure costs by over 50% using Datadog. This video unpacks some of the practices MoneySuperMarket’s engineering team used to accomplish that.

Observability at The Edge with Fastly and Datadog

You use CDNs because they allow you to serve content as quickly and reliably as possible. But how well are your systems performing? How securely are you moving data—and how do you know which parts of your environment are slowing you down? Learn how to improve end user experiences, accelerate development, and take full advantage of edge computing in this joint webinar.

Driving Service Reliability Through Autoscaling Workloads on OpenShift

In this webinar, Ara Pulido, Technical Evangelist at Datadog, will demonstrate how to autoscale your application workloads on OpenShift. You will learn frameworks for identifying your workloads’ key work and resource metrics, as well as how to use those metrics to drive horizontal and vertical pod autoscaling, so that you can maximize efficiency while ensuring service reliability.

Datadog Application Performance Monitoring

Datadog APM provides deep visibility into application performance and code efficiency, so you can monitor and optimize your stack at any scale and provide the best digital experience for your users. APM and distributed tracing are fully integrated with the rest of Datadog, giving you rich context for troubleshooting issues in real time.

Using Log Patterns to Create Log Exclusion Filters | Datadog Tips & Tricks

In part 2 of this two-part series, you’ll learn how to use Log Patterns to quickly create log exclusion filters and reduce the number of low-value logs you are indexing. Datadog’s Logging with Limits™ feature lets you ingest all of your logs and then selectively choose which ones to index. Meanwhile, the Log Patterns feature can quickly isolate groups of low-value logs.

How to Generate Metrics from Logs | Datadog Tips & Tricks

In this video, you’ll learn how to generate metrics from log event attributes so you can filter your logs more effectively and begin monitoring, graphing, and alerting on the new metric immediately. Generating metrics from logs is a powerful way to monitor attributes that are parsed from your logs.

Datadog on Kubernetes

When Datadog decided two years ago to move its infrastructure platform to Kubernetes, we didn’t expect to find so many roadblocks, but ingesting trillions of data points per day in a reliable fashion requires pushing the limits of cloud computing. Creating and managing dozens of clusters, each with thousands of nodes and operating in several clouds, was a challenging but rewarding learning experience. In this episode, Ara Pulido, Developer Advocate, will chat with Laurent Bernaille, Staff Engineer at Datadog and part of the team that created Datadog’s Kubernetes platform. We’ll cover the challenges we found creating and scaling Datadog’s Kubernetes platform and how we overcame them.

Datadog on Kafka

As a company, Datadog ingests trillions of data points per day. Kafka is the messaging persistence layer underlying many of our high-traffic services. Consequently, our Kafka usage is quite high: double-digit gigabytes per second of bandwidth and the need for petabytes of high-performance storage, even for relatively short retention windows. In this episode, we’ll speak with two engineers responsible for scaling the Kafka infrastructure within Datadog, Balthazar Rouberol and Jamie Alquiza. They’ll share their strategy for scaling Kafka, explain how it’s been deployed on Kubernetes, and introduce kafka-kit, our open source toolkit for scaling Kafka clusters. You’ll leave with lessons learned while scaling persistent storage on modern orchestrated infrastructure, and actionable insights you can apply at your organization.

Introduction to Site Reliability Engineering

In this session, we start with the basics of SRE, including some common terminology and theory, then dive into practical examples, including lessons learned from our own journey here at Datadog. We discuss the relationship between SRE and DevOps, what success looks like (and how to measure it), and how to identify and nurture both internal and external talent in order to build a cross-functional team. SRE is a large, complex topic, so the session ends with a live Q&A and a deep dive into some great topics.