Datadog

Best practices to prevent alert fatigue

As your environment changes, new trends can quickly make your existing monitoring less accurate. At the same time, building alerts after every new incident can turn a straightforward strategy into a convoluted one. Treating monitoring as either a one-time effort or a purely reactive one can result in alert fatigue. Alert fatigue occurs when monitoring systems generate an excessive number of alerts, or when alerts are irrelevant or unhelpful, diminishing your ability to spot critical issues.

Identify and resolve incidents faster with InsightFinder's offering in the Datadog Marketplace

InsightFinder is a SaaS platform that uses AI-backed predictive analytics to predict and prevent production incidents. Using InsightFinder with Datadog, you can quickly identify hidden correlations in your application metrics, logs, and events and address application issues before they devolve into production outages and create customer impact.

Best practices for continuous testing with Datadog

In Parts 1 and 2, we looked at how you can build and maintain effective test suites. These steps are a key part of ensuring that application workflows function as expected. But how you run your tests is another important point to consider, so in this post, we’ll walk through best practices for executing your tests across every stage of development. Along the way, we’ll also look at how Datadog supports these practices for the applications that you are already monitoring.

Datadog on Building an Event Storage System

When Datadog introduced its Log Management product, it required a new event data storage platform, because storing logs and events is a completely different problem from storing metrics, the first Datadog product. Over time, Datadog introduced more and more products that needed to store and index multi-kilobyte timeseries “events”, reusing the Event Platform infrastructure from Log Management. The increased use of the Event Platform and the feature requirements of new products began to expose the limitations of the legacy system and the need for a new approach.

Use HiveMQ and OpenTelemetry to monitor IoT applications in Datadog

Large IoT environments are highly complex, comprising multiple layers of disparate devices that must exchange data across potentially unreliable connections. Having visibility into each layer of your IoT environment is critical for quickly identifying problems with your deployment that could negatively impact user experience.

How OpenTelemetry Powers Observability @ Canva

Canva is an online design platform with a mission to empower everyone in the world to design anything and publish anywhere. To guarantee our customers have the best experience using our products, Canva engineers rely on the tools and products provided by the Observability team to measure and quantify critical application health and performance metrics. Canva’s Observability team uses OpenTelemetry components to collect, transform, and export standardised telemetry data from our applications and platforms. Canva has been an early adopter of OTel, using the OTel SDK for tracing and the collector gateway to process and export telemetry to various tools.

Watchdog: AI Across the Datadog Platform

Watchdog is Datadog’s AI engine, providing you with automated alerts, insights, and root cause analyses that draw from observability data across the entire Datadog platform. Watchdog continuously monitors your infrastructure and surfaces the signals that matter most, helping you quickly detect, troubleshoot, and resolve issues. Plus, all Watchdog features come built in—no setup required.

Container Monitoring Demo

Datadog Container Monitoring gives you real-time, end-to-end visibility into your containerized environments. In this demo, we show you how Container Monitoring helps you correlate container metrics with logs, traces, and network data to quickly detect and investigate anomalies across every layer of your Kubernetes clusters. We also walk you through setting up AI-enhanced monitors to receive automatic alerts for future issues.

Configure pipeline alerts with Datadog CI monitors

CI pipelines have become an integral part of the development workflow, helping teams automate the continuous building and testing of new updates to application code. The growing importance of CI pipelines has naturally led to a need for increased visibility into their performance. In 2021, Datadog introduced CI Visibility to deliver granular performance metrics for each individual pipeline, allowing you to monitor build duration and related telemetry across all recent commits.