Latest Posts

Use formulas and functions in RUM monitors for high-value alerts

Real User Monitoring (RUM) gives you visibility into the behavior of your users and the performance of your applications. You may already be using RUM monitors to automatically notify your team when the number of RUM events—such as pageviews, clicks, or errors—rises above a threshold you define.
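
To illustrate why a formula can produce a higher-value alert than a raw event count, here is a minimal sketch in Python. The function names, event counts, and thresholds are hypothetical and are not part of Datadog's API; the sketch only shows the idea of alerting on a ratio of two event counts.

```python
def count_alert(errors: int, threshold: int) -> bool:
    """Alert when the raw number of error events exceeds a threshold."""
    return errors > threshold


def error_rate_alert(errors: int, pageviews: int, max_rate: float) -> bool:
    """Alert when errors per pageview exceed a ratio (a simple formula)."""
    if pageviews == 0:
        return False  # no traffic, so there is no meaningful rate
    return errors / pageviews > max_rate


# Hypothetical traffic spike: errors rise, but only in proportion to pageviews.
busy = {"errors": 120, "pageviews": 12_000}

raw_fires = count_alert(busy["errors"], threshold=100)
rate_fires = error_rate_alert(busy["errors"], busy["pageviews"], max_rate=0.02)
print(f"count alert: {raw_fires}, error-rate alert: {rate_fires}")
```

In this made-up scenario the count-based check fires because of the extra traffic alone, while the ratio-based check stays quiet until errors grow faster than pageviews.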

Explore a centralized view into service telemetry, Error Tracking, SLOs, and more

When your service is experiencing performance issues, you need to address them quickly and with minimal friction. With access to more telemetry and insights, the APM Service Page provides a comprehensive overview of your service and helps you quickly drill down to diagnose and investigate issues.

Troubleshoot directly from any replay with Browser Dev Tools

Session Replay now includes Browser Dev Tools, a new feature that enables engineers to identify and debug the root causes of issues even faster by exposing key information about a playback session, such as network performance bottlenecks and any console log errors. This wealth of surrounding context will make it easier to trace frontend incidents throughout your application and remediate larger, ongoing issues.

Successfully migrate to Azure with the Microsoft Cloud Adoption Framework and Datadog

Migrating your applications from on-prem infrastructure to the cloud comes with a number of benefits, including increased agility, resilience, and scalability, as well as potential cost and IT overhead reductions. But migration can be complex, which is why organizations moving to Azure often use Microsoft’s Cloud Adoption Framework for Azure and its strategy for successful migrations.

Monitor your Redis Enterprise clusters with Datadog

Redis is an in-memory key-value data store that offers fast performance, flexible data structures, and multi-model databases, allowing it to handle a variety of use cases. Redis Enterprise enhances open source Redis with features designed to run distributed applications at scale, such as multi-tenancy, tiered data storage, active-active cluster replication, and support for up to five nines (99.999%) of availability.
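
For a sense of scale, the availability figure above can be turned into a yearly downtime budget. The short Python sketch below does that arithmetic; the helper name is illustrative, and it assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(nines: int) -> float:
    """Maximum yearly downtime for an availability made of `nines` nines."""
    unavailability = 10 ** -nines  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


for n in (3, 4, 5):
    print(f"{n} nines: {downtime_minutes_per_year(n):.2f} minutes/year")
```

Under these assumptions, five nines leaves a budget of a little over five minutes of downtime per year.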

Accelerate incident investigations with Log Anomaly Detection

Modern DevOps teams that run dynamic, ephemeral environments (e.g., serverless) often struggle to keep up with the ever-increasing volume of logs, which makes it difficult for engineers to troubleshoot incidents effectively. During an incident, the trial-and-error process of finding and confirming which logs are relevant to your investigation can be time-consuming and laborious. This results in employee frustration, degraded performance for customers, and lost revenue.

Troubleshoot faster with improved Datadog Events

Datadog Events provides customers with a data feed about their infrastructure and applications, delivering an up-to-the-minute history of activity such as code deployments, configuration changes, and triggered alerts. Events collects data from Datadog products and over 100 third-party integrations—including Docker, Jenkins, Kubernetes, Sentry, AWS CloudWatch, and Azure Service Health.

Monitor your gRPC APIs with Datadog Synthetic Monitoring

gRPC is an open source Remote Procedure Call (RPC) framework developed by Google and released in 2016. Although gRPC is still relatively new, large organizations are increasingly adopting it to build APIs that connect complex microservice meshes written in disparate languages and frameworks. gRPC-based APIs can process requests up to seven times faster than REST APIs, and they also allow customers to easily implement SSL authentication, load balancing, and tracing via plug-in libraries.

Debug issues and automate remediation with Shoreline and Datadog

Shoreline is an incident response automation service that enables DevOps engineers and site reliability engineers (SREs) to quickly debug and remediate issues at scale and develop automated routines for incident management. Using Shoreline’s proprietary Op language, customers can run debug commands across all their hosts simultaneously and then deploy custom scripts via Actions to trigger automated remediations.