October 2021

Dynamically control your custom metrics volume with Metrics without Limits

Sending custom metrics to Datadog allows you to monitor important data specific to your business and applications, such as latency, dollars per customer, items bought, or trips taken. And tags are key to being able to slice and dice these custom metrics to quickly find the information you need. But collecting enough custom metrics to have complete visibility can be cost prohibitive. For example, you might run microservices instrumented across thousands of containers.
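
To see why tags drive custom metric volume, here is a minimal sketch that assumes each distinct combination of tag values on a metric name counts as its own time series. The tag names and values are hypothetical, and the result is only an upper bound, since not every combination necessarily reports data.

```python
from math import prod


def max_series(tag_values: dict) -> int:
    """Upper bound on distinct tag-value combinations for one metric name."""
    return prod(len(set(values)) for values in tag_values.values())


def keep_tags(tag_values: dict, keep: set) -> dict:
    """Return only the tags you still want to be able to query by."""
    return {tag: vals for tag, vals in tag_values.items() if tag in keep}


# Hypothetical tags on a single request-latency metric.
tags = {
    "container_id": [f"c{i}" for i in range(1000)],
    "endpoint": ["/cart", "/checkout", "/search", "/login", "/profile"],
    "status": ["2xx", "4xx", "5xx"],
}

all_combinations = max_series(tags)
reduced = max_series(keep_tags(tags, {"endpoint", "status"}))
print(all_combinations, reduced)
```

In this illustration, dropping the per-container tag shrinks the upper bound from 15,000 combinations to 15, which is the kind of trade-off between visibility and volume the paragraph above describes.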

Scaling HashiCorp's Cloud Platform - Dash 2021 (HashiCorp)

Identifying bottlenecks during times of high load is critical to building a scalable software platform. Stress testing is one way to simulate high load on a system, and it lets you proactively catch potential bottlenecks before they impact customers. Once a solution is implemented to address a bottleneck, you need a way to measure success and find the new limit. See how HashiCorp Cloud Platform (HCP) developed a stress testing framework that relies heavily on Datadog’s custom metric capabilities, combined with some out-of-the-box integrations, to give HCP engineers a comprehensive view of their platform, and how they used these insights to scale their concurrent data-plane provisioning by 300%.

Panel: Handling Incident Response - Dash 2021 (Datadog, PagerDuty)

When customer-impacting downtime happens, it’s crucial that responders are prepared and can resolve issues as quickly as possible. Knowing the right tools to use, wherever you are working from, helps you put a well-defined strategy in place so you can come together as a team, work the problem, and get to a solution quickly. In this roundtable discussion, PagerDuty and Datadog engineers chat about incident response and how we use all the tools at our disposal to respond quickly and effectively.

Roundtable: The Complexities of Cloud Migration - Dash 2021 (Datadog, LaunchDarkly, StockX)

Often when completing a migration project, your organisation has to straddle two systems. You’re fighting habits and changing attitudes while also attempting to complete a high-risk operation. Every software team will have to complete a migration at some stage. Whether it’s to improve scalability and performance or to transition from an on-prem to a cloud solution, you’ll need a deep understanding of your current environment to create a strategy that minimises downtime for your team.

Dash 2021 Keynote

The Datadog team deliver the annual Dash keynote. At Dash 2021, we announced new products and features that give your team even greater visibility into the health and performance of your code, databases, CI/CD pipelines, and more. Now, you can monitor network devices, get visibility into your services' golden signal metrics without touching a single line of code, and integrate third-party tools into our platform with Datadog Apps. We expanded RUM to include iOS error tracking, Session Replay, and Watchdog Insights. And we introduced Datadog Observability Pipelines, which run on your infrastructure and put you in control of your observability data, from how it’s processed to where it’s sent.

Panel: Improving Monitoring & Reliability with Chaos Engineering - Dash 2021 (Datadog, Gremlin, Pismo)

Monitoring and observability are critical for knowing how your systems are behaving, but how do you create the feedback loops to shift from reactively monitoring for incidents to proactively preventing them? In this roundtable discussion, Mauricio Galdieri, Software Architect at Pismo.io, and Kolton Andrus, CEO and co-founder of Gremlin, join Tay Nishimura, Site Reliability Engineer on the Chaos Engineering team at Datadog, to chat about monitoring, Chaos Engineering, and using them together to build more reliable systems.

Monitor NS1 with Datadog

NS1 is an intelligent DNS and traffic management platform that helps optimize the performance of your network infrastructure and speed application delivery to your end users. Since even a small increase in service latency can lead to churn and revenue loss, it’s critical to remove any inefficiencies embedded in basic network functions. NS1 helps ensure high performance for name resolution and routing through support for the edns0-client-subnet (ECS) DNS extension and for Filter Chain technology.

Streaming Auth0 Logs to Datadog | Sivamuthu Kumar (Computer Enterprises, Inc.)

Are you using Auth0 in your application for user logins? How will you monitor the Auth0 logs and detect user actions that could indicate security concerns? In this session, we will see how Datadog helps you extend security monitoring by analyzing Auth0 user activity in the logs. We will also see how to set up threat detection rules that automatically trigger notifications based on that activity.

Maintaining Operational Sanity Across 100+ AWS Accounts | Eric Mann / Ryan Tomac (Vacasa)

At Vacasa, AWS accounts represent the unit of isolation for distinct applications & services in our software ecosystem, providing security benefits and operational autonomy for our teams as we scale. Managing accounts at this scale requires strong DevOps practices to maintain security, operational sanity, and uniform observability across the system. In this talk, we’ll cover the benefits of such an approach, the practices that make it possible, and the important role Datadog plays.

Democratizing Delivery: Seamless Observability for Optimal Application Performance | Ekim Maurer (NS1)

When application delivery performance issues happen, observability is critical to diagnosing the problem at hand. The adage “it’s always DNS” means that observability must extend to the foundational layers of the application delivery and access networking stacks. Yet granting administrative access to core network services like DNS and DHCP may run contrary to an organization’s least-privileged access policies. In this session, attendees will learn how global internet companies and enterprises use NS1 and Datadog to provide democratized DNS observability and reach optimal application performance.

Observability for Service Organizations | Bart Scheltinga (RawWorks)

Observability is trending. Organizations that rely on cloud infrastructure and cloud applications prioritize observability initiatives to get control over their business’s applications. At the same time, we see the “gap” between the on-premises infrastructure and “non-cloud” infrastructure is becoming bigger. Examples are End User Computing (EUC) and Global networks (SD-WAN).

Metrics for Apache Kafka with Datadog and Aiven | Ryan Martin (Aiven)

Using managed services is all very well, but how do you get the data you need from the different services into Datadog so you can see it all in one place? This session will walk through the configuration for bringing your Aiven-managed Apache Kafka service metrics into your Datadog explorer. You’ll see how to filter the metrics to focus on specific topics or consumer groups, and how to use the Aiven client to create a repeatable, scriptable setup. This session is recommended for anyone living in the as-a-Service world who cares about data and is interested in using metrics to optimize their Kafka clusters.

Monitoring Open Source Success in Arduino | Silvano Cerza (Arduino)

Arduino is an open-source hardware and software company, project, and user community that designs and manufactures single-board microcontrollers and microcontroller kits for building digital devices. In the course of developing software downloaded and used by millions around the world, we have found it vitally important to be aware of the quality and performance of our software.

Use funnel analysis to understand and optimize key user flows

Monitoring frontend performance and user behavior is essential to ensure that your application is functioning optimally. Datadog RUM enables you to collect key user data and correlate all of it with frontend performance metrics to track how your pages’ performance affects user behavior.

Improve your on-call experience with Datadog mobile dashboard widgets

Life happens—even when you’re on-call. You can’t take your laptop everywhere, but whether you’re on the train, at dinner, or at the gym, you can count on the Datadog mobile app for access to key data about the status and performance of your applications. Now, you can use Datadog mobile widgets to build an on-call mobile dashboard directly on your phone’s home screen, so it’s even easier to track the data you care about from anywhere.

Historical log analysis and investigation with Online Archives

To have full visibility into modern cloud environments, businesses need to collect an ever-growing avalanche of log data from a range of highly complex data sources. Indexing logs is key for real-time monitoring and troubleshooting, but it can quickly become expensive at high volumes, meaning that organizations often must choose which logs to index and which to archive.

Extend your Datadog functionality with Datadog Apps

Last year, we launched the Datadog Marketplace, which lets Datadog partners develop and trade applications that provide custom monitoring solutions for specific use cases. Now, we’re pleased to announce Datadog Apps, which introduces even further customizability to the Datadog platform. With Datadog Apps, you can now build and share your own Datadog UI features that seamlessly combine functionality from your third-party tools with the full range of Datadog’s monitoring capabilities.

Introducing Network Device Monitoring

For many organizations, the success of their business depends on their ability to maintain on-prem or hybrid infrastructure. For instance, some companies rely on data centers for security reasons or to support their large, static workloads, while others must execute their critical business processes as close to the edge as possible to ensure minimal latency.

Dash 2021: Guide to Datadog's newest announcements

Today at Dash 2021, we announced new products and features that give your team even greater visibility into the health and performance of your code, databases, CI/CD pipelines, and more. Now, you can monitor network devices, get visibility into your services' golden signal metrics without touching a single line of code, and integrate third-party tools into our platform with Datadog Apps. We expanded RUM to include iOS error tracking, Session Replay, and Watchdog Insights.

Monitor your CircleCI environment with Datadog

Datadog CI Visibility provides a unified platform for monitoring your CI/CD pipelines. Now, we are partnering with CircleCI to extend that same critical visibility to your CircleCI environment. Datadog’s integration uses CircleCI webhooks to capture information about the status and performance of your workflows and associated jobs, such as a job’s duration and whether it failed or was canceled.
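
As a rough illustration of the kind of information a job webhook can carry, the sketch below derives a job's duration and outcome from a payload. The payload shape (a `job` object with `name`, `status`, `started_at`, and `stopped_at` fields) and the `summarize_job` helper are assumptions made for this example, not a description of how the Datadog integration works internally.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def summarize_job(payload: dict) -> dict:
    """Extract duration and outcome from an assumed job-completed payload."""
    job = payload["job"]
    duration = parse_ts(job["stopped_at"]) - parse_ts(job["started_at"])
    return {
        "name": job["name"],
        "status": job["status"],
        "duration_seconds": duration.total_seconds(),
        "failed_or_canceled": job["status"] in {"failed", "canceled"},
    }


# Hypothetical payload for illustration only.
example = {
    "type": "job-completed",
    "job": {
        "name": "unit-tests",
        "status": "failed",
        "started_at": "2021-10-01T12:00:00Z",
        "stopped_at": "2021-10-01T12:01:30Z",
    },
}

summary = summarize_job(example)
print(summary)
```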

Explore Azure App Service with the Datadog Serverless view

Azure App Service is a platform-as-a-service (PaaS) offering for deploying applications to the cloud without worrying about infrastructure. App Service’s “serverless” approach removes the need to provision or manage the servers that run your applications, which provides flexibility, scalability, and ease of use. However, App Service also introduces infrastructure-like considerations that can impact performance and costs.

Resolve AWS Lambda function failures faster by monitoring invocation payloads

In a serverless application, AWS Lambda functions are typically invoked by JSON-formatted events from other AWS services—like API Gateway, S3, and DynamoDB—and respond with JSON-formatted payloads. Having visibility into these function request and response payloads can provide context around your function invocations and help you uncover the root causes of Lambda function failures.
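
To make the request and response payloads concrete, here is a minimal sketch of a Python Lambda handler behind an API Gateway proxy integration, where the event carries a JSON string in `body` and the function returns a `statusCode` and a JSON `body`. The `item_id` field and the error wording are hypothetical and chosen only for illustration.

```python
import json


def handler(event, context):
    """Parse the JSON request body and respond with a JSON payload."""
    try:
        body = json.loads(event.get("body") or "{}")
        item_id = body["item_id"]  # hypothetical required field
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        # The request payload is what explains this failure: seeing it
        # alongside the error shows that the field was missing or malformed.
        return {
            "statusCode": 400,
            "body": json.dumps({"error": f"bad request: {exc!r}"}),
        }
    return {
        "statusCode": 200,
        "body": json.dumps({"item_id": item_id, "status": "ok"}),
    }


good = handler({"body": json.dumps({"item_id": 7})}, None)
bad = handler({"body": "{}"}, None)
print(good["statusCode"], bad["statusCode"])
```

With only the 400 response in hand, the cause is ambiguous; with the request payload next to it, the missing field is immediately visible.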

Filter dashboards faster with template variable available values

Datadog’s template variables help you quickly scope your dashboards to specific contexts using tags, so you can visualize data from only the hosts, containers, services, or any other tagged objects you care about. This helps you build more flexible dashboards so you can access the insights you’re looking for as quickly as possible. We’re proud to announce new features for the template variable workflow that enable you to make highly dynamic, shareable dashboards more efficiently.

Debug iOS crashes efficiently with Datadog RUM

Unsurprisingly, application crashes due to fatal errors can be a major pain point for iOS users. Recent research shows that roughly 20 percent of mobile application uninstalls were due to crashes or other code errors. As a developer, it’s paramount to manage this potential churn by capturing comprehensive crash data in order to track, triage, and debug recurring issues in your iOS apps.