
July 2020

Zero instrumentation serverless observability with AWS SAM and CDK integrations

As organizations build out their serverless footprint, they might find themselves managing hundreds or thousands of individual components (e.g., Amazon S3 buckets, Amazon DynamoDB tables, Amazon SQS queues) for a single application. Performance issues can crop up at any of these points, so detailed observability data from your serverless functions is crucial for effective troubleshooting.

Monitor your Windows containers with Datadog

As cloud providers and infrastructure technologies expand their support for Windows containers, developers in the Windows ecosystem can increasingly enjoy the benefits of containerization. It’s quicker and easier than ever to modernize and deploy applications that use Windows-specific frameworks like .NET. Plus, Windows developers can use orchestration services like Kubernetes, Amazon ECS, or Docker Swarm to manage the complexity that containerized environments introduce.

Instrument your Python applications with Datadog and OpenTelemetry

If you are familiar with OpenTracing and OpenCensus, then you have probably already heard of the OpenTelemetry project. OpenTelemetry merges the OpenTracing and OpenCensus projects to provide a standard collection of APIs, libraries, and other tools to capture distributed request traces and metrics from applications and easily export them to third-party monitoring platforms.

Stream logs to Datadog with Amazon Kinesis Data Firehose

Amazon Kinesis Data Firehose is a service for ingesting, processing, and loading data from large, distributed sources such as clickstreams into multiple consumers for storage and real-time analytics. AWS recently launched a new Kinesis feature that allows users to ingest AWS service logs from CloudWatch and stream them directly to a third-party service for further analysis.
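When CloudWatch Logs streams a log group through a subscription, each record it hands off is gzip-compressed JSON whose `logEvents` array holds the individual log lines. The stdlib-only Python sketch below unpacks one such payload to show the shape of the data in transit; it is a rough illustration, not how Datadog processes the stream, and the sample payload, log group, and filter names are made up.

```python
from __future__ import annotations

import gzip
import json


def decode_cloudwatch_payload(data: bytes) -> list[dict]:
    """Flatten one CloudWatch Logs subscription payload into individual log events.

    The payload is gzip-compressed JSON whose "logEvents" array holds the log
    lines. CONTROL_MESSAGE payloads only check that the destination is
    reachable, so they yield no events.
    """
    payload = json.loads(gzip.decompress(data))
    if payload.get("messageType") != "DATA_MESSAGE":
        return []
    return [
        {
            "logGroup": payload["logGroup"],
            "logStream": payload["logStream"],
            "timestamp": event["timestamp"],  # milliseconds since the epoch
            "message": event["message"],
        }
        for event in payload["logEvents"]
    ]


# Fabricated sample payload, shaped like a CloudWatch Logs subscription record.
sample = gzip.compress(
    json.dumps(
        {
            "messageType": "DATA_MESSAGE",
            "owner": "123456789012",
            "logGroup": "/aws/lambda/example-function",
            "logStream": "2020/07/01/[$LATEST]example",
            "subscriptionFilters": ["example-filter"],
            "logEvents": [
                {"id": "1", "timestamp": 1593561600000, "message": "START RequestId: example"}
            ],
        }
    ).encode("utf-8")
)
print(decode_cloudwatch_payload(sample))
```

Flattening each payload into one dictionary per log line, with the log group and stream copied onto every event, keeps each line self-describing once it leaves the batch it arrived in.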

Introducing the Datadog mobile app

When you’re on call and get paged at an inconvenient time, you need to be able to quickly determine the seriousness of the issue and act decisively to reduce system downtime. But pager notifications often don’t give you the information you need to investigate an issue from your mobile device, meaning that access to a laptop at all times is a must.

Best practices for maintaining end-to-end tests

In Part 1, we looked at some best practices for getting started with creating effective test suites for critical application workflows. In this post, we’ll walk through best practices for making test suites easier to maintain over time. We’ll also show how Datadog can help you easily adhere to these best practices to keep test suites maintainable while ensuring a smooth troubleshooting experience for your team.

Datadog API client libraries now available for Java and Go

Client libraries are collections of code that make it easier for developers to write flexible and efficient applications that interface with APIs. Datadog provides client libraries so you can programmatically interact with our API to customize dashboards, search metrics, create alerts, and perform other tasks. We’re pleased to announce that we’ve developed and open-sourced two new client libraries for Java and Go in addition to our existing Ruby and Python libraries.
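Under the hood, a client library wraps plain HTTP calls to the Datadog API. As a rough illustration of what the libraries save you from writing by hand, here is a stdlib-only Python sketch of one such call against the API key validation endpoint (`/api/v1/validate`), which responds with `{"valid": true}` for a working key. The base URL assumes the US1 Datadog site, and this is not code taken from the Java or Go libraries.

```python
import json
import urllib.request

# Assumption: the US1 Datadog site; accounts on other sites use a different base URL.
DATADOG_API_BASE = "https://api.datadoghq.com"


def build_validate_request(api_key: str) -> urllib.request.Request:
    """Build (without sending) a GET request to the API key validation endpoint."""
    return urllib.request.Request(
        f"{DATADOG_API_BASE}/api/v1/validate",
        headers={"DD-API-KEY": api_key, "Accept": "application/json"},
        method="GET",
    )


def key_is_valid(api_key: str) -> bool:
    """Send the request and report whether the API accepted the key.

    A rejected key produces a non-2xx response, which makes urlopen raise
    urllib.error.HTTPError rather than return.
    """
    request = build_validate_request(api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return bool(json.loads(response.read()).get("valid", False))
```

Keeping request construction separate from sending makes the request easy to inspect in tests; a client library goes further by handling authentication, retries, and typed request and response models for every endpoint.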

How Gremlin monitors its own Chaos Engineering service with Datadog

Reliable systems are vital to meeting customer expectations. Downtime not only hurts a company’s bottom line but can be detrimental to reputation. Our goal at Gremlin is to help enterprises build more reliable systems using Chaos Engineering. Whether your infrastructure is deployed on bare metal in a corporate-owned data center or as Kubernetes-orchestrated microservices in a public cloud, chaos experiments can help you find system weaknesses early, before they affect customers.

Introducing the Datadog IoT Agent

From smart thermostats and grocery store checkouts to public utility infrastructures and industrial manufacturing lines, the Internet of Things (IoT) is all around us, and it grows larger every day. But this rapid growth brings a number of operational challenges: IoT devices collect a large amount of data and are often distributed across harsh, ever-changing environments.

Diagnosing out-of-memory errors on Linux

Out-of-memory (OOM) errors take place when the Linux kernel can’t provide enough memory to run all of its user-space processes, causing at least one process to exit without warning. Without a comprehensive monitoring solution, OOM errors can be tricky to diagnose. In this post, you will learn how to use Datadog to diagnose OOM errors on Linux systems.
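When the kernel's OOM killer terminates a process, it records the event in the kernel ring buffer, which you can read with `dmesg` or `journalctl -k`. The minimal Python sketch below pulls the killed PID, process name, and anonymous resident memory out of such log text. The sample lines are illustrative, and because the exact message wording differs between kernel versions, the regular expressions are assumptions rather than a complete parser.

```python
import re
from typing import List, NamedTuple, Optional

# Assumption: OOM-killer messages include "Killed process <pid> (<name>)";
# exact wording varies between kernel versions.
KILLED_PROCESS = re.compile(r"Killed process (\d+) \(([^)]*)\)")
ANON_RSS = re.compile(r"anon-rss:(\d+)kB")


class OomKill(NamedTuple):
    pid: int
    name: str
    anon_rss_kb: Optional[int]  # None if the line carries no anon-rss field


def find_oom_kills(kernel_log: str) -> List[OomKill]:
    """Return one OomKill per OOM-killer victim found in kernel log text."""
    kills = []
    for line in kernel_log.splitlines():
        victim = KILLED_PROCESS.search(line)
        if victim is None:
            continue  # not an OOM-kill line
        rss = ANON_RSS.search(line)
        kills.append(
            OomKill(
                pid=int(victim.group(1)),
                name=victim.group(2),
                anon_rss_kb=int(rss.group(1)) if rss else None,
            )
        )
    return kills


# Illustrative sample; in practice, feed in the output of `dmesg` or `journalctl -k`.
sample = (
    "[ 1200.000000] eth0: link up\n"
    "[ 1234.567890] Out of memory: Killed process 4242 (java) total-vm:8123456kB, "
    "anon-rss:2048000kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:4500kB oom_score_adj:0\n"
)
print(find_oom_kills(sample))
```

The anonymous RSS of the victim is a useful first clue: a large value points at the killed process itself as the memory consumer, while a small one suggests the pressure came from elsewhere on the host.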

Test on-premise applications with Datadog Synthetic private locations

Synthetic monitoring lets you improve end-user experience by proactively verifying that users can complete important transactions and access key endpoints. But your applications serve many users, from customers to all the employees who run your business. This makes testing the performance of internal-facing services within your private network just as critical as monitoring your external-facing applications.

How to Use the Datadog CLI on Kubernetes | Datadog Tips & Tricks

In this video, you’ll learn how to use the Datadog command line interface (CLI) on Kubernetes to perform key tasks, including checking the status of the Agent and viewing custom checks. The Datadog Agent CLI allows you to check the status of the Agents running on the pods in your Kubernetes clusters. It also provides various helpful commands, including starting and stopping the Agent, viewing configured custom checks, and sending flares to the Datadog support team to automatically open troubleshooting tickets.
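On Kubernetes, Agent CLI subcommands such as `agent status` and `agent configcheck` run inside the Agent pod through `kubectl exec`. The Python sketch below assembles and runs that command; the pod name and namespace are hypothetical placeholders, and if the Agent pod runs several containers you may need to add `kubectl exec`'s `-c` flag to target the Agent container.

```python
import subprocess
from typing import List


def agent_command(pod: str, *agent_args: str, namespace: str = "default") -> List[str]:
    """Build the kubectl argv that runs a Datadog Agent CLI subcommand inside a pod."""
    return ["kubectl", "exec", "--namespace", namespace, pod, "--", "agent", *agent_args]


def run_agent_command(pod: str, *agent_args: str, namespace: str = "default") -> str:
    """Run the subcommand and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        agent_command(pod, *agent_args, namespace=namespace),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Hypothetical pod and namespace names; list real ones with `kubectl get pods`.
print(agent_command("datadog-agent-abcde", "status", namespace="monitoring"))
```

Passing the command as an argument list rather than a shell string avoids quoting problems when pod names or check names come from other tooling.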

Datadog on RocksDB

Datadog is a monitoring and analytics platform that ingests trillions of data points per day from more than 8,000 customers. Each data point is associated with metadata, mostly in the form of tags, and can also belong to a stream of related data points that can then be explored, queried, or aggregated. Many Datadog services in the metrics ingestion, aggregation, query, and indexing pipeline use RocksDB.

How to Manage Datadog Resources Using Terraform | Datadog Tips & Tricks

Terraform allows you to efficiently manage complex infrastructure environments, and Datadog is an important piece of those environments. With the Datadog provider, you can use Terraform to manage your Datadog resources as code, allowing you to create and edit resources with the same tool you’re already using for your infrastructure. This video will show you how to do just that through the example of creating a Datadog monitor.
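Besides HCL, Terraform accepts configuration written as JSON (files ending in `.tf.json`), which makes resources easy to generate from a script. The minimal Python sketch below emits one `datadog_monitor` resource. The resource name, monitor name, query, and message are hypothetical; the provider block holding your API and application keys is omitted; and depending on your provider version you may want additional arguments, such as thresholds.

```python
import json


def monitor_config(resource_name: str, name: str, query: str, message: str) -> dict:
    """Return Terraform JSON configuration declaring one datadog_monitor resource."""
    return {
        "resource": {
            "datadog_monitor": {
                resource_name: {
                    "name": name,
                    "type": "metric alert",
                    "query": query,
                    "message": message,
                }
            }
        }
    }


# Hypothetical monitor: alert when average user CPU on any host exceeds 90%.
config = monitor_config(
    resource_name="high_cpu",
    name="High CPU usage",
    query="avg(last_5m):avg:system.cpu.user{*} by {host} > 90",
    message="CPU usage is above 90% on {{host.name}}.",
)

# Save this output as, e.g., monitor.tf.json next to your provider configuration.
print(json.dumps(config, indent=2))
```

After saving the file, `terraform plan` shows the monitor Terraform would create, and `terraform apply` creates it.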

How to Build A Unified Dashboard | Datadog Tips & Tricks

In this video, you’ll learn how to create unified dashboards that give your teams valuable information and performance visualizations from across the Datadog platform. Dashboards let your teams see data from across the platform side by side, enabling holistic visibility and breaking down silos between Dev and Ops teams. The video walks through creating a Screenboard that shows data such as frontend system latency, backend system latency, and Service Level Objectives all in one place.

Identifying Environment Right Sizing Opportunities for Cost Efficiency | Datadog Tips & Tricks

In this video, you’ll learn how to use the host map to identify opportunities to rightsize your environment and become more cost efficient. MoneySuperMarket Group was able to cut its cloud infrastructure costs by over 50% using Datadog. This video unpacks some of the practices MoneySuperMarket’s engineering team used to accomplish that.

Monitor Apache Ignite with Datadog

Apache Ignite is a computing platform for storing and processing large datasets in memory. Ignite can use RAM as both a caching and a storage layer, serving as a distributed, in-memory database or data grid. This allows Ignite to ingest and process complex datasets, such as those from real-time machine learning and analytics systems, in parallel and faster than traditional databases backed only by disk storage.

Monitor Hazelcast with Datadog

Hazelcast is a distributed, in-memory computing platform for processing large data sets with extremely low latency. Its in-memory data grid (IMDG) sits entirely in random access memory, which provides significantly faster access to data than disk-based databases. And with high availability and scalability, Hazelcast IMDG is ideal for use cases like fraud detection, payment processing, and IoT applications.