Operations | Monitoring | ITSM | DevOps | Cloud

Latest Posts

Monitoring Apache Spark applications running on Amazon EMR

We recently implemented a Spark streaming application that consumes data from multiple Kafka topics. The data consumed from Kafka comprises different types of telemetry events generated by mobile devices. We decided to host the Spark cluster using the Amazon EMR service, which manages a fleet of EC2 instances to run our data-processing pipelines.

Introducing the Datadog Cluster Agent

As containers and orchestrators have surged in popularity, they have created highly dynamic environments with rapidly changing workloads—and the need for equally dynamic ways of monitoring them. After all, orchestration technologies like Kubernetes, DC/OS, and Swarm manage container workloads both at the node level and at the cluster level, which means that you need to gather insights from every layer to fully understand the state of your infrastructure.

Track the status of your SLOs with the new monitor uptime widget

Service level objectives are an important tool for maintaining application performance, ensuring a consistent customer experience, and setting expectations about service performance for both internal and external users. We are very pleased to announce the availability of a new monitor uptime widget that makes it simple to monitor the status of your SLOs and communicate that status to your teams, executives, or external customers.

Log Patterns: Automatically cluster your logs for faster investigation

Sifting through all your logs to find what you need can be challenging—especially during an outage, when time is critical and you’re flooded with WARN and ERROR messages. To help you immediately surface useful information from large volumes of logs, we developed Log Patterns.

Pivotal Cloud Foundry Monitoring with Datadog

In part three of this series, we showed you a number of methods and tools for accessing key metrics and logs from a Pivotal Cloud Foundry deployment. Some of these tools help PCF operators monitor the health and performance of the cluster, whereas others allow developers to view metrics, logs, and performance data from their applications running on the cluster.

Collecting Pivotal Cloud Foundry logs and metrics

So far in this series we’ve explored Pivotal Cloud Foundry’s architecture and looked at some of the most important metrics for monitoring each PCF component. In this post, we’ll show you how you can view these metrics, as well as application and system logs, in order to monitor your PCF cluster and the applications running on it.

Key metrics for monitoring Pivotal Cloud Foundry

In the first part of this series, we outlined the different components of a Pivotal Cloud Foundry deployment and how they work together to host and run applications. In this article we will look at some of the most important metrics that PCF operators should monitor. These metrics provide information that can help you ensure that the deployment is running smoothly, that it has enough capacity to meet demand, and that the applications hosted on it are healthy.

Pivotal Cloud Foundry architecture

Pivotal Cloud Foundry (PCF) is a multi-cloud platform for the deployment, management, and continuous delivery of applications, containers, and functions. PCF is a distribution of the open source Cloud Foundry developed and maintained by Pivotal Software, Inc. PCF is aimed at enterprise users and offers additional features and services, from Pivotal and from third parties, for installing and operating Cloud Foundry, expanding its capabilities, and making it easier to use.

Log analytics and dashboarding in Datadog

Achieving optimal performance can be challenging when you depend on separate platforms to monitor service health and to manage your logs. When data about your systems is spread across multiple platforms, investigating issues—and ultimately resolving them—takes longer and requires expertise with more tools. It takes more effort to identify real customer impact, as well as to verify that your responses to an incident are having the desired effect.

Datadog APM gains 3 superpowers: Trace Search, Service Map & Watchdog

Since we made Datadog APM generally available last year, we have continually added new features and support for new languages and frameworks to ensure that you can monitor every aspect of application performance. Datadog APM helps companies such as Airbnb, Square, and Zendesk to optimize application performance and deliver top-notch customer experiences.