Operations | Monitoring | ITSM | DevOps | Cloud

Latest Posts

Monitor ECS applications on AWS Fargate with Datadog

AWS Fargate allows you to run applications in Amazon Elastic Container Service without having to manage the underlying infrastructure. With Fargate, you can define containerized tasks, specify the CPU and memory requirements, and launch your applications without spinning up EC2 instances or manually managing a cluster. Datadog has proudly supported Fargate since its launch, and we have continued to collaborate with AWS on best practices for managing serverless container tasks.

Monitoring Kafka performance metrics

Kafka is a distributed, partitioned, replicated, log service developed by LinkedIn and open sourced in 2011. Basically it is a massively scalable pub/sub message queue architected as a distributed transaction log. It was created to provide “a unified platform for handling all the real-time data feeds a large company might have”.Kafka is used by many organizations, including LinkedIn, Pinterest, Twitter, and Datadog. The latest release is version 2.4.1.

Collecting Kafka performance metrics

If you’ve already read our guide to key Kafka performance metrics, you’ve seen that Kafka provides a vast array of metrics on performance and resource utilization, which are available in a number of different ways. You’ve also seen that no Kafka performance monitoring solution is complete without also monitoring ZooKeeper. This post covers some different options for collecting Kafka and ZooKeeper metrics, depending on your needs.

Monitoring Kafka with Datadog

Kafka deployments often rely on additional software packages not included in the Kafka codebase itself—in particular, Apache ZooKeeper. A comprehensive monitoring implementation includes all the layers of your deployment so you have visibility into your Kafka cluster and your ZooKeeper ensemble, as well as your producer and consumer applications and the hosts that run them all.

Monitor Jenkins jobs with Datadog

Jenkins is an open source, Java-based continuous integration server that helps organizations build, test, and deploy projects automatically. Jenkins is widely used, having been adopted by organizations like GitHub, Etsy, LinkedIn, and Datadog. You can set up Jenkins to test and deploy your software projects every time you commit changes, to trigger new builds upon successful completion of other builds, and to run jobs on a regular schedule.

Datadog's Trace Outliers automatically surfaces error patterns across your environment

When monitoring highly distributed applications, which might rely on hundreds of services and infrastructure components across multiple cloud-based and on-premise environments, identifying problems and pinpointing the origin of an issue can be challenging. Even if you already have robust monitoring and alerts, your infrastructure and applications will likely change over time, which may make it difficult to reliably detect irregular behavior.

Monitor Apache Flink with Datadog

Apache Flink is an open source framework, written in Java and Scala, for stateful processing of real-time and batch data streams. Flink offers robust libraries and layered APIs for building scalable, event-driven applications for data analytics, data processing, and more. You can run Flink as a standalone cluster or use infrastructure management technologies such as Mesos and Kubernetes.

Monitor vSphere with Datadog

VMware vSphere is a server virtualization platform that enables organizations to provision and manage virtual machines at scale. With its comprehensive suite of products, vSphere helps companies manage datacenter resources, migrate workloads without downtime, run applications with high availability, and more. To keep tabs on dynamic vSphere environments and effectively address resource bottlenecks, you need deep visibility across every part of your infrastructure.

Monitor Scylla with Datadog

Scylla is an open source database alternative to Apache Cassandra, built to deliver significantly higher throughput, single-digit millisecond latency, and always-on availability for real-time applications. Unlike Cassandra which is written in Java, Scylla is implemented in C++ to provide greater control over low-level operations and eliminate latency issues related to garbage collection.