
Latest Posts

Monitoring RabbitMQ performance with Datadog

In Part 2 of this series, we saw how RabbitMQ ships with tools for monitoring different aspects of your application: how your queues handle message traffic, how your nodes consume memory, whether your consumers are operational, and so on. But while RabbitMQ plugins and built-in tools give you a view of your messaging setup in isolation, RabbitMQ is woven into the very design of your applications.

Collecting metrics using RabbitMQ monitoring tools

While the output of certain RabbitMQ CLI commands uses the term “slave” to refer to mirrored queues, RabbitMQ has disavowed this term, as has Datadog. When collecting RabbitMQ metrics, you can take advantage of RabbitMQ’s built-in monitoring tools and ecosystem of plugins. In this post, we’ll introduce these RabbitMQ monitoring tools and show you how you can use them in your own messaging setup.

Key metrics for RabbitMQ monitoring

RabbitMQ is a message broker, a tool for implementing a messaging architecture. Some parts of your application publish messages, others consume them, and RabbitMQ routes them between producers and consumers. The broker is well suited for loosely coupled microservices. If no service or part of the application can handle a given message, RabbitMQ keeps the message in a queue until it can be delivered.
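To make the hold-until-deliverable behavior described above concrete, here is a minimal in-memory sketch. It does not use RabbitMQ or any client library; `ToyBroker` and its `publish`, `subscribe`, and `depth` methods are hypothetical names chosen for illustration, and a real broker adds exchanges, acknowledgements, and persistence on top of this idea.

```python
from collections import defaultdict, deque


class ToyBroker:
    """In-memory stand-in for a message broker (illustrative only, not RabbitMQ's API)."""

    def __init__(self):
        self.queues = defaultdict(deque)  # queue name -> messages waiting for a consumer
        self.consumers = {}               # queue name -> callback that handles messages

    def publish(self, queue, message):
        # Deliver right away if a consumer is registered; otherwise hold the message.
        consumer = self.consumers.get(queue)
        if consumer is not None:
            consumer(message)
        else:
            self.queues[queue].append(message)

    def subscribe(self, queue, callback):
        # Register the consumer, then drain any held messages in arrival order.
        self.consumers[queue] = callback
        pending = self.queues[queue]
        while pending:
            callback(pending.popleft())

    def depth(self, queue):
        # Number of messages still waiting in the queue.
        return len(self.queues[queue])


broker = ToyBroker()
broker.publish("orders", "order-1")  # no consumer yet, so the message is queued
broker.publish("orders", "order-2")

received = []
broker.subscribe("orders", received.append)  # queued messages are delivered now
broker.publish("orders", "order-3")          # delivered immediately
```

In this sketch the queue depth grows while no consumer is attached and falls back to zero once one subscribes, which is the same signal you would watch when checking whether a real queue is keeping up with its publishers.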

Easily add tags and metadata to your services using the simplified Service Catalog setup

Modern applications running on distributed systems often complicate service ownership because of their ever-growing web of microservice dependencies. This complication challenges engineers’ ability to shepherd their software through every stage of the development life cycle, as well as teams’ ability to train new engineers on the application’s architecture. With increased complexity, clarity is key for quick, effective troubleshooting and delivering value to end users.

Analyze causal relationships and latencies across your distributed systems with Log Transaction Queries

Modern, high-scale applications can generate hundreds of millions of logs per day. Each log provides point-in-time insights into the state of the services and systems that emitted it. But logs are not created in isolation. Each log event represents a small, sequential step in a larger story, such as a user request, database restart process, or CI/CD pipeline.

Troubleshoot faulty frontend deployments with Deployment Tracking in RUM

Many developers and product teams are iterating faster and deploying more frequently to meet user expectations for responsive and optimized apps. These constant deployments—which can number in the dozens or even hundreds per day for larger organizations—are essential for keeping your customer base engaged and delighted. However, they also make it harder to pinpoint the exact deployment that led to a rise in errors, a new error, or a performance regression in your app.

Troubleshoot blocking queries with Datadog Database Monitoring

Blocked queries are one of the key issues faced by database analysts, engineers, and anyone managing database performance at scale. Blocking can be caused by inefficient query or database design as well as resource saturation, and can lead to increased latency, errors, and user frustration. Pinpointing root blockers—the underlying problematic queries that set off cascading locks on database resources—is key to troubleshooting and remediating database performance issues.

How Delivery Hero uses Kubecost and Datadog to manage Kubernetes costs in the cloud

As the world’s leading local delivery platform, Delivery Hero brings groceries and household goods to customers in more than 70 countries. Their technology stack comprises over 200 services across 20 Kubernetes clusters running on Amazon EKS. This cloud-based, containerized infrastructure enabled them to scale their operation to support increasing demand as the volume of orders placed on their platform doubled during the pandemic.

Optimize Kubernetes workload resourcing with StormForge and Datadog

StormForge Optimize Live is a machine learning-powered performance and resource optimization solution for Kubernetes workloads. Optimize Live ingests and analyzes production observability data and recommends specific actions to optimize CPU and memory utilization. You can take these actions manually or set them to occur automatically, making it easier to maintain a high level of application performance while minimizing cloud costs.

Autonomously optimize AWS Lambda deployments with Sedai and Datadog

In dynamic production environments, unpredictable traffic loads and frequent code changes can make it difficult for organizations to consistently optimize their cloud infrastructure, resulting in application performance issues, latency, and wasted cloud spend. Teams that manage large-scale cloud infrastructure deployments are often forced to tune their workloads’ configurations using a complicated mesh of script jobs—or worse, manual remediation by on-call engineers prompted by alerts.