
Latest Posts

Lessons learned from running a large gRPC mesh at Datadog

Datadog’s infrastructure comprises hundreds of distributed services that constantly discover one another, exchange data, stream events, trigger actions, coordinate distributed transactions, and more. Implementing a networking solution for such a large, complex application comes with its own set of challenges, including scalability, load balancing, fault tolerance, compatibility, and latency.

Control your log volumes with Datadog Observability Pipelines

Modern organizations face a challenge in handling the massive volumes of log data—often scaling to terabytes—that they generate across their environments every day. Teams rely on this data to help them identify, diagnose, and resolve issues more quickly, but how and where should they store logs to best suit this purpose? For many organizations, the immediate answer is to consolidate all logs remotely in higher-cost indexed storage to ready them for searching and analysis.

Aggregate, process, and route logs easily with Datadog Observability Pipelines

The volume of logs generated from modern environments can overwhelm teams, making it difficult to manage, process, and derive measurable value from them. As organizations seek to manage this influx of data with log management systems, SIEM providers, or storage solutions, they can inadvertently become locked into vendor ecosystems, face substantial network costs and processing fees, and run the risk of sensitive data leakage.

Dual ship logs with Datadog Observability Pipelines

Organizations often adjust their logging strategy to meet their changing observability needs for use cases such as security, auditing, log management, and long-term storage. This process involves trialing and eventually migrating to new solutions without disrupting existing workflows. However, configuring and maintaining multiple log pipelines can be complex. Enabling new solutions across your infrastructure and migrating everyone to a shared platform requires significant time and engineering effort.

A closer look at our navigation redesign

Helping our users gain end-to-end visibility into their systems is key to the Datadog platform. To achieve this, we offer over 20 products and more than 700 integrations. However, with an ever-expanding, increasingly diverse catalog, it’s more important than ever that users have clear paths for quickly finding what they need.

Recapping Datadog Summit London 2024

In the last week of March 2024, Datadog hosted its latest Datadog Summit in London to celebrate our community. As Jeremy Garcia, Datadog’s VP of Technical Community and Open Source, mentioned during his welcome remarks, London is the first city to have hosted two Datadog Summits, the first in 2018. It was great to see how our community there has grown over the past six years.

Stay up to date on the latest incidents with Bits AI

Since the release of ChatGPT, there’s been growing excitement about the potential of generative AI—a class of artificial intelligence trained on pre-existing datasets to generate text, images, videos, and other media—to transform global businesses. Last year, we released our own generative AI-powered DevOps copilot called Bits AI in private beta. Bits AI provides a conversational UI to explore observability data using natural language.

Monitor SQS with Data Streams Monitoring

Datadog Data Streams Monitoring (DSM) provides detailed visibility into your event-driven applications and streaming data pipelines, letting you easily track and improve performance. We’ve covered DSM for Kafka and RabbitMQ users previously on our blog. In this post, we’ll guide you through using DSM to monitor applications built with Amazon Simple Queue Service (SQS).

Empower engineers to take ownership of Google Cloud costs with Datadog

Google Cloud provides a wide range of services and tools to help engineering teams reduce the complexity of migrating and deploying applications in the cloud. As engineering teams work to improve the performance, reliability, and security of their applications, they also need to be conscious of cloud costs. But engineers often don’t have access to cost data, or they see it only in monthly reports.

Filter and correlate logs dynamically using Subqueries

Logs provide valuable information that can help you troubleshoot performance issues, track usage patterns, and conduct security audits. To derive actionable insights from log sources and facilitate thorough investigations, Datadog Log Management provides an easy-to-use query editor that enables you to group logs into patterns with a single click or perform reference table lookups on the fly for in-depth analysis.