
November 2024

Build Datadog workflows and apps in minutes with our AI assistant

Datadog is a central hub of information—enabling you to see logs, traces, and metrics from across your stack and providing a centralized source of notifications about potential issues. However, when Datadog notifies you of an issue, you often need to log in to other applications to fully assess and resolve it, which slows down mitigation.

Stream logs in the OCSF format to your preferred security vendors or data lakes with Observability Pipelines

Today, CISOs and security teams face a rapidly growing volume of logs from a variety of sources, all arriving in different formats. They write and maintain detection rules, build pipelines, and investigate threats across multiple environments and applications. Efficiently maintaining their security posture across multiple products and data formats has become increasingly challenging.

Optimize LLM application performance with Datadog's vLLM integration

vLLM is a high-performance serving framework for large language models (LLMs). It optimizes token generation and resource management to deliver low-latency, scalable performance for AI-driven applications such as chatbots, virtual assistants, and recommendation systems. By efficiently managing concurrent requests and overlapping tasks, vLLM enables organizations to deploy LLMs in demanding environments with speed and efficiency.
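
As a rough illustration of how vLLM is used, here is a minimal sketch of its offline inference API; the model name and sampling parameters are illustrative, and a GPU-backed host with the vllm package installed is assumed.

```python
# Minimal sketch of vLLM's offline inference API (model choice is illustrative).
from vllm import LLM, SamplingParams

# Batch several prompts so vLLM can schedule them concurrently.
prompts = [
    "Summarize the benefits of continuous batching.",
    "Explain what a KV cache is in one sentence.",
]
sampling = SamplingParams(temperature=0.7, max_tokens=128)

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # hypothetical model choice
for output in llm.generate(prompts, sampling):
    print(output.outputs[0].text)
```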

Get deeper visibility into your AWS serverless apps with enhanced distributed tracing

Serverless or event-driven applications can comprise many different distributed components, including serverless compute services such as AWS Lambda and AWS Fargate for Amazon ECS, as well as managed data streams, data stores, workflow orchestration tools, queues, and more. Having full end-to-end visibility into requests as they propagate across all of these parts of your application is crucial to monitoring performance, locating affected up- or downstream services, and troubleshooting issues.
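
As a hedged sketch of what that instrumentation can look like in Python, the snippet below wraps a Lambda handler with Datadog's Lambda wrapper and adds a custom span; the span name and handler logic are illustrative, and in practice the datadog-lambda and ddtrace packages are usually added via Lambda layers or the Datadog extension.

```python
# Illustrative sketch: instrument a Python Lambda handler so its invocations
# join the distributed trace. Assumes the datadog-lambda and ddtrace packages.
from datadog_lambda.wrapper import datadog_lambda_wrapper
from ddtrace import tracer

@datadog_lambda_wrapper
def handler(event, context):
    # Custom span around downstream work, e.g., reading from a queue or data store.
    with tracer.trace("orders.process_batch"):
        records = event.get("Records", [])
        return {"processed": len(records)}
```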

Best practices for monitoring progressive web applications

Progressive web applications (PWAs) are a modern frontend architecture designed to provide a similar user experience to that of a native iOS, Android, or other platform-specific app. PWAs are built using common web platform technologies—such as HTML, CSS, and JavaScript—and are intended not only to run in a browser and be accessed from the web, but also to be installed on users’ devices and accessed offline.

Identify deprecated Lambda functions with Datadog

AWS Lambda supports nearly any programming language by enabling developers to run serverless functions with either supported or custom runtimes. Once a runtime is deprecated, however, AWS will set dates for when you can no longer create or update functions using that runtime. You will then need to decide what course of action to take to ensure your Lambda functions continue running as expected.
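
To make that concrete, here is a small illustrative script that lists Lambda functions running on runtimes you consider deprecated; the runtime names are examples only, and boto3 plus suitable IAM permissions are assumed.

```python
# Illustrative sketch: flag Lambda functions on runtimes you track as deprecated.
import boto3

DEPRECATED_RUNTIMES = {"python3.7", "nodejs14.x", "go1.x"}  # example list, not exhaustive

lambda_client = boto3.client("lambda")
paginator = lambda_client.get_paginator("list_functions")

for page in paginator.paginate():
    for fn in page["Functions"]:
        # Container-image functions have no Runtime field, hence .get().
        runtime = fn.get("Runtime", "")
        if runtime in DEPRECATED_RUNTIMES:
            print(f"{fn['FunctionName']}: {runtime}")
```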

Detect anomalies before they become incidents with Datadog AIOps

As your IT environment scales, a proactive approach to monitoring becomes increasingly critical. If your infrastructure environment contains multiple service dependencies, disparate systems, or a busy CI/CD application delivery pipeline, overlooked anomalies can result in a domino effect that leads to unplanned downtime and an adverse impact on users.

Datadog on Cloud Workload Identities

Datadog operates dozens of Kubernetes clusters, tens of thousands of hosts, and millions of containers across a multi-cloud environment, spanning AWS, Azure, and Google Cloud. With over 2,000 engineers, we needed to ensure that every developer and application could securely and efficiently access resources across these various cloud providers.

Detect and troubleshoot Windows Blue Screen errors with Datadog

Windows Blue Screen errors—also known as bug checks, STOP codes, kernel errors, or the Blue Screen of Death (BSOD)—are triggered when the operating system detects a critical issue that compromises system stability. To prevent further damage or data corruption, the OS determines that the safest course of action is to shut down immediately. The system then restarts and displays the well-known BSOD.
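
One way to see this on a host is to look for the bug check record Windows writes to the System event log after the restart; the sketch below reads it with pywin32 (the "BugCheck" source is the commonly documented marker, and the pywin32 package must be installed on the Windows host).

```python
# Illustrative sketch: scan the Windows System event log for bug check (BSOD) records.
import win32evtlog

log = win32evtlog.OpenEventLog(None, "System")
flags = win32evtlog.EVENTLOG_BACKWARDS_READ | win32evtlog.EVENTLOG_SEQUENTIAL_READ

events = win32evtlog.ReadEventLog(log, flags, 0)
while events:
    for event in events:
        # After a crash, Windows logs an event from the "BugCheck" source
        # containing the STOP code and the dump file location.
        if event.SourceName == "BugCheck":
            print(event.TimeGenerated, event.StringInserts)
    events = win32evtlog.ReadEventLog(log, flags, 0)

win32evtlog.CloseEventLog(log)
```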

Integrate usage data into your product analytics strategy

Web applications emit a wealth of metadata and user interaction information that’s critical to understanding user behavior. However, parsing this data to find what is most relevant to your product analytics project can be challenging—what one product analyst might find useful, another might consider unnecessary noise.

Kubernetes autoscaling guide: determine which solution is right for your use case

Kubernetes offers the ability to scale infrastructure to accommodate fluctuating demand, enabling organizations to maintain availability and high performance during surges in traffic and reduce costs during lulls. But scaling comes with tradeoffs and must be done carefully: organizations that overprovision their workloads or clusters often wind up paying for resources that go unused.

Get complete Kubernetes observability by monitoring your CRDs with Datadog Container Monitoring

Custom resources are critical components in Kubernetes production environments. They enable users to tailor Kubernetes resources to their specific applications or infrastructure needs, automate processes through operators, simplify the management of complex applications, and integrate with non-native applications such as Kafka and Elasticsearch.
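
As a quick, hedged example of taking stock of the custom resources in a cluster, the sketch below lists CRDs with the official Kubernetes Python client; it assumes the kubernetes package and a kubeconfig with read access.

```python
# Illustrative sketch: list the CustomResourceDefinitions registered in a cluster.
from kubernetes import client, config

config.load_kube_config()
api = client.ApiextensionsV1Api()

for crd in api.list_custom_resource_definition().items:
    spec = crd.spec
    versions = ", ".join(v.name for v in spec.versions)
    print(f"{crd.metadata.name} (group={spec.group}, versions={versions})")
```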

A guide on scaling out your Kubernetes pods with the Watermark Pod Autoscaler

While overprovisioning Kubernetes workloads can provide stability during the launch of new products, it’s often only sustainable because large companies have substantial budgets and favorable deals with cloud providers. As highlighted in Datadog’s State of Cloud Costs report, cloud spending continues to grow, but a significant portion of that cost is often due to inefficiencies like overprovisioning.

Monitor Azure AI Search with Datadog

Azure AI Search is Microsoft Azure’s managed search service. In addition to tackling traditional search use cases, Azure AI Search also includes AI-powered features to make it a fully capable document catalog, search engine, and vector database. AI Search is highly interoperable—it can use models created in Azure OpenAI Service, Azure AI Studio, or Azure ML.
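
As a hedged sketch of querying an Azure AI Search index with the Python SDK, the snippet below runs a simple full-text search; the endpoint, API key, index name, and field names are placeholders, and the azure-search-documents package is assumed.

```python
# Illustrative sketch: query an Azure AI Search index (placeholders throughout).
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

client = SearchClient(
    endpoint="https://<your-service>.search.windows.net",
    index_name="product-docs",                      # hypothetical index
    credential=AzureKeyCredential("<api-key>"),
)

# Simple full-text query; the same client also supports vector and hybrid queries.
for result in client.search("error budget policy", top=5):
    print(result["@search.score"], result.get("title"))
```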

Use Datadog App Builder to peek, purge, or redrive AWS SQS

This video showcases how developers can self-serve from an application to simplify the management of their AWS cloud resources. Rather than switching between tools or reaching out to another team for help, developers can take action directly from their observability tool, enabling faster resolution of application issues. We demonstrate how to build a simple app that lets them minimize disruptions by quickly taking action on their SQS queues in AWS, using insights provided by Datadog.
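
For context, an app like this typically wraps SQS operations similar to the boto3 calls sketched below; the queue URL and DLQ ARN are placeholders, boto3 and appropriate IAM permissions are assumed, and note that purging is destructive.

```python
# Illustrative sketch of the underlying SQS operations: peek, purge, and redrive.
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/orders-queue"  # placeholder
dlq_arn = "arn:aws:sqs:us-east-1:123456789012:orders-dlq"                     # placeholder

# "Peek": receive messages without deleting them (they reappear after the visibility timeout).
peeked = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=5)
for msg in peeked.get("Messages", []):
    print(msg["MessageId"], msg["Body"][:80])

# Purge: delete every message currently in the queue.
sqs.purge_queue(QueueUrl=queue_url)

# Redrive: move messages from the dead-letter queue back to their source queue.
sqs.start_message_move_task(SourceArn=dlq_arn)
```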

Monitor the cost of your public sector applications with Datadog Cloud Cost Management

As federal, state, and local government agencies work to modernize their digital infrastructure and applications, managing costs effectively remains a constant challenge. Federal directives like Cloud Smart indicate the need for public sector IT organizations to track and optimize their cloud spend. However, as an organization’s IT environment grows in complexity, it becomes difficult to correlate cost data and extract useful insights.

Troubleshooting RAG-based LLM applications

LLMs like GPT-4, Claude, and Llama are behind popular tools like intelligent assistants, customer service chatbots, natural language query interfaces, and many more. These solutions are incredibly useful, but they are often constrained by the information they were trained on. This often means that LLM applications are limited to providing generic responses that lack proprietary or context-specific knowledge, reducing their usefulness in specialized settings.
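
Retrieval-augmented generation (RAG) addresses this by fetching relevant documents at query time and grounding the prompt in them. The sketch below is purely illustrative: it uses simple keyword overlap where a real pipeline would use embeddings and a vector store.

```python
# Illustrative sketch of the retrieval step in a RAG pipeline.
def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    # Rank documents by how many query terms they share (stand-in for embedding search).
    q_terms = set(question.lower().split())
    ranked = sorted(documents, key=lambda d: len(q_terms & set(d.lower().split())), reverse=True)
    return ranked[:k]

def build_prompt(question: str, documents: list[str]) -> str:
    # Ground the prompt in the retrieved context before sending it to the LLM.
    context = "\n".join(f"- {doc}" for doc in retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

docs = [
    "Refunds are processed within 5 business days of approval.",
    "Support is available Monday through Friday, 9am-6pm ET.",
]
print(build_prompt("How long do refunds take?", docs))
```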

This Month in Datadog - October 2024

On the October episode of This Month in Datadog, Jeremy Garcia (VP of Technical Community and Open Source) covers unified Error Tracking, Security Operational Metrics, and a new Datadog Serverless feature for retrying or redriving failed AWS Step Functions executions directly from Datadog. Later in the episode, Shri Subramanian (Group Product Manager) spotlights Datadog LLM Observability’s native integration with Google Gemini. Also featured are our blog posts Operator vs.

Create ServiceNow tickets from Datadog alerts

ServiceNow is a popular IT service management platform for recording, tracking, and managing a company’s enterprise-level IT processes in a single location. In addition to helping you manage your ServiceNow CMDB, Datadog also integrates with ServiceNow IT Operations Management (ITOM) and IT Service Management (ITSM), enabling you to automatically create and manage ServiceNow incidents and events from the Datadog platform.
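
As a hedged sketch, one common pattern is to add a ServiceNow notification handle to a monitor's message so alerts open tickets automatically. Below, a monitor is created with the Datadog Python API client; the query, threshold, and the exact @servicenow handle name depend on your environment and integration configuration.

```python
# Illustrative sketch: create a Datadog monitor whose notifications route to ServiceNow.
# Assumes the datadog-api-client package and DD_API_KEY/DD_APP_KEY in the environment.
from datadog_api_client import ApiClient, Configuration
from datadog_api_client.v1.api.monitors_api import MonitorsApi
from datadog_api_client.v1.model.monitor import Monitor
from datadog_api_client.v1.model.monitor_type import MonitorType

monitor = Monitor(
    name="High error rate on checkout service",
    type=MonitorType.QUERY_ALERT,
    query="avg(last_5m):sum:trace.http.request.errors{service:checkout}.as_count() > 50",  # placeholder query
    message="Checkout error rate is elevated. @servicenow-itsm",  # handle name is hypothetical
)

with ApiClient(Configuration()) as api_client:
    created = MonitorsApi(api_client).create_monitor(body=monitor)
    print(created.id)
```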

How we use Scorecards to define and communicate best practices at scale

In modern, distributed applications, shared standards for performance and reliability are key to maintaining a healthy production environment and providing a dependable user experience. But establishing and maintaining these standards at scale can be a challenge: when you have hundreds or thousands of services overseen by a wide range of teams, there are no one-size-fits-all solutions. How do you determine effective best practices in such a complex environment?

Datadog on Building Reliable Distributed Applications Using Temporal

Temporal is an open source platform for building resilient and reliable distributed systems. Datadog started using Temporal in 2020 as the foundation for our internal software delivery platform. Since then, it has been widely adopted internally as a platform that any engineering team can use to build their systems. In this Datadog on episode, Ara Pulido chats with Loïc Minaudier, Senior Software Engineer on the Atlas team, which is responsible for providing a developer platform on top of Temporal, and Allen George, Engineering Manager on the Datadog Workflows team.
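
For a flavor of what building on Temporal looks like, here is a minimal, illustrative workflow and activity using the Temporal Python SDK; the names are made up, and a running Temporal server plus a worker registered for these definitions are assumed.

```python
# Illustrative sketch of a Temporal workflow and activity (temporalio SDK assumed).
from datetime import timedelta
from temporalio import activity, workflow

@activity.defn
async def charge_card(order_id: str) -> str:
    # Side-effecting work lives in activities, which Temporal retries on failure.
    return f"charged {order_id}"

@workflow.defn
class OrderWorkflow:
    @workflow.run
    async def run(self, order_id: str) -> str:
        # Temporal persists workflow progress, so this survives worker crashes
        # and transient failures of the activity.
        return await workflow.execute_activity(
            charge_card, order_id, start_to_close_timeout=timedelta(seconds=30)
        )
```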

Introducing the Datadog Architecture Center

To prevent visibility gaps in your cloud environment, you need to efficiently deploy observability solutions that integrate easily with key technologies in your stack and scale reliably with new applications and migrated workloads. But observability deployments can be complex, often requiring deep and specific knowledge that may not be available within your teams.

Track and troubleshoot MongoDB performance with Datadog Database Monitoring

Many modern applications rely on MongoDB and MongoDB Atlas to manage growing data volumes and to provide flexible schema and data structures. As organizations adopt these and other NoSQL databases, effective monitoring and optimization become critical, especially in distributed environments.

Ensure high service availability with Datadog Service Management

Adopting a cloud-based, distributed architecture may help your organization scale quickly, but it can also add complexity. Correlating telemetry, security signals, and alerts across services often proves difficult, resulting in slower issue remediation. Additionally, when something goes wrong, figuring out who to contact—for example, the on-call responder or the service owner—may become needlessly time-consuming.