December 2024

Track AI Costs with Datadog Cloud Cost Management for OpenAI! Learn More on TMiDD! #AI #CloudCost

Dec 23, 2024 By Datadog In Datadog

On This Month in Datadog, we’re spotlighting Datadog Cloud Cost Management for OpenAI, which enables you to break down costs by project and organization, as well as by individual model and token consumption.


How to support a growing Kubernetes cluster with a small etcd

Dec 20, 2024 By David M. Lentz In Datadog

Etcd plays a critical role in your Kubernetes setup: it stores the ever-changing state of your cluster and its objects, and the API server uses this data to manage cluster resources. As your applications thrive and your Kubernetes clusters see more traffic, etcd handles an increasing amount of data. But etcd’s storage space is limited: the recommended maximum is 8 GiB, and a large and dynamic cluster can easily generate enough data to reach that limit.


Monitor your Pinecone vector databases with Datadog

Dec 20, 2024 By Candace Shamieh In Datadog

Pinecone is a vector database that helps users build and deploy generative AI applications at scale. Whether using its serverless architecture or a hosted model, Pinecone allows users to store, search, and retrieve the most meaningful information from their company data with each query, sending only the necessary context to Large Language Models (LLMs). By providing the ability to search and retrieve contextual data, Pinecone enables you to reduce LLM hallucinations and enhance data security.


Best practices for monitoring event-driven architectures

Dec 19, 2024 By Candace Shamieh In Datadog

Microservices architectures empower individual teams to choose their own programming language, tools, and technologies, resulting in more independence and the ability to develop and release features faster. While there are various types of integration patterns that can facilitate microservice communication, many organizations choose to adopt event-driven architectures (EDAs) because of their scalability, agility, and resilience.


re:Invent Recap Livestream: 2024

Dec 18, 2024 By Datadog In Datadog

Did you miss this year’s re:Invent? Or maybe you were onsite but too busy deep diving into certifications, new products, and networking. Don’t worry: the Datadog team is streaming right to your home on December 17 to recap all of the highlights from the event. Join Andrew Krug from Datadog’s Technical Community along with a host of AWS guests to hear about exciting announcements from AWS re:Invent 2024, Datadog’s latest product launches, and a rundown of the best on-demand sessions that you’ll want to make sure to tune in to.


This Month in Datadog: Monitor OpenAI costs, Kubernetes Active Remediation, IaC Security, and more

Dec 18, 2024 By Datadog In Datadog

Datadog is constantly elevating the approach to cloud monitoring and security. This Month in Datadog updates you on our newest product features, announcements, resources, and events. To learn more about Datadog and start a free 14-day trial, visit the Datadog website. This month, we put the spotlight on Datadog Cloud Cost Management for OpenAI.


This Month in Datadog - December 2024

Dec 18, 2024 By Datadog In Datadog

On the December episode of This Month in Datadog, Jeremy Garcia (VP of Technical Community and Open Source) covers Kubernetes Active Remediation, Datadog IaC Security, and a trio of new features for monitoring AWS resources. Later in the episode, Natasha Goel (Product Manager) spotlights Datadog Cloud Cost Management for OpenAI. Also featured is a short recap of Datadog at KubeCon North America and AWS re:Invent 2024.


Datadog Database Monitoring: Improve Database and Application Performance

Dec 12, 2024 By Datadog In Datadog

Datadog Database Monitoring unifies query, application, and database telemetry in one platform, enabling teams to easily identify bottlenecks, understand database load, optimize query performance, uncover costly queries, and correlate database and application telemetry.


Increase visibility into network incidents using moovingon.ai and Datadog

Dec 11, 2024 By Lauren Lowe In Datadog

moovingon.ai is a platform that consolidates alerts, incidents, audits, runbooks, and other resources for 24/7 network operations center (NOC) engineering teams. These teams often have to work collaboratively to maintain uptime for mission-critical cloud infrastructure and applications and need specialized resources to facilitate investigations in the event of an issue.


Highlights from AWS re:Invent 2024

Dec 9, 2024 By Andrew Krug In Datadog

Whether or not you made the journey to this year’s AWS re:Invent, there’s always a variety of great announcements lost amid an action-packed week of keynotes, breakouts, expo hall demos, and networking sessions. No need to worry—we’re always happy to be a big part of the re:Invent experience and share our observations with you. You can also join us on December 17, 2024, for a re:Invent re:Cap livestream by registering here.


Automatically group events and reduce noise with AI-powered Intelligent Correlation

Dec 5, 2024 By Samantha Scaglione In Datadog

When you have a complex IT environment with many disparate tools, data sources, and teams, alert noise becomes overwhelming. This can delay incident response and cause missed alerts, ultimately leading to critical incidents and outages. Datadog Event Management’s Event Correlation groups and deduplicates events and alerts, reducing noise and helping response teams act on alerts faster.


Troubleshoot infrastructure changes faster with Recent Changes in the Resource Catalog

Dec 5, 2024 By Sriram Raman In Datadog

Organizations often struggle to maintain visibility and control over their distributed cloud infrastructure, where changes in a single resource can have cascading effects throughout the system and potentially cause disruptions. In these environments, infrastructure changes that lead to incidents are often hard to troubleshoot—especially when teams are using disparate tools with siloed data—leading to longer resolution times, more downtime, and negative business outcomes.


Optimize and troubleshoot cloud storage at scale with Storage Monitoring

Dec 4, 2024 By Mahashree Rajendran In Datadog

Organizations today rely on cloud object storage to power diverse workloads, from data analytics and machine learning pipelines to content delivery platforms. But as data volumes explode and storage patterns become more complex, teams often struggle to understand and proactively optimize their storage utilization. When issues arise—such as unexpected costs or performance bottlenecks—these teams frequently lack the visibility needed to quickly identify and resolve root causes.


Gain comprehensive visibility into your ECS applications with the ECS Explorer

Dec 3, 2024 By Danny Driscoll In Datadog

Amazon Elastic Container Service (ECS) is a container orchestration service that enables you to efficiently deploy new applications or modernize existing ones by migrating them to a containerized environment. Building on ECS gives you the flexibility, scalability, and security that containers offer, but also presents challenges in monitoring and troubleshooting your applications and infrastructure.


Introducing Datadog's Next-Generation Rust-based Lambda Extension

Dec 3, 2024 By Jordan Obey In Datadog

In 2021, we announced the release of the Datadog Lambda extension, a simplified, cost-effective way for customers to collect monitoring data from their AWS Lambda functions. This extension was a specialized build of our main Datadog Agent designed to monitor Lambda executions.


State of Cloud Costs

Dec 3, 2024 By Datadog In Datadog

Cloud spending continues to grow, but managing costs effectively remains a challenge for many organizations. In this video, Datadog Senior Product Manager Kayla Taylor dives into our recent State of Cloud Costs report—which analyzed AWS cloud cost data from hundreds of organizations—to understand the key factors driving cloud expenses. We explore the impact of adopting emerging compute technologies like Arm-based processors, GPUs, and AI capabilities, how usage patterns and previous-generation technologies affect cloud costs, and the role of AWS discount programs in cost management.


Monitor AWS Trainium and AWS Inferentia with Datadog for holistic visibility into ML infrastructure

Dec 3, 2024 By Anjali Thatte In Datadog

AWS Inferentia and AWS Trainium are purpose-built AI chips that—with the AWS Neuron SDK—are used to build and deploy generative AI models. As models increasingly require a larger number of accelerated compute instances, observability plays a critical role in ML operations, empowering users to improve performance, diagnose and fix failures, and optimize resource utilization.


How Datadog migrated its Kubernetes fleet on AWS to Arm at scale

Dec 2, 2024 By Matthieu Jaillais In Datadog

Over the past few years, Arm has surged to the forefront of computing. For decades, Arm processors were mainly associated with a handful of specific use cases, such as smartphones, IoT devices, and the Raspberry Pi. But the introduction of AWS Graviton2 in 2019 and the adoption of Arm-based hardware platforms by Apple and others helped bring about a dramatic shift, and Arm is now the most widely used processor architecture in the world.


Achieve total app visibility in minutes with Single Step Instrumentation

Dec 2, 2024 By Evan Pandya In Datadog

Datadog APM and distributed tracing provide teams with an end-to-end view of requests across services, uncovering dependencies and performance bottlenecks to enable real-time troubleshooting and optimization. However, traditional manual instrumentation, while customizable, is often time consuming, error prone, and resource intensive, requiring developers to configure each service individually and closely collaborate with SRE teams.


Monitor your OpenAI LLM spend with cost insights from Datadog

Dec 2, 2024 By Thomas Sobolik In Datadog

Managing LLM provider costs has become a chief concern for organizations building and deploying custom applications that consume services like OpenAI. These applications often rely on multiple backend LLM calls to handle a single initial prompt, leading to rapid token consumption—and consequently, rising costs. But shortening prompts or chunking documents to reduce token consumption can be difficult and introduce performance trade-offs, including an increased risk of hallucinations.


Secure your cloud environment from end to end with Datadog Infrastructure-as-Code Security

Dec 2, 2024 By Cliff Kim In Datadog

Infrastructure-as-code (IaC) tools like Terraform and CloudFormation allow teams to define, manage, and provision their cloud infrastructure using code, as opposed to clicking through consoles or executing commands via a CLI. IaC adoption is now widespread and helps teams increase productivity and efficiency, but it also introduces new surface area for mistakes, defects, and other risks.


Centrally manage Agent upgrades and configurations with Datadog Fleet Automation

Dec 2, 2024 By Vignesh Palaniappan In Datadog

Teams can gain deep visibility into their applications and infrastructure by installing Datadog’s client-side agent software—the Datadog Agent—throughout their environment. And to help ensure the Agent is deployed correctly and consistently, Datadog’s Fleet Automation feature already helps teams centrally view Agent installations and configurations. But teams also need an easier way to manage the deployment and configuration of the Agent at scale.

