Latest Posts

How to support a growing Kubernetes cluster with a small etcd

Dec 20, 2024 By David M. Lentz In Datadog

Etcd plays a critical role in your Kubernetes setup: it stores the ever-changing state of your cluster and its objects, and the API server uses this data to manage cluster resources. As your applications thrive and your Kubernetes clusters see more traffic, etcd handles an increasing amount of data. But etcd’s storage space is limited: the recommended maximum is 8 GiB, and a large and dynamic cluster can easily generate enough data to reach that limit.

Monitor your Pinecone vector databases with Datadog

Dec 20, 2024 By Candace Shamieh In Datadog

Pinecone is a vector database that helps users build and deploy generative AI applications at scale. Whether using its serverless architecture or a hosted model, Pinecone allows users to store, search, and retrieve the most meaningful information from their company data with each query, sending only the necessary context to Large Language Models (LLMs). By providing the ability to search and retrieve contextual data, Pinecone enables you to reduce LLM hallucinations and enhance data security.

Best practices for monitoring event-driven architectures

Dec 19, 2024 By Candace Shamieh In Datadog

Microservices architectures empower individual teams to choose their own programming language, tools, and technologies, resulting in more independence and the ability to develop and release features faster. While there are various types of integration patterns that can facilitate microservice communication, many organizations choose to adopt event-driven architectures (EDAs) because of their scalability, agility, and resilience.

This Month in Datadog - December 2024

Dec 18, 2024 By Datadog In Datadog

On the December episode of This Month in Datadog, Jeremy Garcia (VP of Technical Community and Open Source) covers Kubernetes Active Remediation, Datadog IaC Security, and a trio of new features for monitoring AWS resources. Later in the episode, Natasha Goel (Product Manager) spotlights Datadog Cloud Cost Management for OpenAI. Also featured is a short recap of Datadog at KubeCon North America and AWS re:Invent 2024.

Increase visibility into network incidents using moovingon.ai and Datadog

Dec 11, 2024 By Lauren Lowe In Datadog

moovingon.ai is a platform that consolidates alerts, incidents, audits, runbooks, and other resources for 24/7 network operations center (NOC) engineering teams. These teams often have to work collaboratively to maintain uptime for mission-critical cloud infrastructure and applications and need specialized resources to facilitate investigations in the event of an issue.

Highlights from AWS re:Invent 2024

Dec 9, 2024 By Andrew Krug In Datadog

Whether or not you made the journey to this year’s AWS re:Invent, great announcements always get lost amid an action-packed week of keynotes, breakouts, expo hall demos, and networking sessions. No need to worry: we’re always happy to be a big part of the re:Invent experience and share our observations with you. You can also join us on December 17, 2024, for a re:Invent re:Cap livestream.

Automatically group events and reduce noise with AI-powered Intelligent Correlation

Dec 5, 2024 By Samantha Scaglione In Datadog

When you have a complex IT environment with many disparate tools, data sources, and teams, alert noise becomes overwhelming. This can delay incident response and cause missed alerts, ultimately leading to critical incidents and outages. Datadog Event Management’s Event Correlation groups and deduplicates events and alerts, reducing noise and helping response teams act on alerts faster.

Troubleshoot infrastructure changes faster with Recent Changes in the Resource Catalog

Dec 5, 2024 By Sriram Raman In Datadog

Organizations often struggle to maintain visibility and control over their distributed cloud infrastructure, where changes in a single resource can have cascading effects throughout the system and potentially cause disruptions. In these environments, infrastructure changes that lead to incidents are often hard to troubleshoot—especially when teams are using disparate tools with siloed data—leading to longer resolution times, more downtime, and negative business outcomes.

Optimize and troubleshoot cloud storage at scale with Storage Monitoring

Dec 4, 2024 By Mahashree Rajendran In Datadog

Organizations today rely on cloud object storage to power diverse workloads, from data analytics and machine learning pipelines to content delivery platforms. But as data volumes explode and storage patterns become more complex, teams often struggle to understand and proactively optimize their storage utilization. When issues arise—such as unexpected costs or performance bottlenecks—these teams frequently lack the visibility needed to quickly identify and resolve root causes.

Gain comprehensive visibility into your ECS applications with the ECS Explorer

Dec 3, 2024 By Danny Driscoll In Datadog

Amazon Elastic Container Service (ECS) is a container orchestration service that enables you to efficiently deploy new applications or modernize existing ones by migrating them to a containerized environment. Building on ECS gives you the flexibility, scalability, and security that containers offer, but also presents challenges in monitoring and troubleshooting your applications and infrastructure.
