Datadog

Key metrics for CoreDNS monitoring

CoreDNS is an open source DNS server that can resolve requests for internet domain names and provide service discovery within a Kubernetes cluster. CoreDNS is the default DNS provider in Kubernetes as of v1.13. Though it can be used independently of Kubernetes, this series will focus on its role in providing Kubernetes service discovery, which simplifies cluster networking by enabling clients to access services using DNS names rather than IP addresses.
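
As a rough illustration of what accessing a service by DNS name looks like from a client's side, here is a minimal Python sketch. It assumes the conventional Kubernetes naming scheme `<service>.<namespace>.svc.<cluster-domain>` with the default cluster domain `cluster.local`. The function names and the `web`/`prod` service are hypothetical, and the lookup itself only succeeds from inside a cluster whose DNS is served by CoreDNS or another cluster DNS provider.

```python
import socket


def service_fqdn(service, namespace="default", cluster_domain="cluster.local"):
    """Build the cluster-internal DNS name for a Kubernetes Service.

    Assumes the conventional <service>.<namespace>.svc.<cluster-domain> layout;
    "cluster.local" is the usual default, but a cluster may be configured differently.
    """
    return f"{service}.{namespace}.svc.{cluster_domain}"


def resolve_service(service, namespace="default", port=80):
    """Resolve a Service name to its IP addresses using the system resolver.

    Inside a pod, the resolver is normally pointed at the cluster DNS service,
    so this query is answered by the cluster's DNS server.
    """
    infos = socket.getaddrinfo(
        service_fqdn(service, namespace), port, proto=socket.IPPROTO_TCP
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Hypothetical service "web" in namespace "prod".
    print(service_fqdn("web", "prod"))
```

A client that calls `resolve_service("web", "prod")` never needs to know the Service's IP address in advance, which is the simplification the teaser describes.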

SRE in Transition: From Startup to Enterprise

"Startups are defined by “ship or die”. As a result, SRE teams at a startup should be focused on enabling product engineers to ship features as quickly as possible. As your startup transitions from “we’ll run out of money in the next 18 months” to “we have more than 1000 engineers”, how should the SRE organization evolve and provide the best value through that transition (including booting one up if you don’t have one)? I will discuss specific ways the organization needs to evolve to meet this challenge, how the SRE org can advocate for and support this change (both in direct actions and in “influence”), and how the overhang of startup technical and cultural debt can make this shift more challenging (but also more necessary).

From On-call to Non-call: Resolving Incidents Before They Even Happen

Artificial intelligence has captured the attention of the world, with tools like ChatGPT and large language models (LLMs) driving the conversation. But you don’t need to wait for the future or new features powered by LLMs to start working smarter—the tech industry has been investing in intelligent, automated tools for years and they’re ready for production now. In this talk, you’ll learn how the engineering teams at Toyota Connected use tools like Datadog Watchdog, Anomaly Detection, and Workflows to make our lives easier and keep our platform stable.

From Solution to Startup

Before Datadog was a widely adopted SaaS platform, it was a tool developed to solve our founders’ own monitoring needs. As technology-oriented people, we often build solutions for our own problems, then discover those problems are widespread. But how do you know when your solution should be something more? In this panel session, we’ll talk with tech startup founders to hear their stories and advice for turning tools into businesses.

Send your logs to multiple destinations with Datadog's managed Log Pipelines and Observability Pipelines

As your infrastructure and applications scale, so does the volume of your observability data. Managing a growing suite of tooling while balancing the need to mitigate costs, avoid vendor lock-in, and maintain data quality across an organization is becoming increasingly complex. With a variety of installed agents, log forwarders, and storage tools, the mechanisms you use to collect, transform, and route data should be able to evolve and adjust to your growth and meet the unique needs of your team.

Integration roundup: Monitoring your AI stack

Integrating AI, including large language models (LLMs), into your applications enables you to build powerful tools for data analysis, intelligent search, and text and image generation. There are a number of tools you can use to leverage AI and scale it according to your business needs, with specialized technologies such as vector databases, development platforms, and discrete GPUs being necessary to run many models. As a result, optimizing your system for AI often leads to upgrading your entire stack.

Enhance code reliability with Datadog Quality Gates

Maintaining the quality of your code becomes increasingly difficult as your organization grows. Engineering teams need to release code quickly while still finding a way to enforce best practices, catch security vulnerabilities, and prevent flaky tests. To address this challenge, Datadog is pleased to introduce Quality Gates, a feature that automatically halts code merges when they fail to satisfy your configured quality checks.

Easily test and monitor your mobile applications with Datadog Mobile Application Testing

Effective mobile application testing that meets all the requirements of modern quality assurance can be challenging. Not only do teams need to create tests that cover a range of different device types, operating system versions, and user interactions—including swipes, gestures, touches, and more—they also have to maintain the infrastructure and device fleets necessary to run these tests.

Store and analyze high-volume logs efficiently with Flex Logs

The volume of logs that organizations collect from all over their systems is growing exponentially. Sources range from distributed infrastructure to data pipelines and APIs, and different types of logs demand different treatment. As a result, logs have become increasingly difficult to manage. Organizations must reconcile conflicting needs for long-term retention, rapid access, and cost-effective storage.