Operations | Monitoring | ITSM | DevOps | Cloud

Latest Posts

Create browser tests directly from Datadog RUM Session Replay

Testing is a key part of application development and helps you maintain a reliable experience for your users. But the process can be difficult to scale and is often siloed to a single team or individual that does not have broad knowledge of your application’s UI. This can lead to organizations investing in sizable test suites that do not accurately represent real user behavior.

Leverage user context to debug mobile performance issues with the Instabug Datadog Marketplace offering

As user expectations for mobile apps increase, effective bug remediation involves not only addressing critical incidents as they occur but also proactively handling smaller performance issues in order to ensure a smooth user experience (UX). Instabug helps you understand how users experience your app with crucial mobile performance metrics—such as launch metrics, loading times, and UI hangs—viewable alongside your bug reports.

Monitor Google Cloud Vertex AI with Datadog

Vertex AI is Google’s platform offering AI and machine learning computing as a service—enabling users to train and deploy machine learning (ML) models and AI applications in the cloud. In June 2023, Google added generative AI support to Vertex AI, so users can test, tune, and deploy Google’s large language models (LLMs) for use in their applications.

Visualize service ownership and application boundaries in the Service Map

The complexity of microservice architectures can make it hard to determine where an application’s dependencies begin and end and who manages which ones. This can pose a variety of challenges both in the course of day-to-day operations and during incidents. Lacking a clear picture of the ownership and interplay of your services can impede accountability and cause application development, incident investigations, and onboarding processes to become prolonged and haphazard.

Key questions to ask when setting SLOs

Many organizations rely on service level objectives (SLOs) to help them gauge the reliability of their products. By setting SLOs that define clear and measurable reliability targets, businesses can ensure they are delivering positive end-user experiences to their customers. Clearly defined SLOs also make it much easier for businesses to understand what tradeoffs they may have to make in order to deliver those specific experiences.

Tools for collecting metrics and logs from CoreDNS

In Part 1 of this series, we looked at key metrics you should monitor to understand the performance of your CoreDNS servers. In this post, we’ll show you how to collect and visualize these metrics. We’ll also explore how CoreDNS logging works and show you how to collect CoreDNS logs to get even deeper visibility into your Deployment.

Key metrics for CoreDNS monitoring

CoreDNS is an open source DNS server that can resolve requests for internet domain names and provide service discovery within a Kubernetes cluster. CoreDNS is the default DNS provider in Kubernetes as of v1.13. Though it can be used independently of Kubernetes, this series will focus on its role in providing Kubernetes service discovery, which simplifies cluster networking by enabling clients to access services using DNS names rather than IP addresses.

How to monitor CoreDNS with Datadog

In Part 1 of this series, we introduced you to the key metrics you should be monitoring to ensure that you get optimal performance from CoreDNS running in your Kubernetes clusters. In Part 2, we showed you some tools you can use to monitor CoreDNS. In this post, we’ll show you how you can use Datadog to monitor metrics, logs, and traces from CoreDNS alongside telemetry from the rest of your cluster, including the infrastructure it runs on.

Send your logs to multiple destinations with Datadog's managed Log Pipelines and Observability Pipelines

As your infrastructure and applications scale, so does the volume of your observability data. Managing a growing suite of tooling while balancing the need to mitigate costs, avoid vendor lock-in, and maintain data quality across an organization is becoming increasingly complex. With a variety of installed agents, log forwarders, and storage tools, the mechanisms you use to collect, transform, and route data should be able to evolve and adjust to your growth and meet the unique needs of your team.