Operations | Monitoring | ITSM | DevOps | Cloud

Understand user experience through network performance with Datadog Synthetic Monitoring

When an application slows down or fails, pinpointing the cause isn’t always simple. Is it a backend regression, a misbehaving API, or a bottleneck somewhere deep in the network? Without full visibility, teams waste precious time troubleshooting across disconnected tools and layers. Datadog Synthetic Monitoring now supports Network Path to help you proactively identify whether user-facing issues stem from your code or from the underlying network.

Accelerate your Azure integration setup with guided onboarding

Getting started with monitoring for Microsoft Azure environments can be a lengthy and manual process. Many tools require users to create app registrations, assign permissions, and enable log forwarding or telemetry data collection across multiple portals and scripts. These fragmented steps slow down onboarding and introduce opportunities for misconfiguration, making it harder for teams to quickly achieve full visibility.

The Outage Anxiety Test: Can You Answer These 3 Questions In Under 10 Minutes?

On Oct. 20, the Internet woke up and seemingly chose violence. Amazon Web Services (AWS) was down for more than 12 hours. From banking platforms to hospital communications to mobile ordering apps, digital services came to a screeching halt. The cause? Two programs tried to write a DNS entry simultaneously, failed, and left the entry blank. Thus began the incredibly costly failure cascade.

AI And Sustainability: Measuring The Impact Of The Generative AI Boom

Before 2022, Alex Hanna worked on Google’s Ethical AI team. Today, she’s the director of research at the Distributed AI Research Institute, a transition sparked by Google’s handling of a paper exposing AI’s growing environmental footprint. So, how bad is it, really? That depends on who you ask. Take Jesse Dodge, a senior research analyst at the Allen Institute for AI. Dodge told NPR that a single ChatGPT query can use as much electricity as keeping a light bulb on for 20 minutes.

Stop the guesswork: Troubleshoot with confidence with process monitoring

IT infrastructure is vast, complex, and interdependent. At any point in time, businesses rely on thousands of servers running thousands of processes. Detecting server downtime is fairly easy, but true observability means knowing precisely which processes are working as intended and which are quietly contributing to performance degradation. A failed database worker or a memory-leaking background service can drain resources unnoticed until your most critical apps grind to a halt.

Synthetic Monitoring for GraphQL Endpoints: Beyond the Query

GraphQL isn’t just another API protocol; it’s a new layer of abstraction. It collapses dozens of REST endpoints into one flexible interface where clients decide what data to fetch and how deep to go. That freedom is a gift for front-end teams and a headache for anyone tasked with reliability. Traditional monitoring doesn’t work here. A REST endpoint can be pinged for uptime.

Building dbRosetta Using AI: Part 1 of Many

Like many of you, over the last couple of years, I’ve been using AI, or, well, let’s just name it appropriately, Large Language Models (LLMs), as a part of my job. I’ve also used them in my hobby. With them, I’ve generated snippets of code, tested data conversions, even built a small database for a presentation. However, to date, I haven’t tried doing everything through an LLM. Now, I’m going to.

Intent-Driven Assertions are Redefining How We Test Software

Traditional UI testing struggles to keep up with rapid design and workflow changes, often focusing on brittle selectors rather than user outcomes. Harness AI Test Automation introduces intent-driven, natural language assertions that understand what teams want to verify, not just how tests are written.

AI Agent for Proactive Problem Management: A Shift Toward a Ticketless Future

As organizations rely on increasingly complex IT infrastructures, incident management often turns into a constant cycle of alerts, escalations, and fixes. While reactive responses may keep operations running, they rarely address the deeper systemic issues that slowly erode performance. Recurring incidents, silent failures, and hidden patterns are usually symptoms of unresolved root causes that traditional approaches struggle to uncover.