The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Web Accessibility Monitoring: an Ops Team Guide

Web accessibility monitoring is the automated, scheduled scanning of a website for accessibility failures. Unlike a point-in-time audit, monitoring runs continuously. Code changes, content updates, and third-party scripts can all introduce regressions, and monitoring catches them before they become complaints. This guide covers how it works and where it fits in an ops stack.

Introducing Microsoft DHCP management in OpUtils: From monitoring to full control

If you manage enterprise networks, this scenario probably sounds familiar: An IP conflict surfaces, connectivity drops for a group of users, and the confusion begins. You check your DHCP server, dig through scope utilization, and try to piece together what went wrong, often after the disruption has already occurred. For years, network administrators have needed a single console that gives them both visibility into DHCP and control over it.

OpenTelemetry Monitoring with Netdata

If you've standardized on OpenTelemetry (or you're heading that way), you probably know the collector gets your data out, but where it lands and how useful it is once it gets there are separate problems. Netdata now ingests both OTLP metrics and OTLP logs natively, so your OTel pipelines feed directly into the same monitoring experience as everything else in your infrastructure: same dashboards, same alerting, same query interface. No separate backends, no context switching.

New Explore: Faster answers, less friction, and a better way to investigate your data

There is a moment every engineer knows too well. Something is wrong in production. You have an alert, a vague symptom, and pressure to find the one signal that explains what changed. You open your logs and traces, and you immediately hit the same two problems: the dataset is huge, and the path from “I see something odd” to “I understand why” is full of tiny, exhausting steps. Meet new Explore, our redesigned investigation experience for logs, traces, and spans.

Future Solving with Brian Evergreen (Or: How to Escape those AI Career Jitters)

Brian Evergreen joins the show to challenge the fear-driven narrative around AI and work. Rather than treating the future as something coming for us, Brian argues that leaders and individuals should decide what future they want to create, then work backwards. He explores why “start with the problem” thinking limits AI strategy, how visible strategy and relational leadership can unlock better transformation, and why human connection may become more valuable—not less—in an AI-enabled world. A thoughtful conversation on escaping AI career anxiety, building resilient networks, and creating value beyond efficiency.

WHOIS & RDAP Domain Lookup & Expiry Check

In this video, we’ll walk you through how to set up and configure your WHOIS and RDAP Domain Lookup & Expiry Checks in Uptime.com. Learn how to monitor your domain and receive alerts before it expires, and protect your registration information from unauthorized modifications. We cover step-by-step instructions for setting up checks through the Uptime.com UI and via API.
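
To make the idea of an expiry check concrete, here is a minimal sketch of the underlying logic. It is not Uptime.com's implementation. It assumes only that an RDAP domain response carries an `events` array whose `expiration` entry holds an RFC 3339 date, as the RDAP JSON format defines. The function names, the 30-day threshold, and the sample response are hypothetical and chosen purely for illustration.

```python
from datetime import datetime, timezone
from typing import Optional


def days_until_expiry(rdap_response: dict, now: datetime) -> Optional[int]:
    """Return whole days until the RDAP expiration event, or None if it is absent."""
    for event in rdap_response.get("events", []):
        if event.get("eventAction") == "expiration":
            # RDAP dates are RFC 3339 strings; rewrite a trailing "Z" so that
            # datetime.fromisoformat accepts it on older Python versions.
            expires = datetime.fromisoformat(event["eventDate"].replace("Z", "+00:00"))
            return (expires - now).days
    return None


def should_alert(rdap_response: dict, now: datetime, threshold_days: int = 30) -> bool:
    """Hypothetical alert rule: fire when expiry is within threshold_days."""
    remaining = days_until_expiry(rdap_response, now)
    return remaining is not None and remaining <= threshold_days


# Invented sample data shaped like an RDAP domain response (not a real lookup).
sample = {
    "ldhName": "example.com",
    "events": [
        {"eventAction": "registration", "eventDate": "1995-08-14T04:00:00Z"},
        {"eventAction": "expiration", "eventDate": "2030-01-31T00:00:00Z"},
    ],
}
now = datetime(2030, 1, 11, tzinfo=timezone.utc)

print(days_until_expiry(sample, now))  # 20
print(should_alert(sample, now))       # True
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check deterministic and easy to test. A real monitor would fetch the response from an RDAP server and also compare registration details between runs to detect unauthorized changes.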

We Built a Better DNS Propagation Checker. Here's What Makes It Different.

Today we are launching the DNS Spy DNS Propagation Checker. It is free. It works on any domain. It shows you what is happening in more places, in more detail, and faster than the tools you have been using. You can try it right now: dnsspy.io/dns-tools/dns-propagation-checker.

Ameet Talwalkar on Building the AI Research Lab

"We're doing cutting-edge AI, focused on real translational impact: getting our research over the wall and into production." Ameet Talwalkar, Datadog's Chief Scientist, shares what it took to build the AI Research Lab from the ground up — and what makes DAIR different from traditional research teams. At Datadog, research ships. Recent work from the lab includes Toto 2.0, open-weights time series forecasting models ranked on leading benchmarks, and ARFBench, a new benchmark for evaluating AI on real incident data.

Explore for Spans: One View with Infinite Depth

It’s 20 minutes into a P0 incident, and you have already switched between four different tools, re-authenticated twice, and translated queries across three incompatible query syntaxes. The root cause you are searching for? Well, that is still out there somewhere. The reality of investigative latency is that most engineering teams face navigation problems, not data problems. During high-pressure incidents, teams lose cognitive momentum due to context switching between disconnected telemetry silos.

Search Azure Blob data in-place with BYOS for Cribl Lake

See how Bring Your Own Storage (BYOS) in Cribl Lake allows teams to connect directly to Azure Blob Storage and instantly search data in place — without moving, duplicating, or rehydrating telemetry. In this demo, Cribl Product Manager Risk Salsa walks through setup, dataset creation, and how to run fast investigations across your Azure-hosted data using Cribl Search.