The latest news and information on monitoring for websites, applications, APIs, infrastructure, and other technologies.

Baking in site reliability with observability and AI: How SpotOn uses Grafana Assistant to keep restaurants running

When you operate a restaurant, the last thing you want to do is shut your doors and turn away guests and staff because of a technology failure. And if you’re the one providing that tech, it’s your job to make sure that doesn’t happen. “For us, observability is about a lot more than just dashboards and alerts.

Reality Bytes: Jon Leighton Returns! How Community Continues to Shape DEX

Jon Leighton, Head of Nexthink’s Digital Community and User Groups, rejoins Reality Bytes with Tom, Sean, and Dina to explore how community remains the beating heart of Digital Employee Experience (DEX). Fresh from Experience London and heading into Experience Boston, Jon shares how Nexthink’s Ambassador Program, user groups, and learning initiatives empower practitioners to grow, collaborate, and lead change. From storytelling and communication to real-world impact and career development, this episode celebrates the people and connections driving DEX forward.

Implement Distributed Tracing with Spring Boot 3

A slow checkout request. A background job stuck waiting on another service. A log message that looks fine until performance drops. In a Spring Boot microservices setup, these are the moments that test your observability. You know something's wrong, but tracing the request across dozens of services feels impossible. Distributed tracing changes that. It connects every span in the request's journey, showing exactly where time is spent and where things start to break down.

The 2025 Guide to Open Source Status Page Software

This is an updated version of the 2024 article. Maintaining transparent communication about service availability is crucial for businesses of all sizes. Status pages are an important part of your communication strategy during outages and maintenance events. You can choose a fully managed status page provider or host an open-source one yourself.

CIDR blocks vs. IP ranges: Aligning network discovery with business value

At every turn, IT leaders are required to prove the value of every technology investment. Technology business management (TBM) practices encourage connecting tech spend directly to business outcomes, demanding accurate data about what’s in your network and how it supports the organization.

Monitor logs from Amazon EKS on Fargate with Datadog

Amazon EKS on Fargate is a managed service that reduces the operational overhead of maintaining a Kubernetes cluster by abstracting away the underlying infrastructure. In a serverless Fargate environment, each pod is assigned its own isolated compute resources; there is no direct host-level access.

CriblCon 25 Keynote Livestream

IT and security data professionals stand at a crossroads. The practices and technologies that have served you for the last ten years are at their breaking point, facing an onslaught of data growth and complexity that will only accelerate as AI goes mainstream. You have a choice. Stay earthbound or take your telemetry to the stratosphere and beyond.