The latest News and Information on Log Management, Log Analytics and related technologies.

The Data Plane Reality: OTel Scales, While Topology UX Lags

OpenTelemetry won the architectural standards battle. At scale, though, telemetry breaks more like plumbing than code. It breaks quietly, across a graph, with a blast radius you don’t understand until it’s expensive. With over 65% of organizations now running more than 10 collectors in production, hybrid deployments across Kubernetes and VMs are accelerating fast. Telemetry standardization is no longer a project milestone. It is a baseline expectation.

Working as a remote engineer at Cribl | Building the AI Platform for Telemetry

Learn what it’s like to work as an engineer at Cribl, a remote-first company building the AI platform for IT and security data. In this recruiting video, Cribl’s engineering and support leaders share how fully distributed teams collaborate, solve hard data problems, and grow their careers while working from around the world. You’ll hear from managers and leaders in site reliability engineering, security incubation, and technical support.

Why Does Network Topology Decide How Fast Your Network Recovers?

In this video, learn why network topology plays a critical role in network resilience, troubleshooting, and recovery. Discover how understanding network dependencies, eliminating single points of failure, and maintaining clear visibility can help IT teams reduce downtime and accelerate incident response.

9 Powerful Log Monitoring Best Practices to Follow in 2026

How many of your last five incidents were already sitting in the logs before anyone noticed? Most teams already collect more than enough log data. The problem starts with what happens next, and the same four gaps show up almost everywhere. This guide covers the log monitoring best practices that close those gaps. It walks through how to collect, structure, correlate, retain, and secure logs, so monitoring becomes a steady process and not a scramble during the next incident.

Use This OTel Processor to Prevent Your Dashboards From Breaking

A semantic-convention rename (http.method → http.request.method) can silently break your RED metrics — no errors, just gaps in dashboards and alerts. The OpenTelemetry Collector's schema processor fixes it: put it first in your pipeline and it normalizes attribute names no matter what each service emits. Migration mode writes BOTH the old and new names, so you get zero-downtime upgrades while queries keep working.
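As a rough illustration of the dual-write idea described above, here is a minimal Python sketch. It is not the schema processor's implementation or configuration. The `normalize` function, the `RENAMES` table, and the `migration` flag are hypothetical names chosen for this example, and only the `http.method` to `http.request.method` pair comes from the text.

```python
# Hypothetical rename table: old attribute name -> new attribute name.
# Only this one pair is taken from the article; a real table would be larger.
RENAMES = {"http.method": "http.request.method"}


def normalize(attributes, migration=True):
    """Return a copy of `attributes` with renamed keys normalized.

    migration=True  keeps BOTH the old and the new name (dual write).
    migration=False keeps only the new name.
    """
    out = dict(attributes)
    for old, new in RENAMES.items():
        if old in out:
            # Prefer a value the service already emitted under the new name.
            out.setdefault(new, out[old])
            if not migration:
                del out[old]
        elif new in out and migration:
            # Service already uses the new name; backfill the old one
            # so queries written against the old name keep matching.
            out[old] = out[new]
    return out


if __name__ == "__main__":
    print(normalize({"http.method": "GET"}))
    print(normalize({"http.request.method": "POST"}))
    print(normalize({"http.method": "GET"}, migration=False))
```

In this sketch, dashboards that query either name see the same value while `migration` is on, and switching it off later leaves only the new name.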

Un-observable AI is Un-trustworthy AI

Recently, someone talked Chipotle’s customer support agent into reversing a linked list, a task that has nothing to do with burritos. Screenshots circulated, people laughed, but underneath the joke sat a sharper question. If a production support agent will do that on a public channel, what else will it do that nobody is screenshotting? The bug is funny. The trust gap behind it is not.