The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Introducing the Causely data source plugin for Grafana

Endre Sara is a Co-Founder of Causely, where he’s building a causal reasoning platform to continuously assure service reliability and eliminate human troubleshooting. Previously, Endre was VP of Advanced Engineering at Turbonomic and a VP at Goldman Sachs. At Causely, we believe observability tools shouldn’t just collect more data—they should enable you to understand it.

Breaking Down Silos with Correlation and Context

In modern IT environments, data is abundant, but clarity is rare. Enterprises deploy dozens of monitoring tools to collect metrics, events, and logs from across the network, yet when something goes wrong, teams still scramble to connect the dots. Why? Because these data streams exist in silos, isolated by format, source, or system.

Why you need Internet Performance Monitoring (IPM)

A few decades ago, monitoring your application was simple—everything ran on-premises, and performance issues were easier to pinpoint. But today, applications are built on globally distributed services, running across internal and external systems in the cloud, all connected through the Internet. In a world where the Internet is now your network, traditional APM (Application Performance Monitoring) isn’t enough. It only focuses on code and infrastructure, leaving critical blind spots that impact performance, availability, and user experience.

VictoriaMetrics Cloud: What's New in Q1 2025?

Time flies, and just like that, we are already in April! The first quarter of 2025 has been packed with exciting updates for VictoriaMetrics Cloud. If you joined our latest Quarterly Virtual Meetup, you might have already seen some of these announcements alongside other great improvements across all things VictoriaMetrics. In this post, we’ll take a closer look at what’s new in VictoriaMetrics Cloud.

All about OTel and Logging on Kubernetes with Loki (Loki Community Call April 2025)

In this pre-recorded Loki Community Call, we talk all about OTel and logging on Kubernetes with Cyril Tovena, Ward Bekker, Jay Clifford, and Nicole van der Hoeven at KubeCon EU 2025 in London. We discuss when and why you should switch to OTel and why you shouldn't, what exactly OTLP is, and best practices for ingesting data through an OTLP endpoint.

Correlation ID vs Trace ID: Understanding the Key Differences

You’re staring at logs, trying to figure out what caused that odd error in the middle of the night. Or maybe you're following a chain of requests across services, hoping to understand how one user action triggered a series of unexpected behaviors. That’s where distributed tracing and request tracking—specifically, correlation IDs and trace IDs—are invaluable. It’s the kind of detail that can make debugging faster and less painful.
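As a rough illustration of the two identifiers, the sketch below forwards both on an outgoing request. It assumes a correlation ID carried in an `X-Correlation-ID` header (a common convention, not a standard) and a trace ID carried in the W3C Trace Context `traceparent` header, whose format is `version-traceid-parentid-flags`. The function names are hypothetical and no tracing library is involved.

```python
import secrets
import uuid


def new_correlation_id() -> str:
    # Application-chosen identifier; a UUID is one common choice (assumption).
    return str(uuid.uuid4())


def new_traceparent() -> str:
    # W3C Trace Context: 16-byte trace ID and 8-byte span ID, hex-encoded.
    trace_id = secrets.token_hex(16)  # 32 hex characters
    span_id = secrets.token_hex(8)    # 16 hex characters
    return f"00-{trace_id}-{span_id}-01"


def outgoing_headers(incoming: dict) -> dict:
    """Build headers for a downstream call from the incoming request headers."""
    # The correlation ID is reused unchanged for the whole request chain.
    corr = incoming.get("X-Correlation-ID") or new_correlation_id()

    tp = incoming.get("traceparent")
    if tp:
        # Keep the trace ID, but mint a new span ID for this hop.
        version, trace_id, _parent_id, flags = tp.split("-")
        tp = f"{version}-{trace_id}-{secrets.token_hex(8)}-{flags}"
    else:
        tp = new_traceparent()

    return {"X-Correlation-ID": corr, "traceparent": tp}
```

The correlation ID stays identical from hop to hop, while `traceparent` keeps its trace ID and changes its span ID at each service, which is what lets a tracing backend reconstruct the call tree.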

Everything You Need to Know About OpenTelemetry Histograms

Modern systems throw off a lot of data—metrics, traces, logs—sometimes more than we know what to do with. When you're trying to understand how values spread out over time (like response times, memory usage, or queue lengths), averages alone don’t tell the full story. OpenTelemetry histograms help fill in those gaps. This guide walks through what they are, why they matter, and how DevOps engineers can use them to improve observability in real systems.
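A small sketch of why averages can mislead: the two latency samples below share the same mean but fall into very different buckets. The bucket bounds and sample values are made up for illustration, and the code uses plain Python rather than the OpenTelemetry SDK.

```python
from bisect import bisect_left
from statistics import mean

# Hypothetical bucket upper bounds in milliseconds (assumption for illustration).
BOUNDS = [50, 100, 250, 500, 1000]


def bucket_counts(values, bounds=BOUNDS):
    """Count values per bucket.

    Bucket i holds values <= bounds[i] (and above the previous bound);
    the final bucket collects everything above the last bound.
    """
    counts = [0] * (len(bounds) + 1)
    for v in values:
        counts[bisect_left(bounds, v)] += 1
    return counts


steady = [200] * 10              # every request takes 200 ms
spiky = [40] * 8 + [840] * 2     # mostly fast, with two slow outliers

print(mean(steady), bucket_counts(steady))
print(mean(spiky), bucket_counts(spiky))
```

Both samples average 200 ms, yet the bucket counts separate them: the steady sample lands entirely in the 100 to 250 ms bucket, while the spiky one splits between the fastest bucket and the 500 to 1000 ms bucket.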

Link to full status page from embedded iframe

We’ve rolled out a small update to the StatusGator iframe embed feature! Now, when you embed your status page on your website or app, it can include a link to your full StatusGator status page — giving your users a simple way to view detailed information about outages. And there’s more: If you’ve uploaded a favicon in your Status page settings, it will now appear next to the link in the iframe.

How to Build a Successful SIEM Migration Strategy

At least once a week, a team reaches out to discuss migrating from an established SIEM or analysis platform. This major decision is driven by several compelling factors, but it can create significant work for engineering teams and pose risks to the business. The cost of switching to a new platform, often referred to as displacement cost, can be substantial.