Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Time Series Meets Graph: Understanding Relationships in Streaming Data

Data systems rarely operate as isolated components. Machines depend on sensors, services rely on other services, and devices exchange data through shared gateways. When something changes, the impact often spreads beyond a single metric. To trace how changes move through complex systems, many teams turn to graph-style analysis to map dependencies and follow cause and effect.

The fragile web: 2025's lessons on uptime, reality, and engineering rigor

If you are into IT operations or leadership, you likely spent at least one weekend in 2025 huddled over a laptop while the rest of the world slept. For the last decade, our industry has pursued five nines (99.999% uptime) as the holy grail. We architected redundant systems, deployed across multiple availability zones, and optimized our code until it hummed. We convinced ourselves that if we just engineered hard enough, we could tame the chaos of the internet. We thought we could. We really did.

When AI Speeds Up Change, Knowing First Becomes the Constraint

In a recent post, I argued that AI doesn’t fix weak engineering processes; rather it amplifies them. Strong review practices, clear ownership, and solid fundamentals still matter just as much when code is AI-assisted as when it’s not. That post sparked a follow-up question in the comments that’s worth sitting with: With AI speeding things up, how do teams realise something’s gone wrong before users do? It’s the right question to ask next.

How to Monitor SaaS Status in 2026 : A Complete Guide

This is an updated and expanded version of the older guide. According to the 2025 State of SaaS report, organizations use an average of 106 SaaS apps. Staying on top of your SaaS vendors' status is as important as monitoring your own services. The Cloudflare, AWS, Azure, and Google Cloud outages in 2025 were strong reminders of this fact.

OpenTelemetry and Grafana Labs: What's new and what's next in 2026

For many teams, 2024 was the year of asking, “can OpenTelemetry do this?” In 2025, the community answered with a resounding “yes,” moving beyond experimentation to focus on what matters most in practice: stability, ease of use, and cross-project compatibility. That momentum now sets the stage for what’s to come for OpenTelemetry in 2026.

A Day in the Life of ITOps: Why Manual Ops Can't Scale Without AI Automation

A typical ITOps day is consumed by manual triage, fragmented context, and coordination work that expands with scale and slows every incident. Your day begins with alerts that arrived overnight. The symptoms are partial and the blast radius is unclear, so the first task is not remediation; it is figuring out what is real, what is related, and what matters. Next, a ticket comes in with a brief description and no evidence. Ownership is unclear.

The Ultimate Guide to Error Monitoring: Why Error Monitoring Matters More Than Ever in 2026

Errors get a bad rap, but they’re just trying to help. Remember, errors aren’t the enemy, they’re the messenger. Conventional wisdom tells you to think of errors as failures, as things that thwart progress and frustrate developers. The reality is that errors are actually there to help you. They prevent you from shipping broken code to production. They stop your application from continuing to operate incorrectly and costing you money.

Logging in React Native with Sentry

Logs are often the first place dev teams look when they investigate an issue. But logs are often added as an afterthought, and developers struggle with the balance of logging too much or too little. As a seasoned developer, you may remember a time when you were asked to investigate an issue and then handed a 200 MB plaintext log file. Three hours and four Python scripts later, you would realize that the problem was in a different component.

Simplify the Collection Layer and Move to OTel Without the Agent Sprawl

This is blog 2 in our New Year, New Resolution Series on OTel migrations. Read the first post, "New Year, New Telemetry: Resolve to Stop Breaking Dashboards", here. Most New Year’s resolutions fail because they require a "big bang" change. If your 2026 mandate is to migrate to OpenTelemetry (OTel), the traditional approach is the definition of friction.