The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Monitoring Backstage with OpenTelemetry: Closing the observability blind spot

‘One small step for a man, but a huge leap for developers’ - me, when I realised how to observe my Backstage with OpenTelemetry. Backstage is often the “portal” through which we manage all our other systems, but who watches the watcher? We recently gave a KubeCon talk highlighting that monitoring Backstage itself is critical. When Backstage isn’t observable, it becomes a blind spot in your infrastructure.

Sigma Specification 2.0: What You Need to Know

Sigma rules have become the security team’s equivalent of LEGO bricks. With LEGO, people can build whatever they can imagine by connecting different types of bricks. With Sigma Specification 2.0 rules, security teams can create vendor-agnostic detections without being limited by proprietary log formats. In response to the popularity of Sigma rules, the team behind them updated the specification in August 2024, giving security teams new capabilities.

Simple cloud cost management: Grafana Labs integrates open standard FOCUS specification for cloud billing data

At Grafana Labs, we’ve always believed that observability should be open and accessible, and that belief extends beyond metrics, logs, and traces to the costs associated with managing observability at scale. That’s why we’re excited to share that we’ve adopted the FinOps Open Cost and Usage Specification (FOCUS), a community-driven, open standard for cloud billing data.

Hybrid IT Infrastructure Management

Today’s IT environments are rarely confined to a single data center or a single cloud provider. Enterprises are embracing a mix of cloud platforms, virtual machines, and on-premises hardware to stay agile and competitive. This blended environment is known as hybrid IT infrastructure, and managing it effectively is key to keeping systems healthy, secure, and performing at their best.

OnlineOrNot updates from May 2025

As OnlineOrNot has grown, I've been building features quickly to get them into your hands as fast as possible. However, this meant I ended up with multiple versions of similar pages that looked and worked differently from each other. This month, I focused on putting systems in place to create a consistent experience across all parts of the dashboard, making everything look and feel unified.

Jaeger vs Zipkin: Which is Right for Your Distributed Tracing

When requests slow down across your microservices, tracing helps you understand where time is spent. Jaeger and Zipkin are two popular tools for distributed tracing, built to answer a simple question: where did the request go? If you're choosing between them or just exploring options, this guide breaks down the differences and when each one might be a better fit.

Prometheus Alerting Examples for Developers

Everything looks fine: dashboards are green, logs are quiet. But users start reporting slow response times. No errors, no traffic spikes. Just a general slowdown. It’s a common situation. Not all problems show up as crashes or clear failures. Sometimes, performance degrades quietly, and standard metrics don’t catch it early. That's where Prometheus alerting can help, if you're monitoring the right signals.