
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Stream AWS Metrics to Grafana with Last9 in 10 minutes

It’s 2:47 AM and your Lambda functions are timing out. API response times are spiking. You’re flipping between the CloudWatch console, your APM tool, and your logs, trying to figure out what’s going wrong. CloudWatch has the metrics you need, such as CPU usage, memory pressure, and request rates, but connecting that data to what your app is doing takes time, and that delay slows down your incident response.

Set Up ClickHouse with Docker Compose

ClickHouse is built for high-performance OLAP workloads and can scan billions of rows in seconds. If your analytical queries are bottlenecked on PostgreSQL or MySQL, or you're spending too much on Elasticsearch infrastructure, ClickHouse offers a faster and more cost-efficient alternative. This post walks through setting up ClickHouse locally with Docker Compose and then scaling toward a production-grade cluster with monitoring in place.

Risk Register for SREs: A Practical Guide to Proactive Incident Prevention

A risk register is one of the most powerful tools in an SRE's arsenal for maintaining system reliability. By systematically documenting potential threats to your infrastructure and services, you can shift from reactive firefighting to proactive risk management.
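
To make the idea concrete, here is a minimal Go sketch of a risk register as a data structure. It assumes the common likelihood × impact scoring convention on 1 to 5 scales; the `Risk` type, its fields, the `Prioritize` helper, and the sample entries are all hypothetical and not taken from the guide itself.

```go
package main

import (
	"fmt"
	"sort"
)

// Risk is one entry in a risk register. Field names and the 1-5 scales are
// illustrative assumptions, not a standard schema.
type Risk struct {
	ID         string
	Threat     string
	Likelihood int // 1 (rare) .. 5 (almost certain)
	Impact     int // 1 (negligible) .. 5 (severe)
	Owner      string
	Mitigation string
}

// Score applies the assumed likelihood × impact convention.
func (r Risk) Score() int { return r.Likelihood * r.Impact }

// Prioritize returns a copy of the register sorted by descending score.
// The sort is stable, so risks with equal scores keep their recorded order.
func Prioritize(risks []Risk) []Risk {
	out := make([]Risk, len(risks))
	copy(out, risks)
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score() > out[j].Score() })
	return out
}

func main() {
	// Hypothetical sample entries for illustration only.
	register := []Risk{
		{ID: "R1", Threat: "TLS certificate expiry", Likelihood: 3, Impact: 5, Owner: "platform", Mitigation: "automated renewal and expiry alert"},
		{ID: "R2", Threat: "Single-AZ database", Likelihood: 2, Impact: 5, Owner: "data", Mitigation: "multi-AZ failover"},
		{ID: "R3", Threat: "Noisy-neighbour CPU contention", Likelihood: 4, Impact: 2, Owner: "infra", Mitigation: "resource limits"},
	}
	for _, r := range Prioritize(register) {
		fmt.Printf("%s score=%d %s (owner: %s)\n", r.ID, r.Score(), r.Threat, r.Owner)
	}
}
```

Sorting by score gives a simple review order for the highest-exposure risks; the stable sort keeps ties in the order they were recorded.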

Golang Application Performance Monitoring: A Comprehensive Guide

Application Performance Monitoring (APM) is the practice of tracking, analyzing, and optimizing the performance and availability of software applications. For Go (Golang), a language known for its concurrency, speed, and efficiency, APM is crucial for keeping your applications fast, reliable, and scalable under real-world load. In practice, APM in Go means monitoring runtime behavior, request response times, system resource usage, and error patterns across your application.

Top tips: How to be a beginner again

Top tips is a weekly column where we highlight what’s trending in the tech world and list ways to explore these trends. This week, we're talking about what it really means to start fresh, stay curious, and make space to be a beginner again, even when your schedule is packed. If your calendar is crammed with back-to-back meetings, messages that never stop, and deadlines breathing down your neck, you're not alone.

Modern Redux Debugging: Common Bugs and Solutions in 2024-2025

Redux remains a cornerstone of React state management, but developers still run into long-standing bugs alongside new challenges. State mutation errors are the most common Redux bug, affecting over 70% of Redux applications, while Redux Toolkit 2.0, TypeScript integration, and React 18/19 compatibility introduce new issues. This guide explores the most prevalent Redux debugging challenges and provides practical solutions for modern development.

Unlock Deeper Insights: Introducing GitLab Event Integration with Mezmo

Following the popularity of our existing GitHub integration, we’ve extended similar capabilities to GitLab users. You can now ingest GitLab events directly into Mezmo Telemetry Pipelines and route them to any destination. This provides a powerful new way to monitor, alert, and react to activity within your GitLab repositories.

Kubernetes Observability with OpenTelemetry | A Complete Setup Guide

Kubernetes provides a wealth of telemetry data, from container metrics and application traces to cluster events and logs. OpenTelemetry offers a vendor-neutral, end-to-end solution for collecting and exporting this telemetry in a standardised format.