Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

How to Set Up a Syslog Server: A Complete Step-By-Step Guide

Syslog servers are essential for centralized log management, helping network engineers monitor, troubleshoot, and secure network devices efficiently. This guide walks you through setting up a syslog server from scratch, focusing on practical steps using rsyslog on a Linux system—a common and robust choice for syslog collection. Windows does not have a native syslog server, so you need third-party software.

New: Status modal integration is here!

At StatusGator, our mission is to make status transparency effortless—for you and your users. Today, we’re introducing a new way to keep your users informed in real time: the Status Modal Embed. Now you can display a compact, customizable modal on your website that shows the current status of your services—incidents, maintenance, or full operational status—all with a direct link to your full status page.

Observability trends in Japan: Insights from Grafana Labs' latest survey

Japanese organizations are focused on controlling costs and limiting complexity—and they might be getting ready to broaden their adoption at just the right time, according to analysis of a micro survey on observability recently conducted by Grafana Labs. Observability is an evolving space in Japan, and this is the first time Grafana Labs has run a Japanese version of our annual Observability Survey.

Invisible dependencies, visible impact: Lessons from the Google Cloud outage

June 12, 2025. A date most of the Internet won’t remember — but anyone relying on Google Cloud will. In the span of minutes, a routine quota update snowballed into global disruption. APIs stopped responding. Dashboards stayed green. And across continents, teams scrambled to figure out if the problem was theirs — or Google's. It wasn’t a cyberattack. It wasn’t a datacenter fire.

How to Use an SLA Uptime Calculator to Understand Service Availability

TL;DR A Service Level Agreement (SLA) defines the required uptime for a service. An SLA uptime calculator helps convert uptime percentages into actual allowed downtime across different timeframes. This guide explains how these calculators work, why uptime matters, and how to monitor performance to meet SLA targets.

OpenTelemetry for Go: measuring the overhead

Everything comes at a cost — and observability is no exception. When we add metrics, logging, or distributed tracing to our applications, it helps us understand what’s going on with performance and key UX metrics like success rate and latency. But what’s the cost? I’m not talking about the price of observability tools here, I mean the instrumentation overhead.

Everything You Need to Know About Event Logs

Your code passes locally, CI is green, and the deploy goes through. Then production throws a 500, and the trace isn’t helpful. And here, event logs help. A log captures timestamped records of what the app did HTTP requests, DB queries, cache misses, retries, failures. These entries give you enough context to debug without reproducing the issue locally. Especially when dealing with distributed systems, logs are often the only consistent source of truth.