Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Syslog Checks: How to find Insights in the Data Flood

Every SysAdmin knows the feeling. They are swimming in logs, terabytes of them. Every daemon, service, and kernel subsystem is religiously writing its activities to syslog. The data exists. The signals are there. Yet, somehow, incidents are still unpredictable. How is this even possible? Here's why this happens: Traditional syslog infrastructure was designed for storage and retrieval, not detection and response.

How to Prepare Your Network for RTO (Return-to-Office) Mandates

IT teams are being held hostage in the return-to-office debate. They didn't even get a seat at the table. And if you're not at the table, you're on the menu. The job market has cooled dramatically. Canada's unemployment rate hit 7.1% in August 2025, which is the highest since May 2016, excluding pandemic years. Employers noticed. And the RTO mandates started rolling out fast. The debate is heating up. Employees don't want to give up remote work. Executives want people in the office seats.

Understanding Lighthouse: Speed Index

You run Lighthouse and it tells you your Speed Index is bad. But the page looks like it loads fine. You see stuff on screen early. So why is Lighthouse acting like your site is a sloth? Speed Index is a “how fast does this page visually fill in” metric. Not “when did the first pixel show up” (that’s FCP) and not “when did the main content show up” (that’s LCP). It’s the whole above-the-fold loading experience, averaged over time.
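To make "averaged over time" concrete, here is a rough sketch of the idea behind Speed Index: add up how incomplete the viewport looks during each interval of the load. This is a simplified illustration, not Lighthouse's actual implementation (which works from captured video frames and then scores the result). The function name `speed_index` and the frame data are made up for the example.

```python
from typing import Sequence, Tuple

# Each frame is (timestamp_ms, visual_completeness), where completeness
# runs from 0.0 (blank viewport) to 1.0 (final appearance).
Frame = Tuple[float, float]


def speed_index(frames: Sequence[Frame]) -> float:
    """Approximate Speed Index as the area above the visual-progress curve.

    Assumes frames are sorted by time and that completeness stays constant
    between one frame and the next (a step function).
    """
    total = 0.0
    for (t0, c0), (t1, _next_c) in zip(frames, frames[1:]):
        # Time spent in this interval, weighted by how incomplete the page looked.
        total += (t1 - t0) * (1.0 - c0)
    return total


# Hypothetical pages: both paint something at 500 ms and finish at 2000 ms.
fills_fast = [(0, 0.0), (500, 0.9), (2000, 1.0)]  # mostly complete at first paint
fills_slow = [(0, 0.0), (500, 0.1), (2000, 1.0)]  # early paint, then a slow fill

print(round(speed_index(fills_fast)))  # roughly 650
print(round(speed_index(fills_slow)))  # roughly 1850
```

Both hypothetical pages show something on screen at the same moment and finish at the same moment, yet the second one scores far worse because most of the viewport stays empty for longer. That is the situation the teaser describes: an early first paint does not guarantee a good Speed Index.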

Will humans be replaced by AI? The truth

Agentic AI doesn’t replace analysts; it augments them. The real value comes from making teams more efficient, not smaller. This is the perspective most people miss. Elastic, the Search AI Company, enables everyone to find the answers they need in real time, using all their data, at scale. Elastic’s solutions for search, observability, and security are built on the Elastic Search AI Platform, the development platform used by thousands of companies, including more than 50% of the Fortune 500.

Splunk Attack Range v5 Demo

The Splunk Attack Range is an open source project that lets security teams spin up instrumented cloud environments, simulate adversary behavior, and use the generated telemetry to build and test detections in Splunk. Whether you are a detection engineer tuning rules, a purple team validating coverage, or a developer automating tests, Attack Range gives you a repeatable, cloud-based lab. This post highlights what Attack Range does, how it works, and how to get started, whether you prefer a web UI, a REST API, or the command line.

Dashboarding Azure: SquaredUp vs Grafana

If you’re looking for a dashboarding solution today, chances are you’ve looked at Grafana or SquaredUp, or both. Grafana is a popular open source dashboarding tool with on-prem and cloud variants, while SquaredUp is a SaaS, cloud-based unified dashboarding solution. Both offer a comprehensive list of data sources that they can plug into to build dashboards. Both also offer an integration with Azure, which is the focus of our discussion today.

Troubleshooting & RCA with Olly

If troubleshooting still feels harder than it should, check these two numbers: how many dashboards you have, and how many alerts fire every day. For most teams, it’s hundreds of dashboards and thousands of alerts, a sign of maturity, coverage, and good intentions. Yet when something actually breaks, that coverage rarely turns into clarity fast enough.

AI observability: The backbone of mission resilience in the public sector

Downtime cost the public sector $193 million last year — and the financial hit is only the beginning. Beyond the numbers, downtime in the public sector can also lead to severe consequences for citizens: interrupted access to critical online services, delayed benefits, and stalled emergency response. When citizens cannot rely on government services, downtime becomes more than an inconvenience; it becomes a matter of trust. More than uptime, resilience is the new success metric for modern government.

How to Migrate an Icinga 2 Master in a High Availability Setup

Moving an Icinga 2 master to a new machine requires careful preparation, especially in a master-to-master high availability setup. In production environments, such migrations are often part of broader infrastructure changes, platform standardization, or long-term monitoring strategy decisions. This guide walks you through the process step by step, ensuring a smooth migration without service interruption while keeping your monitoring platform stable and consistent across the environment.