
Syslog Checks: How to Find Insights in the Data Flood

Every sysadmin knows the feeling: you're swimming in logs, terabytes of them, with every daemon, service, and kernel subsystem dutifully writing its activity to syslog. The data exists. The signals are there. Yet incidents are still unpredictable. How is that possible? Traditional syslog infrastructure was designed for storage and retrieval, not for detection and response.

How to Prepare Your Network for RTO (Return-to-Office) Mandates

IT teams are being held hostage in the return-to-office debate. They didn't even get a seat at the table, and if you're not at the table, you're on the menu. The job market has cooled dramatically: Canada's unemployment rate hit 7.1% in August 2025, the highest since May 2016, excluding pandemic years. Employers noticed, and RTO mandates started rolling out fast. The debate is heating up. Employees don't want to give up remote work, while executives want people back in office seats.

Understanding Lighthouse: Speed Index

You run Lighthouse and it tells you your Speed Index is bad. But the page looks like it loads fine: you see stuff on screen early. So why is Lighthouse acting like your site is a sloth? Speed Index is a “how fast does this page visually fill in” metric. It is not “when did the first content show up” (that’s FCP) and not “when did the main content show up” (that’s LCP). It measures the whole above-the-fold loading experience, averaged over time.
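To make “averaged over time” concrete, here is a minimal sketch of the classic Speed Index idea: the area above the visual-completeness curve, summed frame by frame. It assumes you already have `(timestamp_ms, completeness)` samples with completeness between 0 and 1; the function name and sample data are illustrative only. Lighthouse derives per-frame progress from a captured filmstrip using its own method, so real scores will not match this toy calculation exactly.

```python
from typing import Sequence, Tuple


def speed_index(frames: Sequence[Tuple[float, float]]) -> float:
    """Approximate Speed Index in ms as the area above the completeness curve.

    frames: (timestamp_ms, visual_completeness) pairs sorted by time, with
    completeness in [0, 1]. The first frame is expected at t=0 and the last
    frame should be fully complete (1.0). Completeness is treated as constant
    between captured frames (a step function), so each interval contributes
    its duration times the fraction of the viewport still unpainted.
    """
    if not frames:
        raise ValueError("need at least one frame")
    total = 0.0
    for (t0, vc0), (t1, _) in zip(frames, frames[1:]):
        if t1 < t0:
            raise ValueError("frames must be sorted by timestamp")
        total += (t1 - t0) * (1.0 - vc0)
    return total


# Hypothetical page A: a sliver of content at 300 ms, then a long stall.
early_but_slow = [(0, 0.0), (300, 0.1), (3000, 1.0)]
# Hypothetical page B: nothing until 1200 ms, then everything at once.
late_but_complete = [(0, 0.0), (1200, 1.0)]

print(speed_index(early_but_slow), speed_index(late_but_complete))
```

Page A shows something first, yet it scores about 2730 ms against page B's 1200 ms, because 90% of its viewport stays empty for 2.7 seconds. That is exactly the “I see stuff early, so why is my score bad?” situation.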

Dashboarding Azure: SquaredUp vs Grafana

If you’re looking for a dashboarding solution today, chances are you’ve looked at Grafana, SquaredUp, or both. Grafana is a popular open-source dashboarding tool with on-prem and cloud variants, while SquaredUp is a SaaS unified dashboarding solution. Both can plug into a comprehensive list of data sources to build dashboards, and both offer an Azure integration, which is the focus of today’s discussion.

Troubleshooting & RCA with Olly

If troubleshooting still feels harder than it should, check these two numbers: how many dashboards you have, and how many alerts fire every day. For most teams, it’s hundreds of dashboards and thousands of alerts, a sign of maturity, coverage, and good intentions. Yet when something actually breaks, that coverage rarely turns into clarity fast enough.

AI observability: The backbone of mission resilience in the public sector

Downtime cost the public sector $193 million last year, and the financial hit is only the beginning. Beyond the numbers, downtime in the public sector can have severe consequences for citizens: interrupted access to critical online services, delayed benefits, and stalled emergency response. When citizens cannot rely on government services, downtime becomes more than an inconvenience; it becomes a matter of trust. More than uptime, resilience is the new success metric for modern government.

How to Migrate an Icinga 2 Master in a High Availability Setup

Moving an Icinga 2 master to a new machine requires careful preparation, especially in a master-to-master high availability setup. In production environments, such migrations are often part of broader infrastructure changes, platform standardization, or long-term monitoring strategy decisions. This guide walks you through the process step by step so you can migrate without service interruption while keeping your monitoring platform stable and consistent across the environment.

Turning Data Into Decisions with the xMatters Incident AI Agent

When an incident hits, the gap between awareness and action can make all the difference. Responders know the pain: endless tool-switching, chasing updates, and fragmented data. It’s not a lack of capability that slows response; it’s the lack of context and connection. That’s why we built the xMatters Incident AI Agent, a purpose-built, conversational assistant that brings intelligence and automation directly into the heart of incident response.

Follow-the-sun and other on-call models

Most teams run on-call using rotation-based schedules where responsibility shifts every few days or weeks. But some situations call for different models that change who responds based on time zones, expertise, or the type of incident. This guide walks you through six on-call models that work outside the standard rotation patterns.
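As an illustration of a time-zone-based model, follow-the-sun routing boils down to mapping the current UTC hour to whichever regional team is in its working day. The sketch below assumes three hypothetical regions with equal eight-hour windows; the region names, hours, and `on_call_region` helper are made up for illustration, and real schedules usually live in your paging or scheduling tool rather than in code.

```python
from datetime import datetime, timezone

# Hypothetical regions and their UTC coverage windows as [start_hour, end_hour).
FOLLOW_THE_SUN = [
    ("apac", 0, 8),
    ("emea", 8, 16),
    ("americas", 16, 24),
]


def on_call_region(now: datetime) -> str:
    """Return the region that owns incidents at the given moment."""
    if now.tzinfo is None:
        # Naive datetimes are ambiguous across time zones, so reject them.
        raise ValueError("use a timezone-aware datetime")
    hour = now.astimezone(timezone.utc).hour
    for region, start, end in FOLLOW_THE_SUN:
        if start <= hour < end:
            return region
    # Only reachable if the windows above leave an hour uncovered.
    raise RuntimeError("coverage windows do not span 24 hours")


print(on_call_region(datetime.now(timezone.utc)))
```

Keeping the windows in a single table makes gaps and overlaps easy to spot, and the `RuntimeError` surfaces any hour that no region covers instead of silently paging nobody.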

5 offbeat on-call rotations that work

Most teams choose standard on-call patterns like weekly or daily rotations. But sometimes a less conventional rotation can solve a specific problem or just fit better with how your team works. This guide walks you through five offbeat on-call rotations. For each, we look at why it might work for you and the challenges involved. This helps you see the full picture before you decide to try them out. Let’s dive in!