Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Cut Costs, Not Visibility. Use S3 for Low-Cost Log Retention and Faster Response.

Why pay for continuous ingestion of data you rarely use? Learn how to maintain a lean data strategy by keeping long-term logs in cheap S3 storage, while retaining the power to "promote" specific slices into Splunk whenever an audit or investigation arises. See how Promote for Amazon S3 gives you the savings of low-cost S3 storage without sacrificing speed in investigations.

AlphaFold, Office Politics, and Mustafa Suleyman's Two Futures (w/Benedict Lelijveld)

In this episode, Benedict Lelijveld joins us to unpack what it feels like to start a career in an era shaped by COVID disruption, hybrid work, and accelerating AI. We dig into his writing on Mustafa Suleyman and the idea of “pessimism aversion”: holding genuine hope for breakthroughs (from personal AI to advances in biology) while staying clear-eyed about risks like misuse, weak regulation, and who really benefits. Benedict also reflects on what early-career professionals lose when work becomes too remote—and why protecting your voice, curiosity, and craft matters more than ever as automation spreads.

Case Study - Troubleshooting Storage Failures in a VMware ESXi Infrastructure

IT problems happen even in the best-architected infrastructure, whether from configuration changes, failures, or upgrades. How quickly and effectively you can detect and resolve such problems dictates how efficient your IT operation is. Today, I’ll cover how eG Enterprise helped us troubleshoot a hardware failure (a storage battery failure) that caused a cascade of failures in a VMware ESXi infrastructure.

Notes from the Field: XenServer falling back to file-based licensing when using LAS

Citrix has been transitioning products toward License Activation Service (LAS) as the modern licensing method. Unlike traditional file-based licensing, LAS introduces service-based communication between products and the Citrix License Server. As of 15 April 2026, LAS becomes the mandatory licensing method for supported products. Environments still relying on file-based licensing will need to transition before that date.

Microsoft SCOM Tips & Tricks

This one is for all the Microsoft SCOM geeks out there — 99 practical tips & tricks to make managing SCOM way easier. The tips compiled here draw from community experts, SCOM-focused blogs, Microsoft’s official documentation, and the hands-on experience at NiCE. You may already know some of them, but having them all organized in one place makes it easy to reference and put them into practice.

I let Claude investigate a production incident with Honeybadger's MCP server

In this demo, Kevin shows how you can use Honeybadger's MCP server with Claude to investigate a production incident — going from a natural language prompt to a complete incident dashboard in minutes. Honeybadger is an application health monitoring platform that helps developers catch errors, track performance, and stay on top of incidents. The MCP server lets AI assistants like Claude query your Honeybadger data directly, so you can investigate issues conversationally without digging through dashboards manually.

Monitoring and Optimizing a Hybrid Cloud Environment | WhatsUp Gold

This webinar focuses on monitoring and optimizing a hybrid cloud environment. Downtime is an expensive inconvenience. Yet many IT teams still face monitoring blackouts due to rigid licensing models and outdated failover strategies. In this session, we’ll introduce a smarter approach: High Availability by Design. Whether you're scaling operations or modernizing infrastructure, this session will equip you with the tools and insights to build a resilient, future-ready monitoring strategy.

Reinventing the Incident Responder's Day: Empowering Tier 2 SOC Analysts with Splunk's Agentic SOC Platform

The Tier 2 SOC Analyst or the Incident Responder (often hailed as the "Sherlock Holmes of the network") faces an increasingly complex and relentless digital landscape. In a world where analysts are being overwhelmed by alerts, held back by fragmented, manual tooling and inefficient workflows, incident responders are charged with the critical task of identifying, analyzing, and mitigating security threats.