Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Outage map now available in your StatusGator board

We’re excited to introduce a helpful new update to your StatusGator experience – the service outage map is now built directly into your StatusGator account. StatusGator has displayed outage heatmaps on our public website’s service landing pages. These maps helped users understand where issues were being reported across the globe. Now, we’ve taken that same valuable visibility and placed it inside your board.

StatusGator earns SOC 2 Type 2 certification

We are absolutely thrilled to share some momentous news: StatusGator has officially achieved SOC 2 Type 2 certification! This isn’t just another checkbox on a compliance list – it’s a powerful validation of our dedication to safeguarding your data and delivering the reliable service you depend on.

Stay audit-ready with real-time file change alerts in Site24x7 server monitoring

Maintaining the integrity of server files and directories is essential for security, operational resilience, and compliance. Whether it’s business-critical application configurations, sensitive data files, or audit logs, any unauthorized, unexpected, or accidental modification can jeopardize service continuity and expose an organization to regulatory risks. Manual file monitoring is impractical at scale.

How OpManager powered IT reliability for DWHIN

In healthcare, every moment counts—and for Detroit Wayne Integrated Health Network (DWIHN), every heartbeat depends on a network that doesnt skip one. Serving over 75,000 patients across Detroit and Wayne County, DWIHN’s IT network powers essential behavioral health services, from autism care to crisis intervention. When its systems started showing signs of strain, DWIHN turned to ManageEngine OpManager to bring reliability, clarity, and calm back to its IT operations.

Introducing Kentik AI Advisor

Introducing Kentik AI Advisor. AI with a comprehensive understanding of your network that thinks critically and advises how to design, operate, and protect infrastructure at scale. With the rise of hybrid cloud networks and the growing demands of AI infrastructure, network teams are under pressure to balance cost, performance, and security, often with limited resources that delay critical strategic initiatives.

Better together: Cribl and Microsoft Fabric just got radically simpler

In September, I wrote about how Cribl and Microsoft Fabric Real-Time Intelligence provide a powerful combination, unlocking new analytics capabilities for security and IT teams. I also said there was more to come… Today, Cribl is thrilled to announce a new Cribl Destination for Microsoft Fabric Real-Time Intelligence, marking another big step forward in our collaboration with Microsoft to make it much easier for Cribl customers to use Fabric.

How to Monitor RabbitMQ

A queue quietly fills up overnight. Memory hits the configured watermark and RabbitMQ blocks all publishers. Your entire message pipeline freezes, and you discover the problem when users start complaining. This scenario repeats across thousands of production systems because teams don't monitor RabbitMQ properly. The broker exposes comprehensive metrics, but most engineers don't know which ones predict failures or how to track them.

Ep 18: AI has a memory problem, just like you do

In this episode of Masters of Data, we dive into how AI learns, examining both how we teach it and what it derives from human performance, as well as why context plays a crucial role in AI interactions. We break down five key components of AI training and talk about why we should view AI as a tool under human control rather than an autonomous entity. We explore the challenge of maintaining context in AI—much like our own memory struggles—and discuss methods, such as retrieval-augmented generation, that can help AI retain context more effectively.

Datadog GPU Monitoring: Optimize and troubleshoot AI infrastructure

With Datadog GPU Monitoring, engineering and ML teams can monitor GPU fleet health across cloud, on-prem, and GPU-as-a-Service platforms like Coreweave and Lambda Labs. Real-time insights into allocation, utilization, and failure patterns make it easy to spot bottlenecks, eliminate idle GPU spend, and resolve provisioning gaps. By tying usage metrics directly to cost and surfacing hardware and networking issues impacting performance, Datadog helps teams make fast, cost-efficient decisions to keep AI workloads running reliably at scale.

Unlocking Full Application Visibility with LogicMonitor

In today’s digital landscape, application performance isn’t just about monitoring several key apps and “keeping the lights on,” it’s about understanding the full breadth of your interconnected business services and ensuring you’re delivering seamless, reliable experiences to customers and teams alike. But as applications grow increasingly distributed across cloud, on-prem, and hybrid environments, monitoring them holistically can become a serious challenge.