Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

StatusGator now supports Microsoft Teams Workflows

We’ve updated our Microsoft Teams integration to support Workflows, Microsoft’s new and recommended approach to incoming webhooks. As Microsoft evolves its platform, it is phasing out the legacy Connectors feature in favor of Workflows. At StatusGator, we’re committed to keeping up with these changes so your integrations remain reliable and future-proof.

How Sentry could stop npm from breaking the Internet

Caching is great! When it works… When it fails, it puts a big load on your backend, resulting in either a self-inflicted DoS, increased server bills, or both. This article is inspired by a real-world incident that happened to npm back in 2016. In the next part, Ben recounts his personal experience responding to the incident while working at npm.

Why continuous profiling is the fourth pillar of observability

Developers have long used profilers to diagnose performance bottlenecks and improve the efficiency of their code. But a modern version of profiling, continuous profiling, is quietly redefining what profiling is and what it can do. By running nonstop in production with very low overhead, continuous profilers give teams always-on visibility into how their code behaves in the real world.

How to Build Resilient Telemetry Pipelines with the OpenTelemetry Collector: High Availability and Gateway Architecture

Let’s bring that back. Today you’ll learn how to configure high availability for the OpenTelemetry Collector so you don’t lose telemetry during node failures, rolling upgrades, or traffic spikes. The guide covers both Docker and Kubernetes samples with hands-on demos of configs. But first, let’s lay some groundwork.

7 Clear Signs Your Team Needs Centralized Monitoring

Managing multiple systems without centralized monitoring is like trying to watch security footage from 20 different screens simultaneously. You might catch some issues, but you'll inevitably miss critical problems until they explode into major incidents. If your team is struggling with scattered monitoring tools, delayed incident responses, or constant firefighting mode, it's time to evaluate whether you need a centralized monitoring solution. Here are the key warning signs to watch for.

How AI Agents Reason, Act, and Automate at Scale

In our previous post, we explored the urgent need for intelligent automation in networking, specifically how the Model Context Protocol (MCP) enables AI agents to dynamically discover and interact with the tools they need. But access to tools is only part of the equation. To truly operate autonomously in complex environments, agents need not only connectivity but also intelligence.

SD-WAN, SASE, SSE, and the Coffee Shop Network: From Distraction to AI Superpower

Back in 2018, I wondered (perhaps loudly) if SD-WAN was just IT’s hype-of-the-year, destined for the same eye-rolls as signature-based antivirus and GDPR compliance drives. Even then, I knew we couldn’t let messaging fatigue blind us to real technology shifts. Fast-forward to 2025: SD-WAN (Software-Defined Wide Area Network) not only stuck around, but became the springboard to something far bigger – SASE (Secure Access Service Edge).

Grafana Campfire - Using the Grafana MCP Server (Grafana Community Call - July 2025)

In this month's Campfire Community Call, we will be exploring the Grafana MCP (Model Context Protocol) server, an open-source tool that enables AI assistants to interact directly with your Grafana instance. We will also go over some of the basics. Join me (Usman), Matt Ryer, and David Kaltschmidt for this exciting session. Expert guests: Ioanna Armouti and Luccas Quadros. Feel free to use the YouTube live chat feature to start submitting questions, and we will add them to the agenda.

13 Best Log Analysis Tools of 2025. Top Paid, Free & Open-Source Log Analyzers Reviewed

Log analysis and management tools have become essential in troubleshooting. With log analyzers you can extract meaningful data from logs to pinpoint the root cause of any app or system error, and find trends and patterns to help guide your business decisions, investigations, and security. If you’re not already using such a tool, now is the time to start looking for one.