The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

New Plugins, Faster Writes, and Easier Configuration: What's New with the InfluxDB 3 Processing Engine

The Processing Engine is one of the most powerful features in InfluxDB 3. It lets you run Python code at the database—transforming data on ingest, running scheduled jobs, or serving HTTP requests—without spinning up external services or building middleware. You define the logic, attach it to a trigger, and the database handles the rest. Since launching the Processing Engine, we’ve been building out both the engine itself and the ecosystem of plugins that run on it.

Operating agentic AI with Amazon Bedrock AgentCore and Datadog LLM Observability: Lessons from NTT DATA

This guest blog post is by Tohn Furutani, SRE Engineer at NTT DATA. Over the past year, the conversation around generative AI has shifted from single-shot use cases—such as summarization, Q&A, and chat interfaces—to agentic AI systems that can make decisions based on context, plan multistep actions, invoke tools, and adapt as conditions change.

AI agent observability: The developer's guide to agent monitoring

Most "agent observability best practices" content reads like a compliance checklist from 2019 with "AI" pasted over "microservices." Implement comprehensive logging. Establish evaluation metrics. Create governance frameworks. Not a single line of code. No mention of what happens when your agent silently picks the wrong tool on turn 3 and you need to figure out why.

How to Set Up Your Monitoring System Alerts

You could have the most detailed metrics displayed on your dashboard, but if no one gets notified when things break, you’re just collecting data. Alerts turn this passive monitoring into an active response. They tell you, “Hey, your error rate just spiked!” or “Your memory usage is through the roof,” before your users start filing support tickets or, worse, give up on your tool entirely.

Query fair usage in Grafana Cloud: What it is and how it affects your logs observability practice

In Grafana Cloud we use a simple yet generous formula that lets you query up to 100x your monthly ingested log volume in gigabytes for free. This works for the vast majority of our customers, but if you aren’t careful and strategic with your usage, you could find yourself with an overage bill.
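As a rough illustration of the arithmetic behind that formula, here is a minimal sketch. The function names are hypothetical, and treating everything above the allowance as billable is an assumption made for illustration; the excerpt states only the 100x multiplier, not how overages are priced or measured.

```python
# Sketch of the "100x ingested volume" fair-usage arithmetic.
# Assumption: query volume beyond the free allowance counts as overage.
# Function names are illustrative, not part of any Grafana API.

def free_query_allowance_gb(ingested_gb: float, multiplier: float = 100.0) -> float:
    """Return the query volume (GB) included for free in a month."""
    return ingested_gb * multiplier


def overage_query_gb(ingested_gb: float, queried_gb: float, multiplier: float = 100.0) -> float:
    """Return the queried volume (GB) that exceeds the free allowance, never negative."""
    allowance = free_query_allowance_gb(ingested_gb, multiplier)
    return max(0.0, queried_gb - allowance)


if __name__ == "__main__":
    ingested = 50.0    # GB of logs ingested this month (example figure)
    queried = 6200.0   # GB of logs scanned by queries this month (example figure)
    print(f"Free allowance: {free_query_allowance_gb(ingested):.0f} GB")
    print(f"Over allowance: {overage_query_gb(ingested, queried):.0f} GB")
```

With these example figures, 50 GB of ingested logs gives a 5,000 GB query allowance, so 6,200 GB of queries would land 1,200 GB over it.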

Traditional Automation vs. AIOps vs. Self-Healing Ops vs. Autonomous IT Explained

Autonomous IT becomes real when teams move from insight to governed action. Most IT teams still operate on an alert-first, human-coordinated model. When something breaks, alerts fire across multiple tools, engineers get pulled in, and the first part of the response goes to figuring out who owns the problem, which signals matter, and how far the impact has spread. Containment comes after that. That sequence made sense in slower, more isolated environments.

March 2026: IsDown Users Saved 10.5 Hours with Early Outage Detection

In March 2026, IsDown users collectively saved 10.5 hours by receiving outage alerts before vendors officially acknowledged problems. The most significant early detection gave users a 2.3-hour head start when the Federal Reserve's FedACH system experienced issues. This data reveals the persistent gap between when users experience problems and when vendors update their status pages.

How to check if an item is back in stock?

Are you one of those desperately trying to get your hands on a new RTX 3080, 3070, 3060 Ti, or 3090 in 2021? Or maybe you prefer the new PlayStation 5 or Xbox Series X console. The same goes for any item that’s on pre-sale or hard to get (including that uniquely designed piece of clothing for your girlfriend). If your favorite online store doesn’t have a “watchdog”, we have the best solution for you. So how would you know when it’s back in stock? There’s an easy way!

Top 10 Website Monitoring Tools of 2026

Most website monitoring tools look similar until the first real incident. That is when alert speed, false positives, check coverage, and day-to-day usability matter more than a long feature page. UptimeRobot often comes up early for a reason: it is easy to start with, clear to manage, and focused on the checks many teams need first. Still, it is not the only option worth looking at.

Beyond Maintenance: Why Modernizing Your Messaging Infrastructure is the Ultimate Competitive Edge

Modernizing messaging infrastructure delivers 188% ROI and payback in under 6 months, according to a Forrester TEI study. Move beyond maintenance cycles to unified visibility, AI-driven efficiency, and secure self-service that transforms middleware from a bottleneck into a competitive advantage.