Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Detecting an AWS Outage and DR Lessons

A few weeks ago, on 20th October 2025, AWS suffered a widespread outage in its US-EAST-1 region that affected a large number of customers globally. More than 1,000 apps and websites were impacted, including major banks and popular gaming, streaming, and social platforms such as WhatsApp, Snapchat, Fortnite, and Pokémon Go.

From Crashes to Clarity: What's New in Percepio Detect 2025.2

Think of Percepio Detect as a security camera for your firmware: always monitoring, but only storing data when something unusual happens, such as a crash or a performance anomaly. By providing rich debugging information when needed while keeping the overall data volume to a minimum, Detect enables continuous observability over an unlimited period, even on resource-constrained devices such as 32-bit microcontrollers.

The four pillars holding up your digital business, and what happens when they crumble

When we published the first Internet Resilience Report in 2024, the world was still reeling from the CrowdStrike outage that left airlines grounded and financial institutions scrambling. A year later, the stakes are even higher. The 2025 edition confirms what many of us already feel every day in IT Operations: resilience is no longer about uptime alone. It’s about protecting revenue, customer trust, and digital performance at scale.

The Architecture of Automation: Why IT Doesn't Lie

Let’s start with something most people get wrong. Automation isn’t magic. It’s math. It does exactly what it’s told. Nothing more, nothing less. Every action, every response, every output is a reflection of truth in motion. And that’s where value actually begins. Most organizations still treat automation like a shortcut: a way to go faster, to handle more alerts, to “keep up.” But speed isn’t the value. Truth is.

Rollbar + Vercel built for how you ship

Vercel helps you ship fast. We help you ship safe with code‑first observability that connects errors to the code and deploys behind them. Together you get speed with clear insight into what is running in production. Today we’re launching our native integration in Vercel’s Observability category so you can connect Rollbar to your Vercel projects in minutes, map environments cleanly, and track deployments from day one.

Splunk Developer Program

This short video introduces the Splunk Developer Program, highlights the end-to-end support and tooling it offers, and shows how developers can build, test, and grow impactful apps with confidence. It follows the journey of a first-time app builder who discovers the program, uses its resources, and becomes an active, recognized contributor in the Splunk community.

How feedback loops power progressive software delivery

Modern engineering teams face competing priorities. Developers are expected to deliver new features faster than ever, but users expect rock-solid reliability with every release. Shipping quickly can feel like you’re gambling with user trust. If you move too fast, you risk outages, but if you move too slowly, innovation stalls.

Observability and FedRAMP in Action: The VA's Mission to Deliver Reliable Digital Service

Ensuring digital services remain accessible, reliable, and secure is a high priority for any organization operating at scale. For the Department of Veterans Affairs (VA), this focus is central to its mission of providing quality care to veterans, their families, and caregivers. Often described as “the largest IT shop in the United States,” the VA manages 2.7 million pieces of equipment across a vast network of interconnected systems.

Eliminate unnecessary costs in your Amazon S3 buckets with Datadog Storage Management

Cloud object storage powers a wide range of workloads, from AI training datasets to customer-facing media libraries. As your data grows into the petabyte scale, managing storage costs and ensuring reliability requires fine-grained visibility. You need answers to questions like: Which specific teams, services, workloads, or datasets are driving spend? Which data is cold and should be archived? What fixes will have the biggest impact on cost and performance?