
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Bridging partners in pursuit of agentic AI - Part 2: How leaders can position themselves for the future

From ecosystem foundations to future advantage. In Part 1, "Why partnerships matter for enterprise intelligence," we explored how enterprises are moving from experimentation to scalable impact with agentic AI and how ecosystems make that possible. The natural next question is: where do we go from here?

The Agentic Enterprise Needs a Nervous System

Over the weekend, when Salesforce introduced the concept of the Agentic Enterprise, it wasn’t defining a new market trend. It was signaling an inflection point. A moment when the conversation about artificial intelligence stopped being about tools and started being about trust. For the first time in decades, enterprise software isn’t simply enabling decisions. It’s making them. Systems are reasoning, choosing, and acting in real time across sprawling digital ecosystems.

Live in Boston: Data, DEX, and a Few Fist Fights @ Nexthink Experience

Tim and Tom host another special live edition of The DEX Show, this time from the Omni Boston Hotel, recorded during last week’s Experience Boston. Joined by Christina Lahr (Bayer), James Krick (Campbell’s), and Ryan Way (Warburg Pincus), the hosts dig into more real-world stories of data-led IT excellence, once again in-person. In between, listeners can learn a few unexpected facts about Tim — has he ever been in a fist fight, starred in a play, or been thrown out of a bar? Listen now to find out...

How WWT Proves the Value of Agentic AIOps with LogicMonitor's Edwin AI

Agentic AI has entered day-to-day operations. Systems with the ability to act, learn, and adjust are already cutting noise, speeding remediation, and giving engineers time back for work that moves the business. In a recent webinar, Karthik SJ, General Manager, AI at LogicMonitor, and Mike Cervasio, Global Practice Manager, AIOps at World Wide Technology, explored what makes this new phase of AIOps actionable.

Amazon Isn't Eating Its Own DNS Dog Food

On October 19-20, 2025, Amazon Web Services (AWS) experienced a significant outage (AWS status) affecting its US-EAST-1 region in northern Virginia. The root cause was DNS resolution failures for DynamoDB’s API endpoints, which cascaded across AWS’s interconnected services, disrupting major platforms including Snapchat, McDonald’s, Disney+, Roblox, Coinbase, Reddit, and Amazon’s own services.
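As a rough illustration of the failure mode described above, the sketch below checks whether a set of hostnames still resolves. It is a minimal example and not how AWS or any vendor mentioned here monitors DNS; the function name `check_dns` and the injectable `resolver` parameter are assumptions made for illustration, and it uses only Python's standard `socket` module.

```python
import socket
from typing import Callable, Dict, Iterable, List


def check_dns(
    hostnames: Iterable[str],
    resolver: Callable = socket.getaddrinfo,  # injectable so the check can be tested offline
    port: int = 443,
) -> Dict[str, List[str]]:
    """Return the resolved addresses per hostname; an empty list means resolution failed."""
    results: Dict[str, List[str]] = {}
    for host in hostnames:
        try:
            infos = resolver(host, port)
            # Each entry is (family, type, proto, canonname, sockaddr);
            # sockaddr[0] holds the IP address as a string.
            results[host] = sorted({info[4][0] for info in infos})
        except socket.gaierror:
            # The name could not be resolved, which is the symptom
            # a DNS-level outage would produce for clients.
            results[host] = []
    return results
```

Calling `check_dns(["api.example.com"])` (a placeholder hostname) returns a dictionary keyed by hostname. An empty list for a name that normally resolves suggests a DNS problem, as opposed to a failure in the service behind it.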

The Hidden Risk of DNS - Lessons from the AWS Outage & Why You Need DNS Spy Monitoring NOW

On October 20, 2025, much of the internet came to a halt. Apps wouldn’t load. Payments failed. Cloud dashboards went dark. From Fortnite to Alexa, Snapchat, and countless business platforms, users across the world were suddenly offline — all because DNS broke inside Amazon Web Services’ (AWS) US-East-1 region.

Detect and map third-party outages with Datadog External Provider Status

Modern applications depend on dozens of external cloud platforms, APIs, and SaaS services to function. But when those providers experience issues, engineers often spend valuable time asking a basic question: Is the problem with us or with them? Provider-maintained status pages are often slow to update, leaving teams waiting for confirmation while incidents escalate. This delay wastes valuable time, prolongs investigations, and risks customer trust.

Optimize HPC jobs and cluster utilization with Datadog

High-performance computing (HPC) environments support some of the most critical workloads in the world—from asset pricing models in financial institutions to molecular simulations in drug discovery. These workloads often span hundreds of thousands of cores, depend on specialized infrastructure such as GPUs, and run for extended periods. As a result, performance and efficiency are critical.

Introducing Updog.ai: Real-time provider status from Datadog

When external SaaS providers or cloud services degrade or go down, engineers often find themselves wondering if the issue they're encountering is local or more widespread. The answers they find are usually slow to surface, limited in detail, or entirely dependent on the provider's updates. Vendor-controlled status pages and third-party aggregators don’t provide the timely, independent visibility that's necessary to quickly and accurately identify the root cause of slowdowns.