The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

DASH 2026 Keynote

At DASH 2026, Datadog launched 100+ capabilities to help customers drive autonomy and manage growing AI and security complexity. With new Bits AI, log management, and security capabilities, customers have the visibility and autonomous operations they need to detect, investigate, and resolve issues across the development loop and data lifecycle. Tune in to the full keynote to catch the highlights.

The Real Cost of Custom Code: Why Buying a Unified Middleware Management Platform Protects Enterprise IT Budgets

Building custom middleware monitoring appears cost-effective but creates expensive maintenance debt, fragmented visibility, and operational risk. Enterprise teams spend 60-80% of IT budgets on software maintenance while unified platforms deliver immediate, production-ready capabilities.

Why Your Vendor Monitoring Strategy Has a Blind Spot: The Case for Continuous TPRM

You monitor everything. Network traffic, application performance, authentication events, infrastructure health. If something meaningful changes in your environment, you have a signal for it. That discipline is foundational to how modern IT and security operations work. But there is one part of your stack you almost certainly cannot see in real time: your vendors.

Time to move to the StatusGator v3 API: What v2 users need to know

We launched the StatusGator v3 REST API back in October, and it has only gotten better since. v3 is a ground-up redesign built around organization-level API tokens, a consistent response format, opaque string IDs, pagination, and a large set of write endpoints for managing monitors, incidents, and subscribers. We have kept shipping new capabilities for it, and we will keep doing so. v2, on the other hand, is done.

How to Size Infrastructure When Hardware Delays and Cost Pressure Change the Equation

Sizing infrastructure has always required a balance between performance, capacity, and risk. What has changed is the level of precision required to make those decisions. Hardware timelines are less predictable. Costs are under closer review. Decisions that were once routine now require clear justification. In many cases, the question is no longer just how much capacity is needed, but whether that capacity can be delivered when it is needed and whether the investment will hold up under scrutiny.

Monitor Memory Where Allocations Occur

Kubernetes dashboards often mask an underlying infrastructure failure. When a critical application crashes, it often points to an Out-of-Memory event, even while standard CPU metrics appear completely healthy. This quick walkthrough shows how Coralogix integrates continuous memory profiling directly into your production environment. We pair OpenTelemetry trace data with continuous background sampling via the Async Profiler, which helps teams isolate resource-heavy code paths before they trigger system degradation.

Turn Datadog findings into automated code fixes with Bits Code

Engineering teams lose hours in the gap between detecting a problem and getting a fix into review. An on-call engineer sees an error spike in Datadog, pivots to traces and logs to isolate the failure, opens the relevant repository, reproduces the issue, writes a fix, adds tests, waits on CI, and finally opens a pull request. Even when the problem is familiar, the workflow pulls engineers across several tools and stretches remediation from minutes into hours or days.

DASH 2026 Operating at Scale: Guide to Datadog's newest announcements

Many teams continue to struggle with managing cost, governance, and reliability across an ever-larger footprint. This year’s DASH announcements help teams operate efficiently at scale, with new tools to cut cloud and AI spend, eliminate waste automatically, maintain observability during outages, and manage many organizations and agents as a single unit.