Runbooks are history: Why agentic AI will redefine incident response forever

If you’re an SRE, platform engineer, or on-call responder, you don’t need another article explaining incident pain. You feel it every time your phone lights up in the middle of the night. You already know the pattern: you’ve invested in runbooks, automation, observability, and “best practices,” yet incident response still feels like firefighting. Now imagine the same midnight page, but with an AI SRE in place: what once took hours is finished in a couple of minutes.

Drive business outcomes with Unit Economics in Datadog Cloud Cost Management

See how Datadog turns cloud usage and performance data into actionable business insights by helping teams calculate unit economics to measure and optimize the efficiency of every service. Datadog bridges the gap between cloud costs and business value, helping organizations get the most value out of their cloud investment.

Load Testing Kafka

Message brokers are a critical component of modern distributed systems, facilitating asynchronous communication between services. Load testing message broker integrations requires special considerations since the interaction patterns differ from traditional HTTP-based APIs. Speedscale provides specialized tooling to help you load test applications that integrate with message brokers.

CTO Predictions for 2026: Special ShipTalk Episode with Nick Durkin

AI will not fix broken software delivery. It will expose it. By 2026, teams that win will use specialist AI agents, guardrails over gates, and security built directly into the pipeline. As we look toward 2026, it is becoming clear that AI is not just changing how code is written. It is changing how software delivery itself works. The real shift is happening at the intersection of AI, security, and developer experience, where speed, risk, and responsibility now collide.

How AI-Native Data Pipelines Help Create a Security Data Lake

Security teams are generating and storing more telemetry than ever before. Logs, metrics, traces, and events come from cloud services, applications, identities, and infrastructure across many environments. Retention requirements continue to grow, yet the cost of storing all of this data in traditional hot storage can quickly exceed annual budgets. At the same time, investigations and audits rely on fast access to historical data, and any delay can slow response time or limit visibility.

Get Kafka-Nated Special Episode: A Kristmas Kafka

Join us for A Kristmas Kafka, an informal and deeply technical roundtable with Apache Kafka committers, contributors and community leaders. This conversation brings together the people closest to the Kafka codebase to reflect on where the project started, how it has evolved and what lies ahead for streaming systems.

Part 3: What If IT Stopped Reacting to Incidents and Started Predicting Them?

Enterprises are experiencing a turning point. Systems scale faster than teams can, AI is rewriting the rhythms of operations, and the cost of downtime grows heavier every quarter. In this new landscape, reacting is no longer enough. Teams need foresight. They need to get ahead of the issue. They need a different model entirely. This third installment centers on a simple but transformative idea. What if IT operations could finally step out of reaction mode and move into anticipation?