Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Deploy on Friday, Ep. 139

It's Friday, which means it's time to deploy! This week we're covering two weeks of news. On the Octopus side, we have new videos on vibe deployments and proving ROI with the Value Metrics Dashboard, a new Kubernetes migration webinar, and more! In the wider ecosystem, Kubernetes 1.36 "Haru" shipped with user namespaces going GA and Ingress-NGINX officially retired. Docker launched microVM sandboxes for AI coding agents. And Google said developer loyalty to AI tools is at zero.

Isolate a User Session in Datadog Synthetics with proxymock

A customer pings support: “I tried to check out twice this morning and got a 500 each time, but it works fine for everyone else.” The session ID is in the email. You have full request/response capture in your environment, you have Datadog Synthetics already running browser checks against the same flow, and you still spend the next two hours grepping logs because none of those tools let you say “show me just this user’s requests, in order, and re-run them.”

Four types of incident alerts every team should know

Not every incident alert needs the same kind of response. One incident may need to wake someone up right away. Another may simply need to be picked up when the team starts work in the morning. Without a clear way to tell them apart, every incident feels equally urgent. That usually adds noise and makes incident response decisions harder than they need to be. Two questions help you tell them apart. In this guide, we’ll discuss what those questions mean and the four combinations that follow.

How to Test SQS Workflows Locally with LocalStack and OpenTelemetry

LocalStack lets you run SQS, Lambda, and S3 locally in Docker — but there's a hidden trap: OpenTelemetry's default AWS propagator doesn't work with free LocalStack. Here's how to set up end-to-end local testing with working trace propagation. Prathamesh works as an evangelist at Last9, runs SRE Stories, where SRE and DevOps folks share their stories, and maintains o11y.wiki, a glossary of all terms related to observability.
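One common workaround for that trap (a sketch, not the article's exact setup) is to skip the AWS X-Ray propagator entirely and carry W3C `traceparent` context through SQS message attributes yourself. The helper names below are illustrative, not from LocalStack or OpenTelemetry:

```python
# Hedged sketch: round-trip W3C trace context through SQS message
# attributes manually, instead of relying on the AWS X-Ray propagator
# (whose trace-header plumbing free LocalStack does not provide).
import binascii
import os


def make_traceparent(trace_id=None, span_id=None):
    """Build a W3C traceparent header: version-traceid-spanid-flags."""
    trace_id = trace_id or binascii.hexlify(os.urandom(16)).decode()
    span_id = span_id or binascii.hexlify(os.urandom(8)).decode()
    return f"00-{trace_id}-{span_id}-01"


def inject_into_sqs_attributes(traceparent, attrs=None):
    """Producer side: place trace context in SQS MessageAttributes."""
    attrs = dict(attrs or {})
    attrs["traceparent"] = {"DataType": "String", "StringValue": traceparent}
    return attrs


def extract_from_sqs_message(message):
    """Consumer side: read trace context back out of a received message."""
    attr = message.get("MessageAttributes", {}).get("traceparent")
    return attr["StringValue"] if attr else None


# Round-trip: what the producer sends is what the Lambda consumer sees.
tp = make_traceparent()
attrs = inject_into_sqs_attributes(tp)
received = {"Body": "order-created", "MessageAttributes": attrs}
assert extract_from_sqs_message(received) == tp
```

On the producer side you would pass `attrs` as the `MessageAttributes` argument to `send_message`; on the consumer side, the extracted `traceparent` can seed the span context for the handler, keeping the trace connected end to end even against the free LocalStack tier.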

7 best AI deployment platforms for production Kubernetes workloads in 2026

Training a model in a notebook is easy. What breaks teams is the step after: serving it reliably without haemorrhaging cloud budget or burying your SREs in YAML. The common trap is picking a platform that handles the model but not the surrounding stack. An AI deployment platform should orchestrate the full application graph (inference endpoints, vector databases, caching layers, and frontends) inside a single VPC, with GPU autoscaling that doesn't require a dedicated platform engineer to babysit.

#056 - Cloud Contradictions and Cautionary Tales with Corey Quinn (The Duckbill Group)

In this episode of the Kubernetes for Humans podcast, Itiel sits down with the internet's favorite cloud contrarian, Corey Quinn of the Duckbill Group. Corey shares his unconventional career path as a "cautionary tale," explaining why his knack for fixing horrifying AWS bills makes him a terrible employee, and why he absolutely refuses to touch Kubernetes in production.

Context Engineering: How to Manage AI Context at Scale

Context engineering is the practice of managing the information an AI model sees (documents, tool outputs, memory, and structured metadata about the systems it reasons over) so it can make accurate decisions inside a real engineering organization. Most engineering teams have access to the same AI coding agents: Claude, GPT, Gemini, the major variants everyone is shipping. The model is no longer the differentiator.

What happens when you delete everything? Three minutes, or thirty hours.

Last year, at the annual conference for an open source framework you've definitely heard of, I walked up to the founder in a room outside the main stage. He was hunched over his laptop, frantic. We've known each other for a few years. "What's going on? Is everything okay?" He looked up with the specific shade of white people only get when they realize they've made a big mistake.

DORA Metrics in the AI Era: Why Deployment Isn't Faster

DORA metrics in the AI era reveal a paradox: PR volume is climbing, but deployment frequency is staying flat. In this talk, GitKraken's Director of Product Jeff Schinella breaks down why AI-accelerated code generation is creating a review bottleneck that DORA metrics alone can't explain. Jeff walks through how PR metrics (cycle time, first response time, code churn, and PR size) serve as the leading indicators behind your DORA data. If your deployment frequency is flat while PR counts go up, the bottleneck isn't your devs. It's your review capacity.