Operations | Monitoring | ITSM | DevOps | Cloud

Dark Code: The AI-Generated Software Nobody Understands

The biggest risk to your product isn’t AI-generated code that doesn’t work. It’s generated code that seems fine. AI doesn’t optimize for correctness. It creates something passable. Something that passes the smell test. And when everybody in the industry is pushed to move faster and do more with less, you end up shipping software that looks correct. It passed your quick visual check. It passed all the tests. But no one ever fully understood it.

Beyond AI Vibes: Deterministic Foundations for Agentic Coding

Every week there is another model drop, another agent framework, and another workflow tweak you are supposed to evaluate. Meanwhile, the largest companies, the ones operating at the highest scale and leaning hardest on AI, are also the ones making headlines for reliability strain: capacity limits, outages, and services that buckle under load.

When Your Observability Literally Stops Traffic

Last week, a fleet of autonomous robotaxis in China suddenly stopped working—at scale. Over a hundred vehicles stalled across a city, stranding passengers in traffic and raising immediate concerns about safety, reliability, and trust in autonomous systems. This wasn’t just a bad day for self-driving cars. It was a distributed systems failure, one that happened in the physical world, not just in dashboards.

OpenTelemetry Trace Testing for CI Release Gates

OpenTelemetry is great at answering one question: “what just broke?” The problem is that most teams need a different answer first: “what is about to break in this release?” That is where trace-based testing comes in, especially for teams running a vendor-neutral OTel stack (Collector + Tempo/Jaeger + Prometheus) and needing deterministic release gates.

Why Autonomous AI Agents Can't Run on SaaS Infrastructure

The era of the “copilot” is ending. We are moving rapidly toward the era of the autonomous software factory, where autonomous agents don’t just autocomplete our code—they investigate, plan, test, and merge entire features while we sleep. But this shift has exposed a critical flaw in how we consume AI. For the past decade, the default motion for enterprise software has been SaaS. It’s easy, frictionless, and managed by someone else.

From Datadog to CI Tests: Catch Regressions Before Deploy

I worked in observability for years, and the same pattern showed up across teams. An alert fired, the on-call rotation scrambled, and everyone did what they had to do to stabilize production. Then came the retrospective. Once the immediate pressure was gone, the conversation shifted to one question: how do we make sure this never happens again? My friend Jade Rubick coined a name for that principle: DRI, “don’t repeat the incident”.

AI Coding Agents Break What Works

Your AI coding agent just made every test pass. Ship it, right? Not so fast. A growing class of AI-generated bugs doesn’t come from writing bad code. It comes from the AI changing working code to accommodate its own mistakes. This isn’t a theoretical risk. It’s happening now, in production codebases, and it’s harder to catch than any bug the AI might introduce from scratch.

The 4 Golden Signals of Monitoring Explained

As a team, we have spent many years troubleshooting performance problems in production systems. Applications have become so complex that you need a standard methodology to understand performance. Our approach to this problem is called the Golden Signals. By measuring these signals and paying very close attention to these four key metrics, providers can simplify even the most complex systems into an understandable corpus of services and systems.

The Observability Gap: Why Monitoring Data Should Drive Tests

Most teams already know a lot about production. They have dashboards. They have traces. They have alerts. They have enough telemetry to explain what happened after an incident and enough graphs to argue about it for the rest of the week. Then they go to test a change and start from scratch. The integration tests hit a hand-written mock that returns {"status": "ok"}. The load tests replay a CSV somebody exported months ago. Staging is close enough to production right up until it matters.