Birol Yildiz on Autonomous Incident Response and the Future of AI SRE | Harness Blog

At SREday NYC 2026, the ShipTalk podcast welcomed Birol Yildiz, Co-founder and CEO of ilert, for a conversation about the next evolution of incident response. In the episode, ShipTalk host Dewan Ahmed, Principal Developer Advocate at Harness, spoke with Birol about how artificial intelligence is transforming reliability engineering—from simply assisting engineers during incidents to autonomously diagnosing and resolving outages.

Observability Lessons From OpenAI

Writing code is moving from the good old IDE into the realm of autonomous AI agents. One example is OpenAI, which has been doing internal development with zero lines of manually written code. You can read about their workflow on their engineering blog in "Harness engineering: leveraging Codex in an agent-first world." For me, the main takeaway of OpenAI's article is how AI has rewritten the constraints equation.

70% to 90% of AI Projects FAIL. Here's Why.

Why are so many modern AI initiatives falling short of their ROI? In this episode of iOPEX, Malcolm Lett (Technical Lead) breaks down the critical mistakes companies make when implementing AI and how to choose the right tools for real success. Most organizations treat Generative AI as a "one-size-fits-all" solution, but it’s only one piece of the puzzle. Malcolm explores the four essential domains you need to balance to build a winning strategy.

How Vibe Coding A Self-Help App Made Me An AI Believer

For longer than I'd like to admit, I was an AI skeptic. Then, over the holidays, I vibe coded an app whose sole purpose was to make me a better person. The app is a motivator. It's programmed to send me timely reminders around certain themes, like reading every day, making healthy eating choices, and giving myself plenty of time to plan for anniversaries and birthdays.

NVIDIA's Jensen Huang just described your next big cost problem

On March 18, Jensen Huang took the stage at NVIDIA’s GTC conference in San Jose for a keynote that ran well over two hours — covering everything from CUDA’s 20-year history to humanoid robots that may one day wander Disneyland. But buried inside the spectacle was a remarkably clear-eyed articulation of the economic forces now bearing down on every enterprise that builds on cloud infrastructure.

Annotate traces to improve LLM quality with Datadog LLM Observability

LLM applications rarely crash. They degrade quietly. Once these applications are shipped to production, subtle quality failures become harder to catch with traditional signals. Tone shifts, hallucinated details, off-topic responses, and incomplete reasoning can emerge while latency and token usage look stable.

Why AI-Driven Automation Can't Wait

Operators today are navigating unprecedented complexity—rising costs, accelerating customer expectations, and increasingly dynamic networks. In this recent video interview, my colleague Kevin Wade and I explore why AI‑driven automation has shifted from a “nice‑to‑have” technology to a core business requirement for telecom operators and beyond.

How OpenRouter and Grafana Cloud bring observability to LLM-powered applications

Chris Watts is Head of Enterprise Engineering at OpenRouter, where he builds infrastructure for AI applications. He previously worked at Amazon and is a former startup founder. As large language models become core infrastructure for more and more applications, teams are discovering a familiar challenge in a new context: you can't improve what you can't see.

Introducing Calico Load Balancer and Seamless VM-to-Kubernetes Migration

SAN JOSE, Calif., March 23, 2026 — Tigera, the creator and maintainer of Project Calico, today announced a major expansion of its Unified Network Security Platform for Kubernetes, aimed at helping enterprises consolidate infrastructure and accelerate the migration of legacy workloads to cloud-native platforms.