Operations | Monitoring | ITSM | DevOps | Cloud

Building Trust in the Machine: A Guide to Architecting Agentic AI for SRE

The promise of Artificial Intelligence in Site Reliability Engineering (SRE) is seductive: an autonomous system that never sleeps, instantly detects anomalies, and fixes broken infrastructure while humans focus on high-value work. However, the gap between a demo-ready chatbot and a production-grade Autonomous AI SRE is vast. In complex, noisy environments like Kubernetes, a “naive” implementation of Large Language Models (LLMs) is not just ineffective, it can be dangerous.

AI Tags: Why Cloud Tagging Breaks Down For AI Workloads (And What To Use Instead)

Tags have long been the backbone of cloud cost visibility and governance. They help teams understand who owns what, where spend comes from, and how infrastructure maps back to the value the business delivers. However, AI workloads have altered that model, and exposed the limitations of traditional AI tags in the process. In fact, many of the most expensive AI operations don’t run on taggable cloud resources at all.

AI meets SQL Server 2025 on Ubuntu

Since 2016, when Microsoft announced its intention to make Linux a first class citizen in its ecosystem, Canonical and Microsoft have been working hand in hand to make that vision a reality. Ubuntu was among the first distributions to support the preview of SQL Server on Linux. Ubuntu was the first distribution offered in the launch of Windows Subsystem for Linux (WSL), and it remains the default to this day. Ubuntu was also the first Linux distribution to support Azure’s Confidential VMs.

Observing agentic AI workflows with Grafana Cloud, OpenTelemetry, and the OpenAI Agents SDK

As agentic AI applications are used more broadly in production, they introduce new operational models, combining multi-step reasoning, tool execution, and autonomous decision-making into a single workflow. SRE teams need visibility into how these agents behave, where they fail, and how they perform over time.

The Dangerous Power of Local AI Agents. #speedscale #proxymock #aiagents #openclaw #localai

I’ve been testing OpenClaw, a fully autonomous agent that lets you remote control your entire system via Signal. It’s incredibly powerful to text your computer from a coffee shop and have it execute tasks, but you’re essentially handing the keys to your digital kingdom to an LLM. The Golden Rule: Trust, but verify. I’m using Proxymock to sniff every single API call going in and out of the agent. If there’s a data leak or a "hallucination" that tries to wipe my drive, I see it first.

Qwiet AI Is Now Harness SAST and SCA | Harness Blog

Modern application security is struggling to keep up with AI-driven development and cloud-native scale, especially when security feels bolted onto CI/CD instead of built in. Harness SAST and SCA bring AI-powered application security testing natively into the Harness platform, reducing noise and alert fatigue. By identifying only vulnerabilities that are actually reachable in production code, teams get findings they can trust and act on faster.

Protect agentic AI applications with Datadog AI Guard

Organizations are increasingly using agentic AI applications powered by large language models (LLMs) to automate analysis, decision-making, and operational workflows. As these AI agents take on more responsibility, they gain access to internal tools and services and can interact with them in unintended ways.

Tool Consolidation Is Dead. Long Live Agentic AI.

It’s 2026, and developers have more tools at their disposal than at any point in the industry’s history: CI/CD platforms are richer; observability stacks are deeper; security, data, and AI tooling have exploded into crowded, competitive ecosystems. And yet, delivery is still slow, incidents are still noisy, workflows are still brittle. The problem is no longer tool scarcity or feature depth. It’s integration debt.

8 themes shaping engineering in the age of AI

We know that AI has been transformational for engineering and it will continue to be, so stop me if this sounds familiar. Imagine an engineering lead opening a pull request for a critical security patch and finding five hundred lines of AI-generated code. While the solution is (mostly) usable, it follows a pattern no one on the team recognizes. This shift away from manually writing every line of logic has introduced a unique level of complexity for teams.