Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Stop Vibe Coding Everything: The Case for Spec-Driven Dev

Spec-driven development with AI coding agents could change how you build software. In this GitKon 2025 talk, Erik Hanchett, Senior Developer Advocate at AWS, breaks down why AI coding assistants perform dramatically better when they start with structured specifications instead of raw prompts. If you've been vibe coding your way through complex features and wondering why your AI keeps going off the rails, this is the video for you.

Millions of Metrics. Zero Clarity.

Millions of metrics. Zero clarity. That’s the reality many IT teams are facing today. As environments grow more complex, telemetry explodes. Millions of records generated every hour. Dozens of specialized tools for network, storage, Kubernetes, cloud, AI workloads. Each tool is good at its domain. But none of them answers the real question: Where should I focus right now? Fragmented visibility creates predictable failure modes.

Keeping it boring: the incident.io technology stack

At incident.io we run a deliberately simple technology stack. Keeping things boring has allowed us to scale from a few hundred customers to several thousand, while having only two platform engineers. In this post I'll walk through the stack, explain some of the choices we've made, and touch on the challenges we're facing as we grow.

Ghosts of Servers Past: The Bare-Metal Comeback Story

Bare-metal. Just reading that word might trigger a physical reaction for some of us. Dusty closets, old server rooms, and loud rigs that never seemed to work quite right. Remember waiting days for IT to provision a server, only to realize your ticket got lost in the shuffle? Or the classic "well, it worked on my machine" excuse right before a production push? Ah, the good old days.

What is an escalation policy? (And why every team needs one)

An escalation policy is the route an incident takes after it triggers. It lays out who gets alerted first and sets a wait time. If nobody responds, it moves the incident forward to the next person. The word “escalation” is worth pausing on. When an incident triggers and the first person doesn’t respond, the incident doesn’t sit and wait. It moves to the next person and keeps moving until someone picks it up. That forward movement is the escalation.

[Webinar] Conquering the Complexity of Self-Hosted Apps with Agentic AI SRE

Most enterprise SaaS products, like Komodor’s Autonomous AI SRE Platform, require installing a remote agent on the customer’s infrastructure, which varies significantly from one organization to another, in terms of architecture, configurations, permissions, processes, and more. This “unmanaged” model creates major blind spots, making daily operations, observability, debugging, and incident response challenging. When failures occur, limited visibility and bespoke systems make root-cause analysis slow, incomplete, or impossible.

A compass for designing your escalation policy

The first time you sit down to design an escalation policy, it can feel a little like a crossroads. You know incidents need to reach the right people. You just aren’t sure which structure makes the most sense. Should you route by severity? By who’s available? Or by team? There’s no single right answer. Think of this guide as a compass. A compass doesn’t tell you exactly where to go. It helps you orient yourself based on where you already are.

The benefits of leadership coaching in the tech industry - with Cindy Gross

Steve is joined by Cindy Gross to discuss leadership coaching in the technology industry – what it is, how it works, and the many benefits it brings. Recorded on-site at PASS Data Community Summit 2025 in Seattle. Cindy is an executive coach and Adaptive Leadership Expert with 25 years in the US tech industry. As a former SQL Server Master (MCM) and expert in cross-organizational navigation at Microsoft, she transformed her focus from complex technical systems to empowering leaders within systemic chaos.

Harness AI February 2026 Updates: Securing & Making the SDLC Reliable and Shipping Faster with Agents | Harness Blog

February is all about making AI in software delivery secure and easier to operate at scale. This month’s updates span enterprise-grade application security, API security via MCP, SRE automation, and a major upgrade to the DevOps Agent.

Azure Tagging In 2026: A Complete Guide to Organizing Resources, Costs, and Governance

Azure tags are like sticky notes for your cloud resources. They help you label and organize infrastructure in ways that make sense to your organization. Tags enable you to assign categories to resources, making it easy to group, monitor, track, and filter them across any environment. So, how do tags and tagging work in Azure?