
The latest News and Information on DevOps, CI/CD, Automation and related technologies.

DORA Metrics in the AI Era: Why Deployment Isn't Faster

DORA metrics in the AI era reveal a paradox: PR volume is climbing, but deployment frequency is staying flat. In this talk, GitKraken's Director of Product Jeff Schinella breaks down why AI-accelerated code generation is creating a review bottleneck that your DORA metrics can't fully explain on their own. Jeff walks through how PR metrics (cycle time, first response time, code churn, and PR size) serve as the leading indicators behind your DORA data. If your deployment frequency is flat while PR counts go up, the bottleneck isn't your devs. It's your review capacity.
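To make these leading indicators concrete, here is a minimal Python sketch that computes median PR cycle time, first response time, and PR size from PR timestamps. It assumes cycle time runs from PR open to merge and first response time from open to first review. Tools define these differently, so treat those definitions, along with the hypothetical `PullRequest` record and `summarize` helper, as assumptions. Code churn is left out because it needs commit-level history.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median
from typing import List, Optional


@dataclass
class PullRequest:
    # Hypothetical record; map these fields from whatever your Git host exports.
    opened_at: datetime
    first_review_at: Optional[datetime]  # None if nobody has reviewed yet
    merged_at: Optional[datetime]        # None if not merged yet
    lines_added: int
    lines_deleted: int


def hours_between(start: datetime, end: Optional[datetime]) -> Optional[float]:
    """Elapsed hours from start to end, or None if end hasn't happened yet."""
    if end is None:
        return None
    return (end - start).total_seconds() / 3600


def summarize(prs: List[PullRequest]) -> dict:
    """Median cycle time, first response time, and PR size for a batch of PRs."""
    # Assumed definitions: cycle time = open -> merge, first response = open -> first review.
    cycles = [h for h in (hours_between(p.opened_at, p.merged_at) for p in prs) if h is not None]
    responses = [h for h in (hours_between(p.opened_at, p.first_review_at) for p in prs) if h is not None]
    sizes = [p.lines_added + p.lines_deleted for p in prs]
    return {
        "pr_count": len(prs),
        "median_cycle_hours": median(cycles) if cycles else None,
        "median_first_response_hours": median(responses) if responses else None,
        "median_pr_size_lines": median(sizes) if sizes else None,
    }


# Illustrative data only: the second PR waited a full day for its first review.
sample = [
    PullRequest(datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 11), datetime(2024, 1, 1, 17), 120, 30),
    PullRequest(datetime(2024, 1, 2, 9), datetime(2024, 1, 3, 9), datetime(2024, 1, 3, 21), 400, 100),
]
summary = summarize(sample)
print(summary)
```

In this framing, a rising `pr_count` alongside a growing `median_first_response_hours` is the review-capacity signal, and it would show up before deployment frequency moves.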

Rightsizing Nightmares: When Your Cloud Cost Tool Degrades Performance

Production teams see this play out. A vertical pod autoscaler recommendation gets applied automatically. Resource requests drop a notch across a namespace. The cost dashboard registers a small savings win. A few minutes later, health checks start failing. Pods enter crash loops.
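One way to catch this failure before it ships is to compare each proposed memory request against recently observed peak usage. Below is a minimal Python sketch of such a guardrail; the `Recommendation` record, the `observed_peak_mib` input, and the 20% headroom default are illustrative assumptions, not part of any autoscaler's API. (Kubernetes' Vertical Pod Autoscaler also supports a per-container `minAllowed` floor in its `resourcePolicy`, which serves a similar purpose.)

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Recommendation:
    # Hypothetical shape: what a rightsizing tool proposes, plus your own usage data.
    container: str
    recommended_mib: int    # proposed memory request
    observed_peak_mib: int  # peak working set over a recent window, from your metrics


def memory_floor_mib(peak_mib: int, headroom_pct: int = 20) -> int:
    """Peak usage plus headroom, rounded up; integer math avoids float rounding surprises."""
    return (peak_mib * (100 + headroom_pct) + 99) // 100


def check(recs: List[Recommendation], headroom_pct: int = 20) -> List[Tuple[str, int]]:
    """Return (container, safe_request_mib) for each recommendation below the floor."""
    flagged = []
    for rec in recs:
        floor = memory_floor_mib(rec.observed_peak_mib, headroom_pct)
        if rec.recommended_mib < floor:
            flagged.append((rec.container, floor))
    return flagged


recs = [
    Recommendation("api", recommended_mib=256, observed_peak_mib=300),
    Recommendation("worker", recommended_mib=512, observed_peak_mib=400),
]
flagged = check(recs)
print(flagged)
```

Here `api` would be held at 360 MiB instead of dropping to 256 MiB, the kind of cut that, if memory limits shrink along with requests, can end in OOM kills and crash loops. `worker` passes because its recommendation already clears the floor.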

The cloud optionality blueprint: standardizing the stack to end vendor lock-in

Key takeaway: Real cloud strategy isn't about running the same workload everywhere at once; it's about the freedom to move when you need to. By standardizing on a unified configuration file, Upsun enables true cloud optionality, turning a provider migration from a re-architecture project into a data-move project.

How to run a proof of concept that de-risks your monitoring decision

Part 3 of key insights from a fireside chat with Chris Yates; read parts 1 and 2 for the earlier installments. Most database monitoring proofs of concept (POCs) answer the wrong questions. Here's how to structure a POC that genuinely de-risks your vendor decision, along with the questions to ask during the process. A POC is often treated as the final hurdle in vendor evaluation, but too often it becomes theatre: a guided tour of the flashiest features, run by one person, under unrealistic conditions.

Building for Resilience: An Engineering Guide to the Mythos Era | Harness Blog

The release of Anthropic Mythos and Project Glasswing marks an exciting and pivotal new chapter in software development. As the industry advances, the speed and economics of vulnerability exploitation have fundamentally shifted. What once took weeks of manual reconnaissance can now be scaled rapidly through automated models. However, this is not just a security problem to solve. It is a massive engineering opportunity to build cleaner, more robust systems.

Infrastructure as Code Management: Terragrunt & Multi-IaC | Harness Blog

What happens when your Infrastructure as Code management strategy works perfectly in dev, scales reasonably well in staging, and then quietly fractures across seventeen production workspaces because nobody documented which Terragrunt wrapper goes with which AWS account? You spend Friday afternoon reverse-engineering DRY patterns that made sense six months ago, wondering why your team is managing three different IaC execution engines with four incompatible workflow philosophies.

Five questions your platform evaluation is missing

Years back I sat in on a platform evaluation with a customer who spent forty-five minutes of the meeting focusing on one thing: their custom PHP content management system. They had opinions about the CMS. Strong opinions. They had benchmarks, a migration plan, a proof of concept. They had a diagram. They had questions about the deployment pipeline for this CMS that were, for a single application, more thoroughly considered than most organizations' entire infrastructure strategies.

Why do you need incident alerting? (And why monitoring alone isn't enough)

Monitoring tools track what's happening across your systems and send a Slack message or email when something looks off. But they don't call anyone, and they don't escalate the incident. If that Slack message goes unseen at 3 AM on a Saturday, the incident just sits there until someone opens their dashboard. Incident alerting fills this gap: when an incident is triggered, it contacts the right person directly through a phone call or their preferred channel.
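The escalation behavior described above can be sketched as an ordered policy: notify the first responder, wait a fixed time for an acknowledgment, then move to the next. The Python below is a minimal simulation under that assumption; the responder names, channels, and timeouts are hypothetical, and acknowledgments are simulated with a lookup table instead of a real paging integration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class EscalationStep:
    responder: str
    channel: str          # e.g. "phone" or "sms"; illustrative values only
    ack_timeout_min: int  # minutes to wait for an acknowledgment before escalating


def run_escalation(
    steps: List[EscalationStep],
    ack_after_min: Dict[str, int],
) -> Tuple[Optional[str], List[str]]:
    """Walk the policy in order.

    ack_after_min simulates each responder's reaction time in minutes;
    responders missing from it never acknowledge. Returns who acknowledged
    (or None if the policy ran out) and the notifications sent.
    """
    sent = []
    for step in steps:
        sent.append(f"{step.channel} -> {step.responder}")
        minutes = ack_after_min.get(step.responder)
        if minutes is not None and minutes <= step.ack_timeout_min:
            return step.responder, sent
    return None, sent


policy = [
    EscalationStep("primary-on-call", "phone", 5),
    EscalationStep("secondary-on-call", "phone", 10),
    EscalationStep("engineering-manager", "sms", 15),
]
# 3 AM Saturday: the primary sleeps through the call; the secondary answers in 3 minutes.
owner, sent = run_escalation(policy, {"secondary-on-call": 3})
print(owner, sent)
```

Unlike a Slack message that waits to be noticed, the loop keeps notifying until someone acknowledges or the policy runs out; what happens at that point (repeat the cycle, page a wider group) is a policy choice this sketch leaves open.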