Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Being on-call at incident.io

At incident.io, we are building a product that our users rely on 24/7, all year round. This means it is crucial that it is always working, and that is where our on-call rotation comes in. We believe that everyone should be on-call because it tightens the feedback loop between shipping new features and maintaining what we have, leading to more pragmatic engineering decisions.

Release v2.6: MCP Server, AI Insights Enhancement, Okta SCIM Integration, SNMP Monitoring and more.

Netdata 2.6.0 is here and it’s our most intelligent release yet! This version brings AI-powered monitoring, easier network visibility, and smoother enterprise integrations, all designed to help you troubleshoot faster and scale smarter. What's New: Netdata Referral Program. Every referred user will get a 10% discount when they subscribe to Netdata Business or Homelab, and you will receive 10% of their subscription value (up to a maximum of $1,000 per space). You can refer an unlimited number of users, so there's no cap on the number of referrals you can earn from.

Let Git Find the Bug for You (No Guessing)

Somewhere in your commit history, a bug snuck in. You could scroll. Panic. Guess. Or you could let Git find the exact commit that broke your code. In this episode of Wait… Git Can Do That?, we show you how git bisect binary-searches your history to isolate the problem: fast, clean, and testable. Use git bisect start, good, and bad. Test each step to narrow it down. Or automate it with git bisect run.
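
To make the binary-search idea concrete, here is a minimal sketch in Python. It is not Git's implementation: it assumes a simple linear history (real histories are graphs), and the commit names and the is_bad check are hypothetical stand-ins for running your test at each step.

```python
from typing import Callable, Sequence


def find_first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit in an oldest-to-newest list.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    once a commit is bad, every later commit is bad as well.
    """
    good, bad = 0, len(commits) - 1  # last known good, first known bad
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the culprit is at mid or earlier
        else:
            good = mid  # the culprit is after mid
    return commits[bad]


# Hypothetical history of 16 commits where the bug appears at c11.
commits = [f"c{i}" for i in range(16)]
tested = []


def is_bad(commit: str) -> bool:
    tested.append(commit)  # record which commits had to be tested
    return int(commit[1:]) >= 11


culprit = find_first_bad(commits, is_bad)
print(culprit, "found after testing", len(tested), "commits")
```

Each test halves the remaining range, so the number of commits you have to check grows with the logarithm of the history length rather than with the length itself.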

How to ensure your AWS workloads are resilient

Part of the Gremlin Office Hours series: A monthly deep dive with Gremlin experts. Cloud providers like AWS give you plenty of tools to make your workloads more resilient, but it’s up to you to apply them. However, considering how complex some of these tools are, where do you start? And how can you be sure your systems are more reliable as a result?

Kubernetes Clusters Break in the Weirdest Ways

If you’ve ever spent hours chasing a weird issue in your Kubernetes cluster, you’re in good company. Reddit’s r/kubernetes is full of hilarious and painful stories about clusters going off the rails for reasons no monitoring dashboard ever predicted. And while it’s easy to laugh after the fact, each of these moments highlights just how important observability is, because these kinds of problems don’t show up on your radar until it’s too late.

Looking beyond dev productivity to increase speed ft. Brian Guthrie of Justworks

Speed isn't just about developer productivity—it's about market dominance. Rob sits down with Brian Guthrie, Director of Engineering at Justworks and former ThoughtWorks consultant, to explore why lead time from conception to production should be your organization's north star metric.

Fix Vulnerabilities Faster: Puppet's Advanced Patching Solution

Break down patching silos and remediate vulnerabilities faster with Puppet. Most CVEs sit unaddressed for weeks, even after your scanner picks them up. Vulnerability Remediation in Advanced Patching (a Puppet Enterprise Advanced exclusive) gives Security and Ops teams an easy-to-use dashboard for finding, fixing, and reporting on vulnerabilities. No more tossing CVEs over the fence. No more finger-pointing when things go wrong. Just swift, efficient vulnerability management.

Why Branding Still Matters in the Age of DevOps and SaaS Automation

In a landscape dominated by automation, CI/CD pipelines, and observability dashboards, branding can feel... secondary. After all, if your platform ships fast, scales reliably, and integrates with everything - why should anyone care what it looks or sounds like? The answer is simple: they do.

Stop Losing Your Git Stash With This Easy Trick!

Got 12 unnamed stashes and no idea what’s in any of them? In this episode of Wait… Git Can Do That?, we show you how to list and pop a specific stash entry using stash@{n}. You’ll learn how to orient yourself with git stash list, pop a targeted stash with stash@{2}, and keep it around using apply instead of pop. No more mystery stashing. Just clean, precise Git workflows. Subscribe for more ways to make Git suck less.
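
The difference between apply and pop, and the way stash@{n} indexes shift after a pop, can be illustrated with a toy model in Python. This is only a sketch of the behavior, not how Git stores stashes: the StashModel class and the stash messages are hypothetical, and the listing format is simplified compared with real git stash list output.

```python
class StashModel:
    """Toy model of the stash list: index 0 is the newest entry, like stash@{0}."""

    def __init__(self) -> None:
        self._entries: list[str] = []

    def push(self, message: str) -> None:
        # A new stash always becomes stash@{0}; older entries shift up by one.
        self._entries.insert(0, message)

    def list(self) -> list[str]:
        return [f"stash@{{{i}}}: {msg}" for i, msg in enumerate(self._entries)]

    def apply(self, n: int = 0) -> str:
        # apply restores the entry but keeps it in the list.
        return self._entries[n]

    def pop(self, n: int = 0) -> str:
        # pop restores the entry and removes it; later entries shift down.
        return self._entries.pop(n)


stash = StashModel()
for message in ["wip: login form", "experiment: cache", "fix: typo", "wip: navbar"]:
    stash.push(message)

applied = stash.apply(2)        # entry stays in the list
count_after_apply = len(stash.list())
popped = stash.pop(2)           # entry is removed from the list
remaining = stash.list()
print(applied, count_after_apply, popped, remaining)
```

After the pop, the entry that used to be stash@{3} becomes stash@{2}, which is why it helps to run git stash list again before targeting another entry by index.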

2025 Guide & Template: Automating Production Readiness

When launches are delayed or incidents occur, it’s often due to a breakdown in production readiness. Maybe documentation is outdated. Maybe no one’s on-call. Maybe a critical dependency isn’t even known. The truth is, production readiness shouldn’t be a manual checklist. Production readiness needs to be as dynamic as the software being evaluated.