The latest News and Information on DevOps, CI/CD, Automation and related technologies.

IPAM Site Mapping: Give Your Subnets a Home

Without site context, your 5-minute fix becomes a 30-minute hunt through spreadsheets and Slack channels while users wait. This isn’t just inconvenient; it’s expensive. Every minute of downtime costs your business, and every minute spent playing IP detective is a minute not spent solving the actual problem. As networks scale across cloud, hybrid, and on-premises environments, this lack of infrastructure context creates real operational pain for your team.
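
To make "site context" concrete, here is a minimal sketch of resolving an IP address to a site using Python's standard `ipaddress` module. The `SITE_MAP` table, the site names, and the `find_site` helper are all hypothetical and are not taken from the article or from any IPAM product; a real deployment would query its IPAM system instead of a hard-coded dictionary.

```python
import ipaddress

# Hypothetical subnet-to-site table; ranges and site names are illustrative only.
SITE_MAP = {
    "10.10.0.0/16": "nyc-dc1",
    "10.10.5.0/24": "nyc-dc1-rack5",
    "192.168.20.0/24": "london-office",
}


def find_site(ip, site_map=SITE_MAP):
    """Return the site of the most specific subnet containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    best = None  # (network, site) of the longest matching prefix so far
    for cidr, site in site_map.items():
        net = ipaddress.ip_network(cidr)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, site)
    return best[1] if best else None


if __name__ == "__main__":
    print(find_site("10.10.5.7"))
    print(find_site("8.8.8.8"))
```

The longest-prefix match is a design choice in this sketch: when subnets nest, the most specific one is assumed to carry the most useful location.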

Cut Compute Costs Up To 90% With Azure Spot Instances

When cloud costs spike, compute is often the culprit. Using Azure Spot Instances could cut your compute costs by up to 90%. But Spot VMs come with trade-offs, including unpredictable evictions and capacity constraints, which make them tricky to use without the right strategy and visibility. In this guide, we will share how to make them work for you.

We built an MCP server so Claude can access your incidents

"Show me all critical incidents from the last week." "Create an incident for the payment API being down." "What was the root cause of that database incident last Tuesday?" If you've ever wished you could just ask Claude (or any MCP client) to handle incident management tasks instead of context-switching between chat and your incident management dashboard, you're going to like what we built.

What are Application Metrics?

Application metrics are structured, quantifiable signals that reflect how your software behaves in production. They capture key aspects of performance, such as response times, error rates, throughput, and resource usage, giving you a real-time view into the health of your system. Tracking the right metrics helps detect regressions early, surface latent issues before they impact users, and guide optimization decisions based on hard data, not guesswork.
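
As a rough illustration of the signals named above, the following sketch records per-request latency and success, then derives throughput, error rate, and a nearest-rank 95th-percentile latency. The `RequestMetrics` class and its method names are hypothetical, invented for this example; production systems would normally use a metrics library rather than an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class RequestMetrics:
    """In-memory collector for one observation window (illustrative only)."""
    latencies_ms: list = field(default_factory=list)
    errors: int = 0

    def record(self, latency_ms, ok=True):
        """Record one request's latency and whether it succeeded."""
        self.latencies_ms.append(latency_ms)
        if not ok:
            self.errors += 1

    def summary(self, window_seconds):
        """Return throughput, error rate, and p95 latency for the window."""
        count = len(self.latencies_ms)
        if count == 0:
            return {"throughput_rps": 0.0, "error_rate": 0.0, "p95_ms": None}
        ordered = sorted(self.latencies_ms)
        # Nearest-rank percentile: ceil(0.95 * count), in integer arithmetic.
        rank = (95 * count + 99) // 100
        return {
            "throughput_rps": count / window_seconds,
            "error_rate": self.errors / count,
            "p95_ms": ordered[rank - 1],
        }


if __name__ == "__main__":
    metrics = RequestMetrics()
    for latency in (12, 15, 11, 240, 14):
        metrics.record(latency, ok=latency < 200)
    print(metrics.summary(window_seconds=1))
```

Resource usage is left out of this sketch because it is usually sampled from the host or runtime rather than recorded per request.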

You're Writing Code Wrong: Start Telling Better Stories with Git

What if your Git history could read like a great novel? In this talk from GitKon, Jason Gates (Senior Staff at Sandia National Labs) makes the case that software is storytelling...and Git is your medium. With references from The Hobbit to The Stormlight Archive, he shows how commit structure, messaging, and PR flow aren’t just best practices, they’re tools to help your team (and future you) understand what really happened.

EMEA Rundeck by PagerDuty Meetup - July 2025

Join us for an informal 1-hour virtual event where the open-source Rundeck by PagerDuty community comes together to share automation stories and use cases. Whether you're new to Rundeck or looking to elevate your automation game, this meetup is packed with valuable takeaways for everyone! Host: Martin Van Son, Automation Specialist & Strategic Solution Advisor at PagerDuty. Topics: New OSS Dashboards & Enterprise ROI Plugin + Creating Rundeck Plugins with Claude Code.

AMER Rundeck by PagerDuty Meetup - July 2025

Join us for an informal 1-hour virtual event where the open-source Rundeck by PagerDuty community comes together to share automation stories and use cases. Whether you're new to Rundeck or looking to elevate your automation game, this meetup is packed with valuable takeaways for everyone! Host: Forrest Evans, Director, Product Management at PagerDuty. Topic: Rundeck by PagerDuty: A Swiss Army Knife of Automation.

Save Hours on Troubleshooting with Automated Investigations

How many times has your team stared at a dashboard, pointed to a spike, and asked a question that charts alone can’t answer? “What was the real impact of that deployment?” “Why are our Kubernetes pods in the us-east-1 cluster suddenly crashing?” “Are we wasting money on overprovisioned servers?” Answering these questions is the real work of operations and SRE.

Tutorial: How to Remediate Vulnerabilities with Puppet Enterprise Advanced Patching

The rate at which vulnerabilities are being exploited is on the rise. VulnCheck, a vulnerability intelligence company, found that in Q1 2025, 28.3% of vulnerabilities were exploited within 1 day of CVE disclosure. Keeping your systems up to date is more important than ever. In reality, many security teams run scans and export the results to giant spreadsheets, which are “tossed over the wall” to the Operations team with little context.

Product Klip: Istio Developer Dashboard

Troubleshooting issues in a complex service mesh environment, such as traffic failures or authorization problems, often requires the expertise of an SRE or DevOps professional. However, Komodor simplifies this process. Komodor provides developers with the necessary visibility to diagnose service mesh issues on their own. It helps developers easily identify blocked connections and understand the root cause without having to review logs or configuration files.