The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Budget Variance In The Cloud Era: Here's How To Turn Surprises Into Business Value

In the traditional finance world, budget variance was a static comparison between actual and budgeted spend. But in the cloud era, where costs scale with usage, experimentation, and engineering decisions, variance tells a much richer story. Done right, budget variance helps you distinguish between healthy growth and margin erosion. It can signal strong feature adoption, rising customer demand, or successful launches. It can also reveal waste, inefficiencies, and weak cost controls.
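As a rough illustration of that distinction, the sketch below computes a simple variance (actual minus budgeted spend) and pairs it with unit cost. The function names, the example figures, and the rule "overspend with flat or falling unit cost suggests growth" are assumptions made for illustration, not a method taken from the article.

```python
def budget_variance(actual, budgeted):
    """Return (absolute variance, variance as a percentage of budget)."""
    diff = actual - budgeted
    return diff, diff * 100 / budgeted


def unit_cost(spend, units):
    """Spend per unit of usage (for example per request or per customer)."""
    return spend / units


def read_variance(actual, budgeted, actual_units, planned_units):
    """Hypothetical heuristic: label an overspend by what happened to unit cost."""
    diff, _ = budget_variance(actual, budgeted)
    if diff <= 0:
        return "within budget"
    if unit_cost(actual, actual_units) <= unit_cost(budgeted, planned_units):
        return "overspend driven by growth"
    return "overspend with rising unit cost"


if __name__ == "__main__":
    # Made-up numbers: 20% over budget, but serving 50% more requests than planned.
    print(budget_variance(120_000, 100_000))
    print(read_variance(120_000, 100_000, 1_500_000, 1_000_000))
```

In this made-up case spend is over budget while cost per request falls, which is the pattern the blurb describes as healthy growth rather than margin erosion.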

Is Kubernetes actually HARD?

Thinking about learning Kubernetes in 2026? You’ll need GitOps, kubectl, and CI/CD pipelines... OR you can just use Speedscale. See how a single operator replaces a million dependencies and gives you the traffic insights you actually need to survive production.

Kubernetes is Hard. Here is the "Easy Mode" for 2026

Is Kubernetes actually hard, or are we just using the wrong tools? In 2026, the Kubernetes ecosystem has become a "dependency jungle." Between GitOps, YAML configuration, kubectl mastery, and complex CI/CD pipelines, developers are spending more time managing infrastructure than writing code. In this video, Ken breaks down the "hard parts" of K8s and introduces a more efficient workflow using Speedscale. Learn how to gain instant visibility into your cluster, pull logs without the headache, and turn real-world traffic into actionable load tests.

Recommended Experiments for Production Resilience in Harness Chaos Engineering | Harness Blog

This guide covers battle-tested chaos experiments for Kubernetes, AWS, Azure, and GCP to help you validate production resilience before real failures happen. Start with low blast radius experiments (pod-level) and gradually progress to higher impact scenarios (node/zone failures), always defining clear hypotheses and using probes to measure results. Building reliable distributed systems isn't just about writing good code. It's about understanding how your systems behave when things go wrong.

Guide to Sending Custom Metrics From Your Heroku Application

Heroku makes it easy to deploy and operate applications without managing servers, but understanding how your application behaves internally still requires instrumentation. Platform metrics like CPU usage, memory consumption, and router request/status counts are useful, but they don’t tell you how long your code takes to run, when your app throws errors, or whether users are interacting with key features.

IT Observability in 2026: Lessons From the Past Year

As IT organizations enter 2026, many of the assumptions around monitoring and observability have already been tested. Throughout 2025, infrastructure teams made it clear that visibility alone is not enough. Alerts without context, short data retention, and fragmented tools limited teams’ ability to explain behavior, validate changes, and plan with confidence. This article looks at what emerged from those experiences and how observability expectations continue to shift.

Sending Custom Application Metrics to MetricFire's Hosted Graphite

In this article, we’ll show how easy it is to send custom application metrics directly to MetricFire's public carbon endpoint. We’ll build a small Flask application, emit a handful of practical metrics, and generate local traffic to demonstrate how quickly meaningful data can flow from your code to your dashboards.
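For orientation, a carbon endpoint accepts Graphite's plaintext protocol: one `path value timestamp` line per metric, conventionally over TCP port 2003. The sketch below is a minimal illustration of that format, not the article's own code. The helper names, the placeholder hostname, and the API-key prefix on the metric path are assumptions; check your account settings for the real endpoint and key scheme.

```python
import socket
import time


def format_metric(path, value, timestamp=None, api_key=None):
    """Build one Graphite plaintext line: '<path> <value> <unix_ts>\\n'."""
    ts = int(time.time() if timestamp is None else timestamp)
    # Assumption: a hosted service identifies the account by a key prefix on the path.
    full_path = f"{api_key}.{path}" if api_key else path
    return f"{full_path} {value} {ts}\n"


def send_metric(line, host, port=2003, timeout=5.0):
    """Send one already-formatted line to a carbon endpoint over TCP."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("utf-8"))


if __name__ == "__main__":
    line = format_metric("flask.requests.count", 1, api_key="YOUR-API-KEY")
    print(line, end="")
    # Hypothetical host; uncomment once a real endpoint is configured.
    # send_metric(line, "carbon.example.com")
```

Keeping the formatting separate from the network call makes the line easy to check locally before any data leaves the application.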

Five Ways to Simplify Data Masking | The Tony and Tonie Show Ep 38

5 signs your data masking is fast, secure, and low-maintenance. Can you protect PII, still deliver realistic test data, and design a data masking solution that’s easy to automate and maintain? Tony and Tonie discuss five key traits of a tool that does just that.

Deploy a serverless Python API to Scaleway Functions using CircleCI

Serverless platforms have revolutionized the way developers build and deploy APIs, eliminating the need to manage servers or underlying infrastructure. With serverless, you can focus entirely on your application logic and let the platform handle scaling, availability, and maintenance. Scaleway Serverless Functions is a flexible serverless platform that makes it easy to deploy lightweight APIs and background jobs in the cloud.

When is it ok or not ok to trust AI SRE with your production reliability?

There’s a moment every engineer knows. An AI suggests a fix, it looks reasonable, maybe even obvious, but production is on the line and you hesitate before clicking execute. There’s a big difference between an AI that can recommend an action and one you’re willing to let take that action. All it takes is one bad call, one kubectl command that makes things worse, and suddenly every automated suggestion is a potential liability instead of a help.