The latest news and information on DevOps, CI/CD, automation, and related technologies.

Top-Down FinOps: Align Cloud Spend with Real Business Strategy

In this episode of FinOps on Azure, Michael Stephenson sits down with Frank Contrepois, independent FinOps voice and co-host of The FinOps Guys podcast, to explore what it really means to manage cloud costs from a business-first perspective. Frank has been in the FinOps space for nearly a decade and brings a genuinely different angle to the conversation. His background in commodity trading at Strategic Blue (a Morgan Stanley spinoff) shaped how he thinks about reserved instances, commitment strategies, and why most teams approach cost management the wrong way round.

The boring 80% is what kills your backlog

A few weeks ago, we shipped cascading replication for PostgreSQL, MySQL and Redis on Cloud 66. Customers can now build replication chains: a primary streaming to a middle replica, which in turn streams to leaves. It reduces load on the primary, supports geographic distribution, and stops you from melting your network when you have a large fan-out of replicas all pulling WAL from the same machine. PostgreSQL has supported cascading replication natively since version 9.2, which shipped over a decade ago.
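To make the fan-out point concrete, here is a minimal sketch that counts how many replicas stream directly from each node in a flat topology versus a cascading one. The node names and the six-replica layout are illustrative assumptions, not Cloud 66's configuration or API.

```python
from collections import Counter

def direct_streams(upstream_of):
    """Count how many replicas stream directly from each upstream node.

    `upstream_of` maps a replica name to the node it replicates from.
    """
    return Counter(upstream_of.values())

# Flat fan-out: every replica pulls from the primary (hypothetical names).
flat = {f"replica{i}": "primary" for i in range(1, 7)}

# Cascading: two middle replicas pull from the primary,
# and the four leaves pull from the middle replicas.
cascade = {
    "mid1": "primary",
    "mid2": "primary",
    "leaf1": "mid1",
    "leaf2": "mid1",
    "leaf3": "mid2",
    "leaf4": "mid2",
}

flat_load = direct_streams(flat)
cascade_load = direct_streams(cascade)

print("flat:", dict(flat_load))
print("cascade:", dict(cascade_load))
```

Both layouts serve six replicas, but in the cascading layout the primary feeds only two direct streams; the remaining four are served by the middle replicas.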

No egress fees. No lock-in. That's cloud freedom

With hyperscalers, growth comes with a hidden cost. The more your data moves, the more you pay, by design. Egress fees are that cost: a model built to discourage migration, limit flexibility, and keep you trapped in their ecosystem. At Civo, we've eliminated that barrier completely. No egress fees, no hidden charges. Every cost is transparent and predictable, so you always know exactly what you're paying for. You stay because you choose to. That's cloud freedom.

What is alert fatigue? (And how does it happen)

Alert fatigue doesn’t announce itself. It builds quietly over weeks and months until one day a critical incident triggers and nobody responds with the urgency it deserves. By that point, the damage is already done. This guide walks through what alert fatigue actually is, how it happens, and what you can do about it.

A Guide to 400G Connectivity

Ready to scale beyond 100G? Learn why 400G is on the rise, when to use it, and how to deploy it. Network traffic is growing exponentially. Cloud adoption, AI, large-scale data replication, video streaming, and generative applications are all drivers, and enterprises with traditional connectivity setups may find themselves struggling to keep up. Enter 400-gigabit Ethernet (400G): a high-capacity, scalable networking standard that enables you to build faster and more cost-efficient networks at scale.

What is an ASN? Understanding the backbone of the Internet

Using the internet often feels effortless when clicking a link or joining a call, but behind that simplicity lies a highly structured system that ensures data moves efficiently across the globe. One of the key building blocks of this system is the Autonomous System Number (ASN).

Harness Lives Inside Cursor Now - Plus Everything Else That Shipped in April

April was a big month at Harness. AI is changing how code gets written, and the rest of the SDLC is catching up. In this update, Dewan Ahmed walks through Harness product releases across three themes: AI in the developer workflow, security and governance for AI assets, and self-service maturity for developers and platform teams.

Learn these 4 Chaos Engineering Principles Before You Break Anything | Resilience Testing | Harness

Want to start chaos engineering? Don't randomly break stuff and hope for the best. Real chaos engineering starts with defining your system's steady state metrics like latency, throughput, and error rates. Then you form a clear hypothesis about what should happen when failures occur. Next, you inject controlled failures, starting small with single pod kills or network drops, not production meltdowns. Finally, you limit the blast radius by running experiments in safe environments first.
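The four principles above can be sketched as a small experiment runner. This is an illustration under stated assumptions, not Harness's tooling: `SteadyState`, `run_experiment`, the thresholds, and the simulated three-pod service are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

Metrics = Dict[str, float]

@dataclass
class SteadyState:
    """Principle 1: steady state expressed as metric thresholds."""
    max_latency_ms: float
    min_throughput_rps: float
    max_error_rate: float

    def holds(self, m: Metrics) -> bool:
        return (m["latency_ms"] <= self.max_latency_ms
                and m["throughput_rps"] >= self.min_throughput_rps
                and m["error_rate"] <= self.max_error_rate)

def run_experiment(steady_state: SteadyState,
                   measure: Callable[[], Metrics],
                   inject: Callable[[], None],
                   rollback: Callable[[], None],
                   environment: str,
                   allowed_environments: Tuple[str, ...] = ("staging",)) -> str:
    # Principle 4: limit the blast radius by refusing unsafe environments.
    if environment not in allowed_environments:
        raise ValueError(f"experiments are not allowed in {environment!r}")
    # Do not inject anything if the system is already unhealthy.
    if not steady_state.holds(measure()):
        return "aborted: not in steady state"
    # Principle 3: inject one small, controlled failure and always undo it.
    inject()
    try:
        during = measure()
    finally:
        rollback()
    # Principle 2: the hypothesis is that steady state survives the failure.
    return "hypothesis held" if steady_state.holds(during) else "hypothesis refuted"

# Simulated service with three pods (hypothetical numbers).
state = {"pods": 3}

def measure() -> Metrics:
    degraded = state["pods"] < 3
    return {
        "latency_ms": 180.0 if degraded else 120.0,
        "throughput_rps": 900.0 if degraded else 1000.0,
        "error_rate": 0.002 if degraded else 0.001,
    }

def kill_one_pod() -> None:
    state["pods"] -= 1

def restore_pod() -> None:
    state["pods"] += 1

result = run_experiment(
    SteadyState(max_latency_ms=250.0, min_throughput_rps=800.0, max_error_rate=0.01),
    measure, kill_one_pod, restore_pod, environment="staging",
)
print(result)
```

In a real setup, `measure` would query a monitoring system and `inject` would call a fault-injection tool; the simulated functions here only stand in for those.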