
The latest news and information on cloud monitoring, security, and related technologies.

Why developer teams are rethinking their cloud provider this year

The default cloud choice for technically literate teams has shifted, though not dramatically. The major hyperscalers aren't going anywhere, and their enterprise position is still strong. But the conversation that used to start with "which hyperscaler" now starts with "what do we actually need." That's new.

Shipped: You're emitting AI telemetry. Point it at an engine that turns it into allocated spend.

Your AI calls already emit OpenTelemetry: your LLM gateway exports it, and it’s the open standard your own services can speak. But you don’t have anywhere to turn those spans into spend you can allocate to an outcome. Now you can. CloudZero exposes an OpenTelemetry endpoint that doesn’t care what’s on the other end.
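As a rough illustration of what turning spans into allocated spend can look like, here is a minimal sketch. It is not CloudZero's implementation. It assumes spans arrive as plain attribute dictionaries, that token counts and the model name use the attribute keys from the OpenTelemetry GenAI semantic conventions (check what your gateway actually emits), and that a custom attribute, here called `app.feature`, tags each call with the outcome it served. The model name and the prices are made up.

```python
from collections import defaultdict

# Hypothetical prices in dollars per 1,000 tokens; real prices vary by provider.
PRICE_PER_1K_TOKENS = {
    "model-a": {"input": 0.50, "output": 1.50},
}


def span_cost(attrs: dict) -> float:
    """Estimate the dollar cost of one LLM call from its span attributes."""
    price = PRICE_PER_1K_TOKENS[attrs["gen_ai.request.model"]]
    input_tokens = attrs.get("gen_ai.usage.input_tokens", 0)
    output_tokens = attrs.get("gen_ai.usage.output_tokens", 0)
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1000


def allocate(spans: list, key: str = "app.feature") -> dict:
    """Sum estimated cost per value of `key`; untagged spans go to 'unallocated'."""
    totals = defaultdict(float)
    for attrs in spans:
        totals[attrs.get(key, "unallocated")] += span_cost(attrs)
    return dict(totals)


# Illustrative spans; `app.feature` is an assumed custom attribute.
spans = [
    {"gen_ai.request.model": "model-a", "gen_ai.usage.input_tokens": 1000,
     "gen_ai.usage.output_tokens": 2000, "app.feature": "search"},
    {"gen_ai.request.model": "model-a", "gen_ai.usage.input_tokens": 2000,
     "gen_ai.usage.output_tokens": 0, "app.feature": "search"},
    {"gen_ai.request.model": "model-a", "gen_ai.usage.input_tokens": 4000,
     "gen_ai.usage.output_tokens": 1000, "app.feature": "summaries"},
    {"gen_ai.request.model": "model-a", "gen_ai.usage.input_tokens": 1000,
     "gen_ai.usage.output_tokens": 1000},
]

spend_by_feature = allocate(spans)
print(spend_by_feature)
```

Keeping untagged spans in an explicit "unallocated" bucket, rather than dropping them, makes gaps in tagging visible in the totals.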

How to monitor and optimize GPU utilization in the cloud

GPU utilization is one of the most expensive metrics in cloud infrastructure to get wrong. A GPU running at 30% utilization costs the same as one running at 90%, but it's doing a third of the useful work. For workloads measured in tens of thousands of GPU-hours, the difference between average utilization in the 30s and average utilization in the 70s is hundreds of thousands of dollars across the life of the workload.
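The arithmetic behind that claim can be sketched in a few lines. This is an illustration under stated assumptions, not a pricing model: the hourly price and the workload size are hypothetical, and it treats utilization as the fraction of each paid GPU-hour that does useful work.

```python
def cost_per_useful_gpu_hour(hourly_price: float, utilization: float) -> float:
    """Price paid for one hour of useful GPU work at a given utilization (0-1]."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_price / utilization


def total_cost(useful_gpu_hours: float, hourly_price: float, utilization: float) -> float:
    """Total bill needed to obtain `useful_gpu_hours` of actual work."""
    return useful_gpu_hours * cost_per_useful_gpu_hour(hourly_price, utilization)


HOURLY_PRICE = 3.00        # hypothetical on-demand price, dollars per GPU-hour
USEFUL_GPU_HOURS = 50_000  # hypothetical amount of useful work required

cost_at_35 = total_cost(USEFUL_GPU_HOURS, HOURLY_PRICE, 0.35)
cost_at_70 = total_cost(USEFUL_GPU_HOURS, HOURLY_PRICE, 0.70)

print(f"at 35% utilization: ${cost_at_35:,.0f}")
print(f"at 70% utilization: ${cost_at_70:,.0f}")
print(f"difference:         ${cost_at_35 - cost_at_70:,.0f}")
```

Because the bill scales with one over utilization, doubling utilization halves the cost of the same useful work, whatever the actual hourly price is.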

Building More Resilient Multi-Cloud Operations

The last post in this series looked at how disconnected alerts can slow incident response and how stronger correlation helps teams investigate issues with more clarity. That same operational context has value beyond triage. It also plays an important role in resilience, service assurance, and the ability to maintain confidence across increasingly complex multi-cloud environments. Resilience depends on more than reacting well during an outage.

Shipped: What did the feature cost to ship? What does this customer cost to serve?

You can already split AI spend by team and by model. But that’s not what your CEO asks in the QBR. The question is what you got for it: what did it cost to ship that feature, to launch that campaign, to serve that customer. And is the AI bet behind it paying off? Now you can allocate AI spend to the outcomes you own: customer, product, feature, the strategic bet on the P&L. Not just the team that spent it.

The next era of telco clouds: get open infrastructure choice with Sylva and Canonical Kubernetes

The telco industry is undergoing a fundamental change. Over the past few years, the increasing maturity of cloud-native infrastructure has accelerated the movement from manually operated and hardware-centric systems to automated, software-defined platforms. Underpinning this change are open source initiatives such as the Sylva project. Sylva is hosted by Linux Foundation Europe and heavily backed by major telecom operators and vendors.

How Cloud Computing Is Revolutionizing Prop Firm Technology

The financial trading world has changed dramatically over the past decade, and much of that change has been driven by one thing: cloud computing. For proprietary trading firms, staying competitive means being faster, smarter, and more reliable than ever before. That is where prop firm technology comes in.

The Two-Sided Scheduling Problem: Reaching the Next Layer of Cloud Savings

You’ve deployed Karpenter or Cluster Autoscaler and tightened your resource requests, but while you saw an initial dip in your cloud bill, your savings have flatlined. Organizations that thought they had the fundamentals of cloud cost under control are now seeing stagnation. The problem isn’t that they need another FinOps tool or better visibility. The problem is that the current state of enterprise cloud cost optimization strategy is fundamentally reactive.

The Inference Paradox: How Split-Brain LLMs Are Killing Your GPU ROI

During the Toronto KCD (Kubernetes Community Days), I attended an insightful talk on AI resource optimization that highlighted a staggering Gartner study: “AI infrastructure is adding $401 billion in new spending this year alone. Yet, real-world audits tell a much darker story, revealing that average GPU utilization in the enterprise is stuck at a dismal 5%.” While many people in the audience were shocked by that number, the data didn’t come as a surprise to us.