Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on Cloud monitoring, security and related technologies.

AWS Outage History: What Engineering Teams Should Learn

If you've been running production workloads on AWS for more than a year, you've felt it: the 3 am PagerDuty alert, the scramble to check the AWS console, the frantic Slack thread asking, "Is this us or is this AWS?" And then, minutes or hours later, the AWS Service Health Dashboard finally acknowledges what your users have been experiencing all along. It happens because AWS is the backbone of modern infrastructure.

What Is LLM Observability? For CFOs And Engineers, The Missing Layer Is Cost

You probably have Datadog. Maybe New Relic, maybe Dynatrace. Your observability stack has been solid for years — and you're still flying blind on AI cost. Here's why LLM observability needs a fourth pillar most tools skip, and how to build one that actually tells you what your models are costing you per request, per feature, per customer.

Blind Tokenmaxxing Is The New Cloud Waste. Focus on Outcome-Maxxing Instead

Meta's internal token leaderboard sparked a frenzy — and a reckoning. Tokenmaxxing without attribution is just cloud waste 2.0. Companies like Hudl and Duolingo use cost intelligence to connect every AI dollar to a business outcome.

Beyond the Big Bang: De-risking Cloud Migrations with Progressive Delivery | Harness Blog

At 2 am, your migration goes live. By 2:07, error rates spike, and rollback isn’t an option. Cloud migrations, API rewrites, and architecture transformations rarely fail because of bad code. They fail because of how that code is released. Most teams still rely on a “big bang” cutover where infrastructure, services, and user-facing changes go live at once. This concentrates risk into a single moment.

Anything but that cloud

"Anything but that cloud." I asked why. "Our biggest customer is a giant retailer," he said. "That hyperscaler's parent company is the retailer's biggest competitor. So our customer refuses to do business with anyone who uses that cloud. We use that cloud, we lose our biggest customer. Full stop." That was the entire conversation about cloud choice. It wasn't a technical preference. It wasn't a pricing optimization. It wasn't a sovereignty concern.

How to Align CloudOps and FinOps for Better Azure Cost Management

The rapid migration to the cloud has brought unprecedented agility to modern enterprises, but it has also introduced a significant challenge in the form of cloud sprawl. As engineering teams provision resources at breakneck speed to support new applications and AI-driven workloads, financial departments often struggle to keep track of the escalating costs. This disconnect between operational execution and financial oversight is a primary driver of wasted cloud spend. To truly harness the power of scalable infrastructure without breaking the budget, organisations must bridge the gap between CloudOps and FinOps. Aligning these two disciplines ensures that technical performance and financial accountability work hand in hand to deliver sustainable business value. For companies heavily invested in Microsoft ecosystems, this alignment is even more crucial. Unchecked deployment can lead to massive end-of-month bill surprises, turning what should be a strategic advantage into a financial burden.

The Regional Data Centre Revolution Powered by AI Demand

London still hosts the biggest concentration of UK data centre capacity, but the centre of gravity is starting to move. AI workloads are changing the infrastructure maths, pushing power, space and planning considerations up the decision list. That is exactly where regional locations start to look like the sensible option. Government data shows how concentrated the market remains: as of autumn 2024, London is estimated at 1,048MW of colocation IT load. Compare that with 44MW in the East of England, 17MW in the North East and 30MW in Scotland. The gap is huge, yet it is not a permanent advantage.