Operations | Monitoring | ITSM | DevOps | Cloud

The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Nagios Plugins Collector: Run Your Existing Checks and Custom Scripts Inside Netdata

A lot of teams have a collection of Nagios plugins and custom monitoring scripts that have been running reliably for years. Some are standard community plugins for checking disk health or SSL certificate expiry. Others are homegrown Bash or Python scripts that check something very specific to the business: whether an API endpoint returns the right payload, whether a batch job completed on time, whether a queue depth is within bounds.

Under the Hood: Engineering JFrog Premium Availability

In the modern software factory, 99.9% uptime is no longer the gold standard. A standard 99.9% SLA translates to approximately 43 minutes of unexpected downtime per month. While industry data shows that a single minute of downtime costs an average of $9,000, for large global enterprises, that figure can easily be 5x higher. At tens of thousands of dollars per minute, those 43 minutes quickly compound into a catastrophic financial and operational risk.

What Is LLM Observability? For CFOs And Engineers, The Missing Layer Is Cost

You probably have Datadog. Maybe New Relic, maybe Dynatrace. Your observability stack has been solid for years — and you're still flying blind on AI cost. Here's why LLM observability needs a fourth pillar most tools skip, and how to build one that actually tells you what your models are costing you per request, per feature, per customer.

Blind Tokenmaxxing Is The New Cloud Waste. Focus on Outcome-Maxxing Instead

Meta's internal token leaderboard sparked a frenzy — and a reckoning. Tokenmaxxing without attribution is just cloud waste 2.0. Companies like Hudl and Duolingo use cost intelligence to connect every AI dollar to a business outcome.

How to Align CloudOps and FinOps for Better Azure Cost Management

The rapid migration to the cloud has brought unprecedented agility to modern enterprises, but it has also introduced a significant challenge in the form of cloud sprawl. As engineering teams provision resources at breakneck speed to support new applications and AI-driven workloads, financial departments often struggle to keep track of the escalating costs. This disconnect between operational execution and financial oversight is a primary driver of wasted cloud spend. To truly harness the power of scalable infrastructure without breaking the budget, organisations must bridge the gap between CloudOps and FinOps. Aligning these two disciplines ensures that technical performance and financial accountability work hand in hand to deliver sustainable business value. For companies heavily invested in Microsoft ecosystems, this alignment is even more crucial. Unchecked deployment can lead to massive end-of-month bill surprises, turning what should be a strategic advantage into a financial burden.

The Regional Data Centre Revolution Powered by AI Demand

London still hosts the biggest concentration of UK data centre capacity, but the centre of gravity is starting to move. AI workloads are changing the infrastructure maths, pushing power, space and planning considerations up the decision list. That is exactly where regional locations start to look like the sensible option. Government data shows how concentrated the market remains: as of autumn 2024, London is estimated at 1,048MW of colocation IT load. Compare that with 44MW in the East of England, 17MW in the North East and 30MW in Scotland. The gap is huge, yet it is not a permanent advantage.

Moving Beyond SolarWinds: Building a Modern Observability Strategy

For years, platforms like SolarWinds have been a standard in IT environments. They helped teams answer a fundamental question: are systems up or down? That approach worked well when environments were more contained and predictable. The challenge is that most environments no longer operate that way. Hybrid infrastructure, cloud services, and tightly interconnected applications have changed what “visibility” needs to mean.

Building for the Agentic Era: Engineering Excellence at Harness | Harness Blog

As AI agents become ubiquitous across the software development lifecycle, engineering teams must do more than adopt new tools; they must redesign how they build, verify, and operate software. This post distills the vision, priorities, and best practices that guide engineering excellence at Harness. Different products sit at the heart of the Harness platform.

What is Terragrunt and how does it simplify Terraform Workflows? | Harness Blog

Managing Terraform across dozens of AWS accounts becomes a maintenance nightmare fast. Teams end up copy-pasting the same backend configurations, provider blocks, and variable definitions hundreds of times. Terragrunt acts as an orchestrator above Terraform, eliminating this duplication through shared configuration inheritance and dependency management. When financial services teams manage 200+ microservices across multiple environments, these DRY patterns become essential for governance and consistency.