Inference Economics: What It Is And Why It Matters Now

Somewhere between a model’s first demo and its first production workload, the cost conversation changes completely. Training is a big number, but it’s a finite one. Inference isn’t. Every user interaction, every query, every API call triggers compute behind the scenes — and unlike training, inference never stops billing. That shift from one-time expense to ongoing operational cost is where inference economics begins.

Inside Pandora's Box: How CloudZero AI Hub Cracks Cloud Cost Intelligence

Years in the FinOps trenches taught me one thing: The data has never been the problem. The data exists. It’s out there, scattered across provider invoices, buried in tagging gaps, locked behind dashboards that maybe three people in your org actually know how to navigate. The real problem? Nobody can get to it when they need it. Engineers ship features without understanding what they cost the business, let alone whether they improved margin.

Skills vs. MCP: You're probably reaching for the wrong one

Everyone is adding Model Context Protocol (MCP) servers to everything right now. And I get it. MCP is clean. It’s standardized. You write a server, expose some tools, and suddenly your LLM can query your log platform, pull a dashboard, and fire an alert. It feels like the right abstraction. But I’ve watched teams at serious companies burn weeks building MCP integrations for workflows that should have been skills, and build skills for things that genuinely needed MCP.

7 Real Ways to Modernize NetOps with Kentik AI Advisor

Kentik’s AI Advisor acts as a virtual network engineer, helping teams of all skill levels troubleshoot, manage, and optimize their infrastructure with unprecedented speed and context. We explore seven practical NetOps use cases, from rapid incident triage and capacity planning to upcoming live-device command support, that demonstrate how using AI as a collaborative teammate dramatically reduces manual investigative work.

Generating metrics from traces with cardinality control: A closer look at HyperLogLog in Tempo

While tracing is a critical component of any observability strategy, metrics — especially RED metrics (request rate, error rate, and duration) — are widely considered the gold standard for monitoring service health. Tempo, the open source, easy-to-use, and highly scalable distributed tracing backend, is well known in the OSS community for storing and querying traces. It can also, however, generate RED metrics directly from those traces using the optional metrics-generator component.
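The idea of deriving RED metrics from traces can be illustrated with a minimal Python sketch. It takes a batch of finished spans from one time window and computes request rate, error ratio, and duration percentiles per service. The `Span` fields, the `red_metrics` function, and the nearest-rank percentile are hypothetical illustrations. They are not Tempo's metrics-generator code or its span schema.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    # Hypothetical, simplified span; real trace spans carry many more fields.
    service: str
    duration_ms: float
    is_error: bool


def nearest_rank(sorted_values, q):
    """Nearest-rank percentile of an already sorted, non-empty list (0 < q <= 1)."""
    index = max(0, math.ceil(q * len(sorted_values)) - 1)
    return sorted_values[index]


def red_metrics(spans, window_seconds):
    """Compute Rate, Errors, and Duration per service for spans seen in one time window."""
    by_service = defaultdict(list)
    for span in spans:
        by_service[span.service].append(span)

    metrics = {}
    for service, group in by_service.items():
        count = len(group)
        errors = sum(1 for s in group if s.is_error)
        durations = sorted(s.duration_ms for s in group)
        metrics[service] = {
            "rate_per_s": count / window_seconds,      # R: requests per second
            "error_ratio": errors / count,             # E: fraction of failed requests
            "p50_ms": nearest_rank(durations, 0.50),   # D: median latency
            "p99_ms": nearest_rank(durations, 0.99),   # D: tail latency
        }
    return metrics


if __name__ == "__main__":
    window = [
        Span("api", 10, False),
        Span("api", 20, True),
        Span("api", 30, False),
        Span("api", 40, False),
        Span("db", 5, False),
    ]
    for service, m in red_metrics(window, window_seconds=2.0).items():
        print(service, m)
```

This sketch keeps every raw duration in memory, which only works for small batches. A production generator would more likely aggregate spans into counters and histograms. It would also need to bound how many distinct label combinations it tracks, which is the cardinality concern named in the title.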

Use plain English to query your multi-cloud infrastructure in Resource Catalog

Modern cloud environments include thousands of resources across providers, teams, and accounts. Organizations need the ability to quickly locate the right resources so that they can manage resource compliance and troubleshoot issues. When engineers need to answer questions such as which databases are still on extended support or which storage buckets lack encryption, they often have to switch consoles, use provider-specific query languages, and know obscure version strings or configuration flags.

Hot Takes: What the AI Hype Gets Wrong About Software Engineering Excellence | Harness Blog

Ahead of the DevOps Modernization Summit, Matthew Skelton, CEO & CTO of Conflux and a featured speaker at this year’s summit, shares his hot takes on output-driven AI, why DORA metrics aren’t enough, why governance and compliance must be built into the platform, and the key to successful automation.

How to Build AI-Native Security Resilience (And Finally Get Developers And Security On The Same Team) | Harness Blog

Developers and security professionals have struggled to get on the same page for what seems like forever, and AI is only widening that divide, according to results from our State of AI-Native Application Security 2025 research report.

Public Sector Observability: Service Experience and Reliability Are Now Mission-Critical

Reliable digital services aren’t optional for public sector agencies. They’re essential to mission success. Across the U.S. public sector, service experience and reliability have moved from operational concerns to mission requirements. At the federal level, Executive Order 14058 makes improving service delivery and customer experience a federal priority, measured by real outcomes for the public. And for state and local governments, the bar is set by the private sector.