
The latest News and Information on DevOps, CI/CD, Automation and related technologies.

Intelligent FinOps: AI-Informed, AI-Enabled

AI is the new frontier for FinOps maturity. It introduces fresh spend patterns and new opportunities for value. As GPUs, inference, and retraining reshape costs, FinOps maturity grows through visibility, forecasting, and a shared understanding of how these workloads drive business impact. In an earlier 2025 post, I shared guidelines for implementing AI tagging to bring business context and clarity to vague AI invoices. Now I’m sharing the next level up: how to drive FinOps in AI with AI.

From Chaos To Clarity: How Forcepoint Scaled FinOps Across The Organization

When Anthony Leung talks about FinOps, he speaks from experience operating at real scale, not from theory. As VP of Engineering Platforms and Security Research at Forcepoint, he led a transformation that cut cloud spend in half while improving availability, and he built a culture where engineers own their economics.

(Tech Talk) Shipping with Context: Knowledge Graphs as the Backbone of AI-First Software Delivery

Knowledge graphs are essential to solving the context bottleneck in AI-First software delivery, a bottleneck that arises because workflows, policies, and dependencies are siloed and invisible to AI agents. In this Tech Talk, Prateek Mittal (Product Director of AI Core and Data Platform at Harness) discusses the key concepts, starting with knowledge graphs vs. observability. Observability tells you "what is happening," while knowledge graphs tell you "what does that mean" by modeling structured relationships. The two work together: the graph links live signals to the services or SLAs they affect.
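
To make the idea of linking a live signal to affected services and SLAs concrete, here is a minimal sketch in Python. It assumes a tiny in-memory graph with two hypothetical relationship types (`impacts` and `covered_by`) and made-up service and SLA names; it is not Harness's implementation or data model. Given the component that emitted an alert, a breadth-first walk over `impacts` edges collects the downstream services and any SLAs attached to them.

```python
from collections import deque

# Hypothetical knowledge graph: each node maps to (relationship, target) edges.
# "impacts" means a failure in the node can degrade the target service;
# "covered_by" attaches an SLA to the node.
GRAPH = {
    "postgres-primary": [("impacts", "orders-api")],
    "orders-api": [("impacts", "checkout-web"), ("covered_by", "SLA: checkout 99.9%")],
    "checkout-web": [("covered_by", "SLA: storefront 99.95%")],
}


def blast_radius(signal_source: str) -> tuple[list[str], list[str]]:
    """Breadth-first walk of 'impacts' edges starting from the component that
    emitted a live signal; returns (affected services, affected SLAs)."""
    services: list[str] = []
    slas: list[str] = []
    seen = {signal_source}
    queue = deque([signal_source])
    while queue:
        node = queue.popleft()
        for relation, target in GRAPH.get(node, []):
            if relation == "covered_by":
                if target not in slas:
                    slas.append(target)
            elif relation == "impacts" and target not in seen:
                seen.add(target)
                services.append(target)
                queue.append(target)
    return services, slas
```

For an alert on `postgres-primary`, `blast_radius` returns `["orders-api", "checkout-web"]` plus both SLAs, which is the "what does that mean" layer on top of the raw observability signal. A real system would keep the graph in a dedicated store populated from service catalogs and dependency data rather than a hard-coded dict.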

Introducing Harness Artifact Registry | Unified. Secure. Built for the Future Artifact Management

Managing build artifacts today is harder than it should be. Fragmented tools, security blind spots, and disconnected developer workflows make it difficult to keep builds safe, consistent, and production-ready. In this walkthrough, Shibam Dhar, DevRel Engineer at Harness, shows how Harness Artifact Registry unifies artifact management across the entire software delivery lifecycle — from creation to deployment — while improving security and developer experience.

We Built an MCP Server

When I joined Kubex last year, the company was already well aware of the growing power of Large Language Models. As a company focused on intelligent resource optimization for Kubernetes, GPUs, and cloud infrastructure, generative AI didn’t feel like a threat so much as a natural extension of where the industry was heading. Kubex had already invested heavily in machine learning, but it was becoming clear that foundation models could unlock an entirely new class of capabilities for our customers.

How Dartmouth avoided vendor lock-in and implemented LBaaS with HAProxy One

History is everywhere at Dartmouth College, and while the campus is steeped in tradition, its IT infrastructure can’t afford to get stuck in the past. In an institution where world-class research and undergraduate studies intersect, technology must be fast, invisible, and, above all, reliable. That reliability was put to the test when Dartmouth’s load balancing vendor was acquired twice in five years: Avi Networks was bought by VMware, and VMware was then acquired by Broadcom.

Custom Dashboard Creation: Step-by-Step Tutorial

Creating a custom dashboard is the best way to monitor metrics that matter most to your systems. Tools like MetricFire make this process straightforward by combining hosted Grafana and Graphite, eliminating the need for self-hosted solutions. Here's how you can build dashboards tailored to your needs.
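
Before a dashboard can show anything, metrics have to reach Graphite. As a rough sketch, assuming a Carbon endpoint that accepts Graphite's plaintext protocol on its default TCP port 2003, the Python below formats datapoints as `<metric.path> <value> <timestamp>` lines and sends them. The host name and metric paths are placeholders; a hosted provider such as MetricFire may require its own endpoint or an account-specific prefix, so check its documentation before relying on these defaults.

```python
import socket
import time
from typing import Iterable, Optional


def graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one datapoint in Graphite's plaintext protocol:
    '<metric.path> <value> <unix_timestamp>' followed by a newline."""
    if timestamp is None:
        timestamp = int(time.time())  # current Unix epoch seconds
    return f"{path} {value} {timestamp}\n"


def send_metrics(lines: Iterable[str], host: str, port: int = 2003) -> None:
    """Push pre-formatted plaintext lines to a Carbon listener over TCP.
    The host is a placeholder; 2003 is Carbon's default plaintext port."""
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

A call such as `send_metrics([graphite_line("web01.cpu.load", 0.42)], host="graphite.example.internal")` would submit one point for a hypothetical host. Once data is flowing, a Grafana panel backed by the Graphite data source can query it with a target like `aliasByNode(web*.cpu.load, 0)`, which plots one series per host name.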

The AI-Empowered Site Reliability Engineer: Automating the Balance of Risk and Velocity

You might expect an AI-SRE agent to target 100% reliable services that never fail. Past a certain point, however, increasing reliability makes things worse for a service (and its users), not better! Extreme reliability comes at a non-linear cost: maximizing stability slows how fast new features can be developed, dramatically increases operational cost, and reduces the features a team can afford to offer.
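
A common way SRE teams operationalize this trade-off is an error budget: an SLO below 100% defines how much unreliability is acceptable, and the unspent remainder is what the team can "spend" on shipping changes. Over a 30-day window, 99.9% allows 43.2 minutes of downtime, 99.99% allows 4.32 minutes, and 99.999% allows about 26 seconds, so each extra nine cuts the room for change tenfold. The Python sketch below shows that arithmetic; the `release_decision` policy and its 25% freeze threshold are hypothetical illustrations, not how any particular AI-SRE agent decides.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60


@dataclass
class ErrorBudget:
    slo: float        # availability target, e.g. 0.999 for "three nines"
    window_days: int  # length of the SLO measurement window

    def allowed_downtime_minutes(self) -> float:
        """Downtime the SLO tolerates over the window: (1 - SLO) * window."""
        return (1 - self.slo) * self.window_days * MINUTES_PER_DAY

    def remaining_fraction(self, downtime_minutes: float) -> float:
        """Share of the budget still unspent, clamped at zero."""
        allowed = self.allowed_downtime_minutes()
        if allowed <= 0:
            return 0.0  # a 100% SLO leaves no budget at all
        return max(0.0, (allowed - downtime_minutes) / allowed)


def release_decision(budget: ErrorBudget, downtime_minutes: float,
                     freeze_below: float = 0.25) -> str:
    """Hypothetical policy: keep shipping while at least `freeze_below` of the
    budget remains; otherwise pause feature releases for reliability work."""
    if budget.remaining_fraction(downtime_minutes) >= freeze_below:
        return "ship"
    return "freeze"
```

For example, `release_decision(ErrorBudget(slo=0.999, window_days=30), downtime_minutes=10)` returns `"ship"` (about 77% of the budget left), while 40 minutes of downtime leaves about 7% and returns `"freeze"`. Setting `slo=1.0` leaves zero budget, so every change is blocked, which is the velocity cost the paragraph above describes.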