
A field guide to the agents in your cluster

You know every service in your cluster by name. You know which team owns each one, what it talks to, how it scales, where its logs go. The agents are a different story. That’s not a criticism; it’s an observation, and one we keep running into. Every company we talk to is shipping agents of some kind, at scales from tens to thousands. Customer service bots that field tier-one tickets. Internal copilots that draft emails and summarise meetings and write the boring half of every PR.

Automated Network Documentation 101: What You Need to Know to Get Started

Network documentation has a way of becoming everyone’s problem and nobody’s responsibility. Over time, diagrams become outdated, configuration changes go undocumented, and critical knowledge ends up living in the heads of a few senior technicians instead of somewhere the entire team can access it. That’s why organizations are turning to automated network documentation.

Balance AI innovation and governance with Sumo Logic AI and ML apps

AI is changing how teams work. Developers are generating code faster, security teams are automating investigations, and employees across the business are using AI tools to accelerate research, content creation, and decision-making. But this adoption comes with a catch. As usage explodes, it introduces a new set of security risks: a rapidly expanding attack surface, faster attack timelines, potential data exposure, and an alarming lack of visibility into how these tools are being used.

Mainframe DevOps: Modern CI/CD for Big Iron | Harness Blog

For Platform Engineering teams, the goal has always been clear: build a secure, scalable internal developer platform that reduces cognitive load and accelerates time-to-market. Yet, a massive obstacle often remains hidden in plain sight: the mainframe. While your distributed teams are shipping cloud-native microservices multiple times a day, your core backend mainframe applications frequently remain locked in an isolated silo, lagging behind on slow monthly or quarterly cadences.

Centralize DHCP Visibility with the Windows Discovery Agent

Your Dynamic Host Configuration Protocol (DHCP) server already knows what’s connected to your network. The problem is that DHCP data rarely stays aligned with the rest of your infrastructure systems. Instead, it becomes fragmented across Windows servers, branch offices, spreadsheets, and disconnected operational tools. Lease data ages, assignments go untracked, and teams lose confidence in their network inventory.

The Inference Paradox: How Split-Brain LLMs Are Killing Your GPU ROI

During KCD (Kubernetes Community Days) Toronto, I attended an insightful talk on AI resource optimization that highlighted a staggering Gartner study: “AI infrastructure is adding $401 billion in new spending this year alone. Yet, real-world audits tell a much darker story, revealing that average GPU utilization in the enterprise is stuck at a dismal 5%.” While many people in the audience were shocked by that number, the data didn’t come as a surprise to us.

The Two-Sided Scheduling Problem: Reaching the Next Layer of Cloud Savings

You’ve deployed Karpenter or Cluster Autoscaler and tightened your resource requests, but after an initial dip in your cloud bill, your savings have flatlined. Organizations that thought they had the fundamentals of cloud cost under control are now seeing stagnation. The problem isn’t that they need another FinOps tool or better visibility. The problem is that enterprise cloud cost optimization, as practiced today, is fundamentally reactive.

Getting started with Prometheus dashboards

Prometheus is a wildly popular open-source monitoring tool typically used for monitoring Kubernetes environments and containerized workloads. But how do you turn the mountains of metrics into a clear picture of health and performance? SquaredUp plugs directly into your Prometheus database to visualize and monitor your data. What sets SquaredUp apart from other Prometheus visualization options like Grafana and Perses is just how easy it is to visualize, monitor, and share Prometheus dashboards.