
AI performance reviews for your app with the Flare CLI

The Flare CLI connects to your Flare performance monitoring data and uses AI to turn it into actionable insights, right from your terminal. In this video, you'll see how a single command pulls your real performance data from Flare, then generates a full review: identifying slow endpoints, spotting error trends, and suggesting concrete fixes.

Fixing a production error with the Flare CLI and AI, from discovery to deploy

Use the Flare CLI and its agent skill to find, fix, and resolve a production error without leaving the terminal. The AI agent looks up the latest error on freek.dev via the Flare CLI, analyzes the stack trace against the local source code, generates a fix, deploys it using bash mode, and marks the error as resolved in Flare.

Incident Report: Exercises, Cleanups, and Evacuations

Every year, Honeycomb runs disaster recovery scenarios in multiple environments, including in production. Although each of our instances runs in a single region, on at least three Availability Zones (AZs), we have multiple plans for partial regional failures, and particularly, zonal failures. One of these tests was run on December 5th, and after its successful completion came its cleanup steps.

Alerting Is a Socio-Technical System

In the previous posts, we’ve looked at how alert noise emerges from design decisions, why notification lists fail to create accountability, and why alerts only work when they’re designed around a clear outcome. Taken together, these ideas point to a broader conclusion: alerting is not just a technical system; it’s a socio-technical one. Alerting systems encode assumptions about how people behave, how responsibility is distributed, and how decisions are made under pressure.

Trusted Ownership: How Ivanti Application Control scales beyond allowlisting

Application control is one of those security topics where many people carry old assumptions. Traditional allowlisting feels safe but quickly becomes a maintenance burden. Blocklisting feels reactive and incomplete. And while tools like Microsoft AppLocker led many to believe that strict allowlisting is the gold standard, modern attacks have proven otherwise. Attackers increasingly rely on legitimate, signed tools — used in the wrong context — to bypass list-based controls entirely.

Case Study - Troubleshooting Storage Failures in a VMware ESXi Infrastructure

IT problems happen even in the best-architected infrastructure, whether from configuration changes, failures, upgrades, or similar events. How quickly and effectively you can detect and resolve such problems dictates how efficient your IT operation is. Today, I’ll cover how eG Enterprise helped us troubleshoot a hardware failure (a storage battery failure) that caused a cascade of failures in a VMware ESXi infrastructure.

PagerDuty's Slack App Just Got a Whole Lot Better (And We're Just Getting Started)

If you’ve been eyeing chat-native incident tools and wondering whether PagerDuty can compete in Slack, this one’s for you. Are you still treating your incident management platform like a glorified pager? It’s time for an update. Over the past months, we’ve been evolving our Slack app from a notification tool into a full incident command center, and we’re coming for the chat-native tools (ahem, incident.io).

The Ultimate Kubernetes Cost Monitoring And Management Guide

While Kubernetes enables teams to deliver more value faster, understanding and controlling Kubernetes costs remains challenging. You have disposable, replaceable compute resources constantly coming and going across a range of infrastructure types. Yet at the end of the month, you only get a billing line item for EKS costs and several EC2 instances.

How AV Integrations Improve IT Operations in Modern Workplaces

Every room passed the morning health check. Then the CEO's all-hands stuttered, audio dropped, and the service desk filled with duplicate tickets. That gap between a green dashboard and a successful meeting is where enterprise AV operations fail. AV is no longer a facilities side project; it is an IT workload.

The Hidden Cost of SaaS Sprawl: When Custom Development Makes More Sense

The average enterprise now spends $55.7 million on SaaS annually, an 8% jump from last year alone. Yet here is the uncomfortable truth: a significant chunk of that money is being quietly wasted on tools that overlap, go unused, or simply do not fit the way teams actually work. SaaS sprawl has become one of the most expensive and least visible problems in modern IT. And for a growing number of organizations, the answer is not another subscription. It is custom-built software designed around the way their business actually operates.