LLMs (large language models) are incredibly powerful, but one of the challenges when building an LLM application is dealing with performance implications. However, one of the first challenges you'll face when testing LLMs is that there are many evaluation metrics. For simplicity, let's look at this through a few different test cases for testing LLMs.
Agentic AI - the term has stirred much interest since it was first mentioned. And why not? The business world is still basking in the comfort, ease, and speed GenAI has brought since its democratization. Agentic AI promises to propel this leap in growth further. It promises more time freed up for strategic work and for finding new avenues of development, and complete freedom from mundane work.
Not long ago, AI was largely experimental, but today it’s a strategic imperative. Enterprise AI adoption has surged to record levels, with over three-quarters of organizations now using AI in at least one function. In Deloitte’s latest global study, almost all organizations reported measurable ROI from AI, and 74% said their most advanced AI initiatives meet or exceed ROI expectations.
Seems like you can’t throw a rock without hitting an announcement about a Model Context Protocol server release from your favorite application or developer tool. While I could just write a couple hundred words about the Honeycomb MCP server, I’d rather walk you through the experience of building it, some of the challenges and successes we’ve seen while building and using it, and talk through what’s next. It should be pretty exciting, so strap in!
Traditional monitoring approaches have served IT operations for decades, providing basic visibility into system health through predefined metrics and thresholds. However, these conventional methods face significant limitations when confronted with modern, complex environments:
Static Thresholds and Rules
Traditional monitoring relies heavily on manually defined thresholds and rules.
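To make the idea of a manually defined threshold concrete, here is a minimal sketch in Python. The metric names (`cpu_percent`, `disk_percent`), the limits, and the `ThresholdRule` and `evaluate` helpers are all hypothetical and chosen only for illustration; they are not taken from any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdRule:
    """A hand-written rule: alert when `metric` rises above `limit`."""
    metric: str
    limit: float


def evaluate(rules, sample):
    """Return one alert message per rule whose fixed limit is exceeded."""
    alerts = []
    for rule in rules:
        value = sample.get(rule.metric)
        # The limit is a constant: it does not adapt to time of day,
        # workload, or recent history.
        if value is not None and value > rule.limit:
            alerts.append(f"{rule.metric}={value} exceeds {rule.limit}")
    return alerts


# Hypothetical rules an operator might have typed in by hand.
RULES = [
    ThresholdRule("cpu_percent", 90.0),
    ThresholdRule("disk_percent", 85.0),
]

quiet_night = evaluate(RULES, {"cpu_percent": 35.0, "disk_percent": 60.0})
batch_job = evaluate(RULES, {"cpu_percent": 95.0, "disk_percent": 60.0})

print(quiet_night)
print(batch_job)
```

In this sketch the second sample raises a CPU alert whether the spike comes from a scheduled batch job or a real incident, because the rule only compares a number against a constant. Every limit also has to be chosen and maintained by hand.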
If 2023 was the year AI entered the enterprise conversation and 2024 was the year of AI overhype, 2025 is the year it takes action. “Agentic AI” has quickly become the banner term for next-gen systems that aren’t limited to generating responses—they operate, decide, and resolve. The shift from passive chatbots to autonomous agents is underway, and for IT operations teams, the implications are massive.
In the AI world, there’s a lot of buzz about creating custom large language models (LLMs) tailored for specific domains, perhaps for better security, context, expertise, or accuracy. It’s an appealing idea: What better way to solve your niche challenges than with a bespoke AI designed just for you? But here’s the thing — building a great LLM isn’t just challenging; it’s prohibitively expensive and resource-intensive.
AI can be a transformative tool in network operations — but only when it’s tied to clear, measurable outcomes. Rather than chasing hype, IT and NetOps teams should focus on solving specific operational challenges like reducing MTTR, cutting costs, and stabilizing infrastructure. AI has real potential when strategically applied, and when aligned with business goals, it becomes a powerful ally in modern network operations.
Written by author, adapted by AI
GitKraken Desktop 11.0 is here, and it’s more than just a version bump. We’re introducing AI-powered features designed to accompany your workflow and help you stay focused on what matters most. It’s changed how I approach commits, and I’m sure it will help y’all too!