A tale of two incident responses: How our AI assistant found the root cause 3.5x faster

About two months ago, an incident at Grafana Labs was kicked off in typical fashion: a series of alerts was triggered, our on-call engineer acknowledged them on Slack, and the rest of the team quickly began hypothesizing about the potential culprit. But the way the incident was resolved was anything but typical. Yes, our internal team followed best practices to resolve the incident as quickly as possible.

AI API Aggregation: Managing Costs And Complexity Across Multiple LLMs

Running multiple LLMs without aggregation can feel like managing five different clouds with no dashboard. Sure, you can make it work, but you won’t like the bill. And most SaaS teams didn’t start with a multi-LLM strategy. It just happened. You added one model for reasoning, another for summarization, or maybe a fine-tuned version for customer support. Fast-forward six months, and your AI stack looks like a tangle of APIs, each billing for tokens on its own terms.

Prioritize errors and create tickets using Rollbar's MCP Server

Production errors can feel overwhelming. Your Rollbar dashboard is filling up with alerts, your team is scrambling to understand what needs immediate attention, and critical revenue-impacting issues might be buried among less urgent problems. In this post, we'll walk you through a workflow that transforms production error chaos into organized, prioritized action items. We'll cover everything from analyzing Rollbar errors to creating properly linked Linear tickets.

MachineGPT: Speaking the Language of Machines to Shape the Future of AI

At .conf25, we took a bold step forward by introducing the concept of MachineGPT, which brings the power of generative AI to one of the most overlooked resources: machine data. MachineGPT speaks the language of machines. Just as ChatGPT learned the grammar of words and sentences to understand questions and respond in human language, MachineGPT can learn the hidden “grammar” of how systems behave through machine data.

The AI Workload Punishes Bad Habits

The AI workload presents the ultimate challenge, highlighting the structural limitations of the traditional hyperscaler model. In this segment from a Civo Navigate London 2025 session, Kelsey Hightower explains exactly why AI adoption forces enterprises to confront flawed architecture and astronomically rising costs. When specialized hardware is scarce and rented GPUs sit idle at a premium, it’s clear that traditional cloud providers were not built for this era. Data that doesn't move is forcing organizations to move compute back to where it lives.

Modernising Middleware and B2B Integration with Assurance

Modernising enterprise middleware is now a strategic necessity for cost efficiency, AI-readiness, and operational clarity. Hybrid estates of IBM MQ, Apache Kafka, and other brokers hide inefficiencies that drain profitability, but an operating model built on Assurance and Optimisation restores transparency and control. By unifying data, rebalancing workloads, and enabling safe AI autonomy, organisations can build a resilient “Confidence Economy.”

5 Skills Intelligence Platforms to Watch in 2025, Reviewed & Ranked

Businesses need to build strong teams and leaders within their organizations so they can continue to drive productivity and efficiency. Doing so also offers other benefits, like improving employee morale and retention, enhancing your employer brand, and helping you run a more cost-effective business. Skills intelligence platforms are a vital part of this. They give companies affordable and effective ways to engage employees as they take their careers to the next level.

How AI Is Transforming Field Service Routing and Operational Efficiency

Field service operations once depended on set schedules, hand-planned routes, and local dispatchers. But routing based on intuition is becoming less effective as service networks become more complex, customer expectations rise, and operating expenses shift. How can companies with a large fleet of service vehicles efficiently arrange personnel, vehicles, and parts to meet service level agreements while minimizing costs and downtime?

Mastering Product Design: How Top Agencies Like Phenomenon Studio Create Market-Leading Products

In today's experience-driven economy, the quality of your product design can determine your company's trajectory. But what exactly separates exceptional products from mediocre ones? The answer often lies in partnering with the right product design agency. As an industry-leading provider of web design and development services, Phenomenon Studio has demonstrated how strategic design partnerships can elevate products from functional to phenomenal, creating sustainable competitive advantages in crowded markets.

Beyond Isolated AI: How the Selector MCP Server Connects Agents, Context, and Action

AI in network operations is evolving faster than ever. But while new models and agents are emerging almost daily, they’re often working alone, with each confined to its own context, data, and domain. One model might analyze telemetry, another handles automation scripts, and a third generates summaries or recommendations. Each model might be intelligent on its own, but without a way to share context, they end up thinking in isolation, limiting what they can achieve together.