Using AI + Rollbar's Session Replay to Understand Complex Errors

Front‑end bugs are notoriously hard to reproduce. By the time an error shows up in your monitoring tool, the most important context is already gone: what the user actually did. Session replay helps, but only if someone has the time and patience to scrub through recordings, correlate events, and form a hypothesis. That's where Rollbar's MCP server, paired with an AI agent like GitHub Copilot, changes the game.

Fresh from AWS re:Invent: Supercharging HAProxy Community with AWS-LC Performance Packages

The timing couldn’t have been better. Last week, the tech world descended on Las Vegas for AWS re:Invent. It was the perfect venue to talk about cloud infrastructure, scale, and the future of application delivery. While we enjoyed talking shop at our booth, we didn't just bring swag and demos; we brought a significant performance improvement for our open-source community.

The Impact of Network Downtime on Enterprise Productivity - and How Monitoring Helps

Enterprise IT teams operate under relentless pressure to maintain seamless connectivity, yet many business leaders underestimate the financial cost of network downtime. Studies consistently show that even a brief outage can cost enterprises hundreds of thousands of dollars per hour, making downtime one of the most disruptive threats to business continuity.

Major Cloud Outages of 2025

Cloud outages in 2025 ranged from minor incidents affecting a subset of users to major ones affecting hundreds or thousands of users. Services that many others depend on, such as Cloudflare and AWS, experienced outages whose effects cascaded to the services built on top of them. Let's look at some of the major cloud outages of 2025.

How to use AI to analyze and visualize CAN data with Grafana Assistant

Note: A version of this post originally appeared on the CSS Electronics blog. Martin Falch, co-owner and head of sales and marketing at CSS Electronics, is an expert on CAN bus data. Martin works closely with end users, typically OEM engineers, across diverse industries, including automotive, maritime, and industrial. He is passionate about data visualization and AI—and he’s been working extensively with Grafana Assistant.

How to use Gremlin's Reliability Report

Modern applications can easily include hundreds of discrete services, all of which need to be reliable for the application to function correctly. While testing a handful of critical services can lead to small reliability improvements, real impact requires testing, and visibility into reliability, across your entire organization. That's the logic behind the new, improved Reliability Reports in Gremlin.

AI Reliability, Part 2: When the Datacenter Becomes the Bottleneck

In Part 1, we talked about all the hidden complexity inside AI systems: the pipelines, GPUs, embeddings, vector databases, orchestration layers, and everything else that quietly determines how reliable an AI-first product really is. But all of that software still rests on something far less glamorous: the physical infrastructure underneath it.

Elastic and Microsoft partnership achievements in 2025

Highlights of another successful year of customer-centric collaboration. Once again, our partnership with Microsoft delivered an impressive year of innovation across Microsoft Azure, Azure AI Foundry, and Azure OpenAI. This blog highlights our continued collaboration with Microsoft to better serve customers throughout 2025, along with our key moments at Microsoft Ignite.

How Aerospace Companies Use InfluxDB

Over the past two decades, we’ve witnessed the instrumentation of virtually everything in the aerospace industry, from manufacturing floors to satellites orbiting Earth. And it’s no longer just NASA and other government organizations leading the charge. The commercial space industry has grown exponentially, with private companies developing everything from GPS satellites to electric VTOL aircraft.

AWS re:Invent 2025: 6 FinOps Signals That Mattered

This year’s AWS re:Invent was a blur of GPUs, LLMs, and infrastructure roadmap reveals — but for those listening between the keynotes, another story was unfolding. Between hallway chats, booth conversations, and live polls, a signal emerged from the noise: FinOps is growing up. Mature cloud teams aren’t just managing costs — they’re asking smarter, more strategic questions about value, forecasting, and engineering accountability.