
Prioritize errors and create tickets using Rollbar's MCP Server

Production errors can feel overwhelming. Your Rollbar dashboard is filling up with alerts, your team is scrambling to understand what needs immediate attention, and critical revenue-impacting issues might be buried among less urgent problems. Sound familiar? In this post, I'll walk you through a workflow that transforms production error chaos into organized, prioritized action items. We'll cover everything from analyzing Rollbar errors to creating properly linked Linear tickets.

Cloudflare outage: another wake-up call for resilience planning

Another day, another massive Internet disruption, and this time it’s Cloudflare taking huge parts of the Internet offline. This incident is not an anomaly. It is part of a recurring pattern that has become standard in digital infrastructure. We have reached an inflection point in digital operations. Outages at major cloud and content delivery network (CDN) providers are now expected. The only real uncertainty is when the next one will happen.

Introducing webvitals.com: Find out what's slowing down your site

Developers don’t need another “run this tool, stare at a number, and feel bad about it” website. So we built something different. WebVitals helps you analyze, optimize, and ship faster websites, all in one place. Built by the same folks who obsess over stack traces and slow queries, it connects the dots between performance metrics and what’s actually slowing your users down.

KubeCon Atlanta Signals Key Shift: From Cloud Cost To Value Engineering

After three days of demos, sessions, and hallway conversations at KubeCon Atlanta, one thing became clear to CloudZero CTO Erik Peterson: the cloud-native world is shifting from cost control to value engineering. Teams aren’t just fighting bills anymore. They’re fighting complexity, GPU scarcity, Kubernetes sprawl, and pressure from the business to justify every dollar of technical investment. And this year’s KubeCon attendees? They were ready for those conversations.

AWS And Azure Outages Will Recur - Here's How You Ensure Resilience

The cloud has long promised limitless scalability and near-perfect uptime. But if you tried to access your Microsoft 365 dashboard or recline your smart bed last week, and got nothing but a spinning icon, you weren’t alone. In the span of 10 days, both Amazon Web Services (AWS) and Microsoft’s Azure Cloud suffered widespread outages that rippled across industries.

Uptrends x OpenTelemetry: Stream browser-level synthetic data into your observability stack

Dashboards and alerts can tell you something’s wrong, but they don’t immediately tell you why. A red indicator or synthetic test failure prompts detective work. You flip between dashboards, timestamps, and logs, trying to line up what the check saw with what the system did. Now imagine your monitoring could explain itself by sending traces directly into your OpenTelemetry (OTel) backend.

When Bots Grow Brains: RPA and Agentic AI For the Win

For a long time, robotic process automation (RPA) was the fastest way to scale repetitive digital work. Bots copied, clicked, and executed rule-based tasks faster than any human. They reduced error rates and delivered early wins for efficiency. Sounds just fine, right? Prepare for a Matrix moment, because the truth is that IT teams built RPA only for predictability. It could follow instructions, but it couldn’t adapt when something unexpected happened.

Strengthening Open Source Facter: Ensuring Compatibility and Essential Maintenance

Over the course of 2025, the Puppet Core team has been committed to developing secure, hardened Puppet code that our customers can rely on. As part of that shift, many Puppet platform components, including Facter, were brought under the Puppet Core model and were moved into private repositories.

Distributed Tracing for Microservices: 10 Essential Best Practices for 2026

Distributed tracing tracks how a single request moves across multiple microservices, helping teams see the entire execution path end to end. In modern architectures where dozens of services interact, it becomes difficult to understand where latency starts, why bottlenecks appear, and which component breaks under load. Traditional monitoring only shows isolated metrics. Distributed tracing connects those dots.
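
To make "connects those dots" concrete, here is a minimal sketch of trace-context propagation in plain Python. It is not taken from the article. The service names, the `trace-id` and `parent-span-id` header names, and the in-memory `COLLECTED` list are all assumptions for illustration. Real systems typically carry this context in the W3C `traceparent` header and export spans to a tracing backend.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

Headers = Dict[str, str]


@dataclass
class Span:
    trace_id: str             # shared by every span in one request
    span_id: str              # unique to this unit of work
    parent_id: Optional[str]  # span that called us; None for the root
    service: str
    duration_ms: float = 0.0


# Hypothetical stand-in for a tracing backend.
COLLECTED: List[Span] = []


def start_span(service: str, headers: Headers) -> Span:
    """Continue the caller's trace if headers carry one, else start a new trace."""
    trace_id = headers.get("trace-id") or secrets.token_hex(16)
    parent_id = headers.get("parent-span-id")
    return Span(trace_id, secrets.token_hex(8), parent_id, service)


def outgoing_headers(span: Span) -> Headers:
    """Context to attach to every downstream call made from this span."""
    return {"trace-id": span.trace_id, "parent-span-id": span.span_id}


def handle(
    service: str,
    headers: Headers,
    downstream: Sequence[Callable[[Headers], Span]] = (),
) -> Span:
    """Simulate one service handling a request and calling its dependencies."""
    span = start_span(service, headers)
    started = time.perf_counter()
    for call in downstream:
        call(outgoing_headers(span))  # propagate the context downstream
    span.duration_ms = (time.perf_counter() - started) * 1000
    COLLECTED.append(span)
    return span


# Three hypothetical services: checkout calls inventory and payment.
def inventory(headers: Headers) -> Span:
    return handle("inventory", headers)


def payment(headers: Headers) -> Span:
    return handle("payment", headers)


def checkout(headers: Headers) -> Span:
    return handle("checkout", headers, downstream=(inventory, payment))


root = checkout({})  # no incoming context, so this starts a new trace
for s in COLLECTED:
    print(s.service, s.trace_id[:8], s.parent_id, round(s.duration_ms, 3))
```

Because every span carries the same trace ID and a pointer to its parent, a backend can reassemble the full call tree and attribute latency to the service where it originates.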