The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

We Turned Our WireShark Wizard Into a Markdown File

Rocky AI, Checkly’s AI agent, is now Generally Available. We developed Rocky AI over the last ~6 to 8 months. This is an aeon in AI-years. During this period, we learned a ton. About AI, but mostly about how to fit it into an existing SaaS product as more than just another chat widget. This is my ramble…

Introducing Rocky AI to General Availability

After months in Beta for our app users, Rocky AI is now generally available to all users on all plans. Rocky AI is Checkly’s AI agent that works around the clock to keep your application’s reliability optimal. In this first release, Rocky AI ships with the ability to run continual Analysis on test and check failures, giving your teams AI-powered root cause analysis, impact analysis, and more.

Buy vs Build in the Age of AI (Part 1)

A few months ago, I spoke to an engineering manager who proudly told me they had rebuilt their monitoring stack over a long weekend. They’d used AI to scaffold synthetic checks. They’d generated alert logic with dynamic thresholds. They’d then wired everything into Slack and PagerDuty, and built a clean internal dashboard. “It used to take us weeks to prototype something like this,” they said. “Now it’s basically instant.” They weren’t wrong.

February 2026 product updates

February brought powerful new improvements to StatusGator – from better status page analytics and expanded API capabilities to smarter incident detection. We also published our latest Early Warning Signals report, highlighting major outages we detected before providers acknowledged them. Here’s everything that’s new.

Did ChatGPT take down Claude?

On March 2, 2026, Claude experienced a widespread service disruption that affected users across North America, Europe, Asia, and Australia. The outage quickly drew significant media attention, with numerous technology news outlets reporting on user frustration and downtime. In the early hours of the incident, some commentators speculated that the disruption may have been caused by a sudden influx of new users migrating from OpenAI. However, there is no public evidence confirming that theory.

The Battle for Control: Introducing the Avantra AIR Beta

SAP operations teams are drowning. Every day is a battle against alert fatigue, complex root causes, and repetitive firefighting. And while vendor spin will tell you that moving to the cloud or adopting SAP RISE magically simplifies everything, the reality on the ground is entirely different. We call it the Hybrid Cloud Paradox: Different providers might own different parts of your critical business landscape, but you still own the business risk.
Sponsored Post

The art of software engineering management

Like any leadership role, leading an engineering team in a mature, compact company like Raygun comes with both honor and responsibility. Leading a major development project is a bit like conducting a symphony orchestra, where every individual plays a crucial role and has a great impact on the work they release to customers and end-users.

What does investigation look like when data lives in multiple tools?

War rooms don’t fix fragmentation. They expose it. Incident hits. App checks traces. Infra checks hosts. Cloud checks dashboards. Network checks packets. Everyone sees their layer. No one sees the system. So we guess. Rollback. Add capacity. Freeze change. The noise stops. The constraint doesn’t. Modern failures don’t live in tools. They live in dependencies. If your platform can’t follow a transaction across hybrid and AI infrastructure — to the exact constraint — you don’t have observability.

Telemetry Talks ep.2 - How to use OpenTelemetry in VictoriaMetrics Cloud

Telemetry Talks – Episode 2 is here! In this episode, Diana and Jose introduce VictoriaMetrics Cloud, covering what it is, the problems it solves, and its pricing model, including how overages are handled. If you’re building or operating cloud-native systems and want a clearer, real-world understanding of OpenTelemetry and managed observability, this episode is for you.