Fix flaky tests in your sleep with Chunk by CircleCI

A test fails. You rerun it and it passes. You shrug and move on. This is how most teams deal with flaky tests. The “rerun until green” approach works in the moment, and rerunning from failed tests is a useful way to confirm whether a failure is real. But reruns don’t fix the underlying issue. Over time, they burn CI resources and can hide real instability in your code. On the other hand, fixing flaky tests can mean hours of work.
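The difference between a flaky test and a real failure can be sketched in a few lines. This is a minimal illustration and not how Chunk or CircleCI works: `classify_test` and its `reruns` parameter are hypothetical names, and the test is modeled as a plain callable that returns True on pass.

```python
from typing import Callable


def classify_test(test: Callable[[], bool], reruns: int = 5) -> str:
    """Run a test several times and label it from the mix of outcomes.

    Hypothetical helper: a test that always passes is "pass", one that
    always fails is "fail", and one with mixed outcomes is "flaky".
    """
    if reruns < 1:
        raise ValueError("reruns must be at least 1")

    # Collect every outcome instead of stopping at the first green run,
    # so an intermittent failure is recorded rather than hidden.
    results = [test() for _ in range(reruns)]

    if all(results):
        return "pass"
    if not any(results):
        return "fail"
    return "flaky"
```

A rerun that stops at the first green result would report "pass" for a test this sketch labels "flaky", which is how instability ends up hidden.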

Grafana and Grafana Cloud release cycle: An end-of-year update

With the end of the year fast approaching, we want to let you know about some important dates for our upcoming release freezes. Our annual release freeze helps ensure stability for everyone during the holiday season, which is a critical time for many of our customers. This pause helps us protect our on-call teams and maintain a smooth experience for you.

The next evolution of WebPageTest has arrived, and it's a game-changer

Now fully integrated into Catchpoint’s Internet Performance Monitoring (IPM) platform, WebPageTest is no longer just a testing tool; it’s your full-stack performance command center. From AI-powered insights to automation and Smartboards, the new WebPageTest gives digital experience teams everything they need to move beyond page speed and master end-to-end performance. Test smarter, detect faster, and optimize every layer of performance with a unified, AI-powered platform built for experts.

Single-Cloud Dependency Is a Disaster Waiting to Happen

The AWS outage has reminded many businesses of the risk of relying heavily on centralised cloud infrastructure, especially when so many essential services are concentrated in a single region. At the wider industry level, it is also a warning about the widespread lack of contingency planning for cloud failures. Reactive response must give way to strategically planned disaster recovery protocols that engender a resilient cloud market.

Get organized, actionable insights from complex test environments with Datadog Test Suites

Modern teams often run hundreds of synthetic tests across multiple services, environments, and user journeys. While these tests provide deep visibility, managing them as a flat list can quickly become overwhelming, especially as organizations scale and teams specialize.

Top 11 Ruby APM Tools for 2025: A Performance-Driven Selection

Observability has become a core part of running Ruby applications at scale. Knowing how your app performs — from request latency to background job execution — helps catch slowdowns early and improve reliability. This blog walks through some of the most useful APM tools for Ruby in 2025. Each section highlights what the tool does well, where it fits best, and what kind of visibility it brings to your application's performance.

10 Proven APM Best Practices for Reducing Latency and Improving Response Time

Speed defines user loyalty. Recent market research indicates that organizations adopting advanced application performance monitoring (APM) tools are achieving measurable gains in user engagement, retention, and revenue. “A 2025 performance study found that businesses tracking latency and response time proactively reduced customer churn by up to 30%.” As applications expand across distributed architectures, microservices, and cloud environments, performance gaps become harder to diagnose.

How to Replace Synthetics with the httpcheck Receiver

A 200 OK doesn't always mean everything is okay. You've probably seen it: your health check endpoint returns success, but your users are staring at an error page. Maybe the database connection pool is exhausted, or a critical downstream service is timing out, but your API dutifully returns 200 because technically it responded. This is the reality of monitoring HTTP endpoints in production—status codes alone don't tell the whole story.
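One way to picture the gap is a check that inspects the response body as well as the status code. The sketch below is illustrative only and is not the httpcheck receiver's configuration or API. It assumes a hypothetical health endpoint that returns JSON of the form `{"checks": {"database": "ok", ...}}`; the function name `is_healthy` and the payload shape are assumptions.

```python
import json


def is_healthy(status_code: int, body: str) -> bool:
    """Treat a response as healthy only if the status is 200 AND every
    dependency listed in the (assumed) JSON body reports "ok"."""
    if status_code != 200:
        return False

    # A 200 with a non-JSON body (for example an HTML error page)
    # is not counted as healthy.
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False

    if not isinstance(payload, dict):
        return False

    # Assumed payload shape: {"checks": {"<dependency>": "<state>"}}
    checks = payload.get("checks")
    if not isinstance(checks, dict) or not checks:
        return False

    return all(state == "ok" for state in checks.values())
```

With this shape, a 200 response whose body reports an exhausted database pool is classed as unhealthy, which a status-code-only check would miss.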

Announcing HAProxy ALOHA 17.5

HAProxy ALOHA 17.5 is now available. This release delivers powerful new capabilities that improve security and performance — while future-proofing HAProxy ALOHA to enable richer features and advanced functionality. With this release, we’re introducing HTTPS health checks to Global Server Load Balancing (GSLB), new partitioning for larger firmware updates, enhanced web application firewall (WAF) functionality, and our new Threat Detection Engine (TDE).