Operations | Monitoring | ITSM | DevOps | Cloud

Latest Posts

Broken Windows: Why the 'Single Pane of Glass' Is Impossible

It was only as I started to study information theory that I truly understood how nonsensically the computer worked in Star Trek: The Next Generation. Decades before voice assistants and at a time when only the most basic language parsing existed in practice, the computer on Star Trek could always give you the answer you wanted. No one ever spent any time clicking into multiple windows to find an answer, and the display always gave information that could be easily summarized in words.

The Real Cost of Synthetic User Testing with AWS

Every time I share a project using SaaS tools, someone inevitably responds that they could do the same thing on their own home server ‘for free.’ I mention this not because it is annoying, since I would never go on social media at all if annoying responses were allowed to change my behavior, but because I think it points to a basic misconception that still affects DevOps practitioners today: the refusal to accurately estimate the real costs of self-managed solutions.

How Often Should You Ping Your Site?

How often should you ping your site? Should you be checking every few minutes, or every hour? Surely you have other ways to detect problems, so maybe just a daily check of your API and main page would be enough, right? While there’s no single right answer for everyone, this post tries to break down how you can find the right cadence for your site checks.

Your Practical Guide to Reducing MTTR

Let’s face it. Incidents will always happen. We simply can’t prevent them. But we can strive to mitigate the impact incidents have on our product and customers. Ensuring high reliability depends on quickly and effectively finding and fixing problems. This is where the metric MTTR, standing for “mean time to restore” or “mean time to resolve,” becomes valuable for organizations.

Exploring the Synergy Between Testing and Monitoring in Software Development

The roles of testing and monitoring often intersect, yet they maintain distinct identities. In my near-decade in the tech sector I've observed how end-to-end (E2E) tests and synthetic monitoring, despite common frameworks and requirements, often fail to benefit from collaboration and synergy.

Parallel Scheduling Is Now GA: Detect Regional Outages Up to 20x Faster

I am happy to announce that Checkly now supports parallel scheduling as a new way to schedule your checks. Parallel scheduling reduces mean time to detection, provides better insight when addressing outages, and improves the accuracy of performance trends, making it a powerful new feature for all Checkly users.

Observability with OpenTelemetry and Checkly

Observability isn't just a buzzword; it's a vital compass guiding us through the maze of system health and performance. As we’ve adopted microservice architectures, the ability to know ‘what is currently happening in our system’ has diminished as our operational resilience has increased. We find services scattered among a maze of interconnections and interdependencies. And even the logs that used to guide us are now scattered throughout this maze.

A Guide to Visual Regression Testing With Playwright and How to Get Started

I’m pretty sure that you’ve had a situation where you deployed a major UX change on your web app and missed the most obvious issues, like a misaligned button or distorted images. Unintended changes on your site can cause not only a sharp decline in user satisfaction but also a large fall in sales and customer retention. By identifying and resolving these discrepancies before the update went live, you could have prevented these outcomes.

Monitoring an Open Banking Flow With Playwright & Checkly

Open banking offers users easier access to their own bank account information, for example through third-party applications. This is achieved by giving third-party financial service providers access to the financial data of a bank's customers through APIs, which enable secure communication between the different parties involved.