The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

New in Grafana Alerting: a faster, more scalable way to manage your alerts in Grafana

Effective alerting is the backbone of any observability strategy. But as your systems grow, managing hundreds or even thousands of rules can become a significant challenge. And when something goes wrong, the last thing you want is to fight with your tooling. That’s why we’re thrilled to announce the launch of our brand new alert rules list page, which we built to provide a faster, more intuitive, and scalable experience for teams of all sizes!

Getting started with MongoDB dashboards

MongoDB is a popular NoSQL database used by many modern web applications. Once your web application is up and running, you might find you need to monitor the application data for operational purposes. For example, you may need to report on user sign-ups, or monitor for problems like invalid data. SquaredUp is an easy-to-use dashboard that plugs directly into your MongoDB database to visualize and monitor your data.

Patterns for safe and efficient cache purging in CI/CD pipelines

"There are only two hard things in Computer Science: cache invalidation and naming things." (Phil Karlton)

In the age of increasingly frequent deploys, edge caching, and Jamstack adoption, caching plays a key role across the software delivery life cycle. In build and CI pipelines, caching compiled assets or dependencies helps reduce compute costs, speed up job runtimes, and lower the energy use, and with it the environmental impact, of repeated builds.

Behind the Dashboard - Catchpoint Traceroute

Behind the Dashboard is an ongoing series where we look under the hood of a specific Catchpoint feature. Each episode breaks down the technology itself, what’s challenging about using it for monitoring, and how we removed friction and toil to make it a valuable part of the Catchpoint platform. In this episode Leon, Brandon, and Sergey take a look at “traceroute” tests – a feature that may seem humble and unassuming, but has unexpected power and utility when it comes to identifying performance issues with your site, service, or application.

Ten Minute Troubleshooting: Meet (and Monitor) Users Where They Are

What do you do if your monitoring, APM, and synthetic tools tell you an application is up, but the users say it’s not? A good first step is to ask where your monitoring tools are located relative to both the users and the application itself. In this episode Mursi helps Leon identify his “red light, green light” issue and adjust his monitoring to do a better job of showing the REAL user’s experience.

Secure by Design: IT Modernization for Government

As government agencies modernize IT infrastructure, many are shifting to hybrid and multicloud environments. But this evolution brings heightened exposure to cyber threats. For the public sector, where data protection is tied to national security and public trust, compliance is more than a box to check—it’s the front line of defense. FedRAMP (Federal Risk and Authorization Management Program) provides a standardized framework for securing cloud services used by U.S. agencies.

Resilience with Zero Data Loss in High-Volume Telemetry Pipelines with OpenTelemetry and Bindplane

One Bindplane customer ran into a problem processing enormous S3-stored log files. Our engineering team tackled the problem head-on, enhancing the S3 event receiver with offset tracking and validating the change with chaos testing.
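The general idea behind offset tracking can be illustrated with a minimal sketch. This is not Bindplane's or the OpenTelemetry receiver's implementation: the function name `process_with_offsets`, the JSON checkpoint file, and the `fail_after` crash switch are all hypothetical, and a local file stands in for an S3 object. The reader persists how far it got after each handled line, so a restart resumes from the saved offset instead of starting over.

```python
import json
import os
import tempfile


def process_with_offsets(data_path, checkpoint_path, handle_line, fail_after=None):
    """Read data_path line by line, resuming from the offset in checkpoint_path.

    fail_after is a hypothetical test hook that simulates a crash after
    that many lines have been handled in this run.
    """
    offset = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            offset = json.load(f)["offset"]

    handled = 0
    with open(data_path, "rb") as src:
        src.seek(offset)  # skip everything already processed
        while True:
            line = src.readline()
            if not line:
                break
            if fail_after is not None and handled >= fail_after:
                raise RuntimeError("simulated crash")
            handle_line(line.decode().rstrip("\n"))
            handled += 1
            # Save the offset only after the line has been handled.
            with open(checkpoint_path, "w") as f:
                json.dump({"offset": src.tell()}, f)


# Usage: crash partway through, then resume from the checkpoint.
workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "big.log")
checkpoint_path = os.path.join(workdir, "offset.json")
with open(data_path, "wb") as f:
    f.write(b"a\nb\nc\nd\n")

seen = []
try:
    process_with_offsets(data_path, checkpoint_path, seen.append, fail_after=2)
except RuntimeError:
    pass  # the "crash"; the checkpoint file survives
process_with_offsets(data_path, checkpoint_path, seen.append)
print(seen)
```

Because the offset is written after a line is handled, a crash between the two steps replays that line on restart. The sketch therefore trades possible duplicates for no skipped data.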

Goodput vs Throughput: The Differences and How They Affect Your Network

Two key metrics that often come up in discussions about network performance are throughput and goodput. While these terms may seem similar, they highlight different aspects of your network’s efficiency, and misunderstanding them can lead to poor decisions about how you manage your network and your business’s resources.
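As a rough illustration of the distinction, using the common definitions: throughput counts every bit that crosses the link, including protocol headers and retransmitted packets, while goodput counts only the useful application payload that arrives. The packet counts, sizes, and duration below are made-up numbers for the sake of the example, not measurements.

```python
def throughput_bps(total_bytes_on_wire: int, seconds: float) -> float:
    """Bits per second for everything sent, headers and retransmissions included."""
    return total_bytes_on_wire * 8 / seconds


def goodput_bps(payload_bytes_delivered: int, seconds: float) -> float:
    """Bits per second of useful application payload delivered."""
    return payload_bytes_delivered * 8 / seconds


# Hypothetical transfer (all values are assumptions for illustration).
packets_sent = 1000    # total packets on the wire
packet_size = 1500     # bytes per packet, headers included
header_size = 40       # bytes of protocol headers per packet
retransmitted = 50     # packets that were resends of lost data
duration_s = 10.0

wire_bytes = packets_sent * packet_size
useful_bytes = (packets_sent - retransmitted) * (packet_size - header_size)

throughput = throughput_bps(wire_bytes, duration_s)
goodput = goodput_bps(useful_bytes, duration_s)
efficiency = goodput / throughput

print(f"throughput: {throughput:,.0f} bit/s")
print(f"goodput:    {goodput:,.0f} bit/s")
print(f"efficiency: {efficiency:.1%}")
```

In this made-up case the link carries 1,200,000 bit/s but the application receives only 1,109,600 bit/s of useful data. The gap widens as loss and retransmissions increase, even when throughput looks unchanged.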