Monitoring

The latest news and information on monitoring for websites, applications, APIs, infrastructure, and other technologies.

How we use metamonitoring Prometheus servers to monitor all other Prometheus servers at Grafana Labs

One of the big questions in monitoring can be summed up as: Who watches the watchers? If you rely on Prometheus for your monitoring, and your monitoring fails, how will you know? The answer is a concept known as metamonitoring. At Grafana Labs, a handful of geographically distributed metamonitoring Prometheus servers monitor all other Prometheus servers and each other cross-cluster, while their alerting chain is secured by a dead-man’s-switch-like mechanism.

Getting Started with Spring Boot Actuator

Any production application needs its uptime monitored. Say you've built a stock market statistics application for a client using Spring Boot. It has to stay up the entire time the stock market is open; if it goes down at a crucial moment, stakeholders could suffer huge losses.

3, 2, 1 Liftoff! Launching Your ITSM Implementation

We have service desk liftoff! Well...almost. Completing an IT service management (ITSM) evaluation is no easy feat, but selecting a new solution doesn't mean it's time to take your foot off the gas. Transitioning to a new solution shouldn't be a burden or pull you away from your day-to-day responsibilities. A strategic approach to your ITSM implementation can shorten your time to value and help you make the most of your resources.

9 Best Network Discovery Tools

Your organization's network is large, complicated, and constantly expanding. You might think you have a handle on it, but monitoring your network manually can lead to inaccuracies from outdated data, undetected devices, and other common visibility gaps. A network device discovery tool finds the devices on your network so you can manage their health, troubleshoot performance problems, and plan for your network's future.

Top Observability Strategies for Distributed Systems

A distributed IT environment has many moving parts, and all of them need to be monitored to ensure everything is working as it should. Increasingly complex infrastructures that interweave cloud, on-premises, and hybrid architectures make this a challenge. To make sure you have adequate visibility, you need an IT observability strategy.

Improve Monitoring and Observability with the Catchpoint and Sumo Logic Integration

Sumo Logic is a cloud-based log management and analytics service that uses machine-generated big data to deliver real-time IT insights. We're excited to share that you can now easily integrate Catchpoint and Sumo Logic, which brings a number of benefits. The integration pushes data from Catchpoint to Sumo Logic using webhooks; you then query that data to build visualizations. Why do we use webhooks?

Kafka Migration and Lessons Learned

Over the last few months, Honeycomb's platform team migrated to a new iteration of our ingest pipeline for customer events. The migration to this newer architecture did not go smoothly, as our status page since February attests. There were also many near-incidents where we got paged and reacted quickly enough to avoid major issues. We've written a full overview of the challenges we encountered, which you can download.