Over the years, I have found that building monitoring scripts and using them properly is a challenge. When I look back at my internal IT days using platforms like WhatsUp Gold, PRTG, or N-central, the question always remained the same: how can I monitor efficiently and get alerts that matter? In this blog post, I thought I’d tackle something that is a challenge for a lot of people: monitoring hypervisors.
We’ve already outlined why API performance matters and what aspects of APIs to test, but what is the difference between API testing and monitoring? As with most things, context matters. The use cases for testing and monitoring are different because the objectives are different. In both cases the ultimate goal is to verify that your APIs are functioning properly, but staging environments differ significantly from production environments.
If you’re involved in IT, you’ve likely come across the word “Kubernetes.” It’s a Greek word that means “helmsman” or “pilot.” The platform that bears the name is one of the most exciting developments in cloud-native hosting in years. Kubernetes has unlocked a new universe of reliability, scalability, and observability, changing how organizations operate and redefining what’s possible. But what exactly is it?
In IT, you can’t do anything with an asset you can’t see. For your network, monitoring provides the eyes to know what is going on. But IT and network pros don’t spend all day staring at a dashboard waiting for something to happen. Like your local police department, they rely on notifications of trouble. Instead of 911 calls, IT depends on network alerts.
Interest is growing in cloud computing’s ability to reduce carbon emissions, but the ‘green cloud’ argument is not as clear-cut as many believe. I’ve argued over the years that cloud computing is a step in the right direction when it comes to sustainable computing. My viewpoint often puts me at odds with environmental organizations that argue against the many new power-hungry data centers that cloud companies build.
Modern DevOps teams that run dynamic, ephemeral environments (e.g., serverless) often struggle to keep up with the ever-increasing volume of logs, which makes it even harder for engineers to troubleshoot incidents effectively. During an incident, the trial-and-error process of finding and confirming which logs are relevant to your investigation can be time-consuming and laborious. This results in employee frustration, degraded performance for customers, and lost revenue.