
The latest News and Information on Monitoring for Websites, Applications, APIs, Infrastructure, and other technologies.

Prometheus v2.11 Released

Since graduating within CNCF last August, Prometheus has adopted a six-week release schedule. The latest release, v2.11, arrived on July 9. It adds an option to compress WAL records using Snappy, query performance improvements, the option to use Alertmanager API v2, and more. Note that the metric prometheus_tsdb_wal_reader_corruption_errors has been renamed to prometheus_tsdb_wal_reader_corruption_errors_total. You can download the latest version here.

An Introduction to Python List Comprehensions

Python list comprehensions offer a concise way to process each element of a list. Even though they’ve been available since Python 2.0, their syntax often discourages people from using them. This article aims to introduce list comprehensions in a friendly way and offer you one more Python feature to add to your scripting toolbox.
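
As a quick illustration of the idea (not taken from the article itself), the sketch below builds the same list twice: once with an explicit loop and once with a list comprehension. The variable names and sample numbers are made up for the example.

```python
# Sample input; the values are arbitrary.
numbers = [1, 2, 3, 4, 5, 6]

# Traditional loop: collect the squares of the even numbers.
squares_loop = []
for n in numbers:
    if n % 2 == 0:
        squares_loop.append(n * n)

# Equivalent list comprehension:
#   [expression for item in iterable if condition]
squares_comp = [n * n for n in numbers if n % 2 == 0]

print(squares_comp)  # [4, 16, 36]
```

Both versions produce the same list; the comprehension simply folds the loop, the filter, and the append into one expression.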

Development workflow for serverless applications

Serverless applications require a whole new approach to development workflow. In this article, Lumigo Director of Engineering Efi Merdler-Kravitz details the guiding principles and tools used at a 100% serverless company to ensure the most efficient workflow possible. We are not going to talk about product development flow (no product managers were harmed during the making of this post!).

5 Best Practices for Using AI to Automatically Monitor Your Kubernetes Environment

If you happen to be running multiple clusters, each with a large number of services, you’ll find that it’s rather impractical to use static alerts, such as “number of pods < X” or “ingress requests > Y”, or to simply measure the number of HTTP errors. Values fluctuate for every region, data center, cluster, etc. It’s difficult to adjust alerts manually and, when that isn’t done properly, you either get far too many false positives or risk missing a key event.
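
To make the problem with static thresholds concrete, here is a minimal sketch that compares a fixed threshold with a check against each series' own recent history (mean plus or minus a few standard deviations). This is only an illustrative stand-in, not the AI-based approach the article describes; the function names, sample request rates, and the choice of k=3 are all assumptions.

```python
from statistics import mean, stdev


def static_alert(value, threshold):
    """Fire when the value exceeds one fixed threshold shared by all clusters."""
    return value > threshold


def baseline_alert(history, value, k=3.0):
    """Fire when the value deviates more than k standard deviations
    from the mean of this series' own history."""
    if len(history) < 2:
        return False  # not enough data to estimate a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) > k * sigma


# Hypothetical ingress request rates for two clusters of very different size.
quiet_cluster = [100, 110, 95, 105, 98]
busy_cluster = [5000, 5200, 4900, 5100, 5050]

# A single static threshold of 1000 fires on normal traffic in the busy
# cluster and stays silent on a ninefold spike in the quiet one.
print(static_alert(5100, 1000), baseline_alert(busy_cluster, 5100))
print(static_alert(900, 1000), baseline_alert(quiet_cluster, 900))
```

The per-series baseline adapts to each cluster's normal range, which is the property a fixed "ingress requests > Y" rule lacks.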

How to use ApacheBench for web server performance testing

When developing web services and tuning the infrastructure that runs them, you’ll want to make sure that they handle requests quickly enough, and at a high enough volume, to meet your requirements. ApacheBench (ab) is a benchmarking tool that measures the performance of a web server by inundating it with HTTP requests and recording metrics for latency and success.
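
A minimal sketch of driving ab from a script might look like the following. It relies on ab's -n (total requests) and -c (concurrent requests) options; the helper names, the default values, and the target URL are assumptions for illustration, and ab must already be installed for run_ab to work.

```python
import shutil
import subprocess


def build_ab_command(url, requests=100, concurrency=10):
    """Build an ab command line.

    -n sets the total number of requests, -c the number sent concurrently.
    """
    if concurrency > requests:
        raise ValueError("concurrency cannot exceed the total number of requests")
    return ["ab", "-n", str(requests), "-c", str(concurrency), url]


def run_ab(url, requests=100, concurrency=10):
    """Run ab and return its text report (requires ab on the PATH)."""
    if shutil.which("ab") is None:
        raise RuntimeError("ab was not found on the PATH")
    result = subprocess.run(
        build_ab_command(url, requests, concurrency),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Hypothetical target; ab expects the URL to include a path, hence the trailing slash.
print(build_ab_command("http://localhost:8080/", requests=1000, concurrency=20))
```

Calling run_ab("http://localhost:8080/", 1000, 20) against a server you control would return ab's report, including its latency and failed-request figures.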

Consul monitoring tools

In Part 1, we looked at metrics and logs that can give you visibility into the health and performance of your Consul cluster. In this post, we’ll show you how to access this data—and other information that can help you troubleshoot your Consul cluster—in four ways: Consul provides a built-in CLI and API that you can use to query the most recent information about your cluster, giving you a high-level read into Consul’s health and performance.
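
As a rough sketch of the API route, the snippet below queries Consul's HTTP status endpoints (/v1/status/leader and /v1/status/peers). It assumes a local agent listening on the default HTTP address 127.0.0.1:8500 with no ACL token required; the helper names are made up for the example.

```python
import json
from urllib.request import urlopen

# Assumption: a local Consul agent on the default HTTP port.
CONSUL_ADDR = "http://127.0.0.1:8500"


def consul_url(path, base=CONSUL_ADDR):
    """Build a full URL for a Consul HTTP API v1 path."""
    return f"{base.rstrip('/')}/v1/{path.lstrip('/')}"


def consul_get(path, base=CONSUL_ADDR, timeout=5):
    """GET a Consul API path and decode the JSON response."""
    with urlopen(consul_url(path, base), timeout=timeout) as resp:
        return json.load(resp)


# Example calls (these need a running agent, so they are left commented out):
#   consul_get("status/leader")  # address of the current Raft leader
#   consul_get("status/peers")   # addresses of the Raft peers
print(consul_url("status/leader"))
```

On the CLI side, `consul members` gives a comparable high-level view of the nodes the agent knows about.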

Trigger an on demand uptime & broken links check after a deploy

You can use our API to trigger an on-demand run of both the uptime check and the broken links checker. If you add this to, say, your deploy script, you can have near-instant validation that your deploy succeeded and didn't break any links or pages. Our API allows you to trigger an on-demand run for every check we do. But it's an API, so it requires a set of IDs. First, let's find the different checks your site has.

Squared Up for Azure is coming

I am delighted to announce a big new initiative that we have been working on here at Squared Up. Our engineers have been working their socks off to build a new product, Squared Up for Azure, and it’s shaping up very nicely. It’s not ready for you to play with quite yet, but we plan to have an early release available for enthusiastic testers in late September (sign up below!).