Operations | Monitoring | ITSM | DevOps | Cloud

Latest News

Prometheus v2.11 Released

Since graduating within the CNCF last August, Prometheus has adopted a six-week release schedule. The latest release, v2.11, arrived on July 9. It includes a new option to compress WAL records using Snappy, query performance improvements, the option to use Alertmanager API v2, and more. Note that the metric prometheus_tsdb_wal_reader_corruption_errors has been renamed to prometheus_tsdb_wal_reader_corruption_errors_total. The latest version is available for download.

An Introduction to Python List Comprehensions

Python list comprehensions offer a concise way to work with each element of a list. Even though they’ve been available since Python 2.0, their syntax often discourages people from using them. This article aims to introduce list comprehensions in a friendly way and offer you one more Python feature to add to your scripting toolbox.
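As a quick illustration of the syntax (a minimal sketch, not taken from the linked article; the variable names are made up for the example), here is the same filter-and-transform step written first as a loop and then as a list comprehension:

```python
# Squares of the even numbers in a list, written two ways.
numbers = [1, 2, 3, 4, 5, 6]

# Traditional loop: build the result step by step.
squares_loop = []
for n in numbers:
    if n % 2 == 0:
        squares_loop.append(n * n)

# Equivalent list comprehension:
# [expression for item in iterable if condition]
squares_comp = [n * n for n in numbers if n % 2 == 0]

print(squares_comp)  # [4, 16, 36]
```

Both versions produce the same list; the comprehension simply folds the loop, the condition, and the append into one expression.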

I Came, I Saw, I Monitored: Troubleshoot Unified Communications Like a Roman Emperor

“We were born to work together like feet, hands, and eyes, like the two rows of teeth, upper and lower … like Cisco HCS, Nortel, or Skype for Business and our distributed development teams.” Marcus Aurelius, Roman Emperor, Unified Comms Futurist* (*not really). OK, so, the famed Roman emperor may not have mentioned technology in his A.D.

Development workflow for serverless applications

Serverless applications require a whole new approach to development workflow. In this article, Lumigo Director of Engineering Efi Merdler-Kravitz details the guiding principles and tools used at a 100% serverless company to ensure the most efficient workflow possible. We are not going to talk about product development flow (no product managers were harmed during the making of this post!).

5 Best Practices for Using AI to Automatically Monitor Your Kubernetes Environment

If you happen to be running multiple clusters, each with a large number of services, you’ll find that it’s rather impractical to use static alerts, such as “number of pods < X” or “ingress requests > Y”, or to simply measure the number of HTTP errors. Values fluctuate for every region, data center, cluster, etc. It’s difficult to manually adjust alerts and, when not done properly, you either get way too many false positives or miss a key event.
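To make the problem with static alerts concrete, here is a minimal sketch that compares one fixed threshold with a per-cluster baseline. It is not the article's method and it is not AI: the baseline is a plain mean and standard deviation, used only as a stand-in for "a threshold learned from each series". The function names, the request rates, and the threshold of 500 are all invented for illustration.

```python
from statistics import mean, stdev


def static_alert(value, threshold):
    """Fire when the value exceeds one fixed threshold shared by every cluster."""
    return value > threshold


def baseline_alert(history, value, k=3.0):
    """Fire when the value is more than k standard deviations away
    from the mean of this series' own recent history."""
    if len(history) < 2:
        return False  # not enough data to estimate a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) > k * sigma


# Hypothetical ingress request rates (requests/second) for two clusters.
busy_cluster = [980, 1010, 995, 1020, 1005, 990]
quiet_cluster = [48, 52, 50, 47, 51, 49]

# A single static threshold of 500 req/s fits neither cluster.
busy_static = static_alert(1000, 500)    # normal traffic, but the alert fires
quiet_static = static_alert(200, 500)    # a 4x spike, but the alert stays silent

# A per-cluster baseline treats each series on its own terms.
busy_baseline = baseline_alert(busy_cluster, 1000)    # normal for this cluster
quiet_baseline = baseline_alert(quiet_cluster, 200)   # far outside this cluster's range
```

With these made-up numbers, the static rule produces a false positive on the busy cluster and misses the spike on the quiet one, while the per-cluster baseline gets both right.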

How to use ApacheBench for web server performance testing

When developing web services and tuning the infrastructure that runs them, you’ll want to make sure that they handle requests quickly enough, and at a high enough volume, to meet your requirements. ApacheBench (ab) is a benchmarking tool that measures the performance of a web server by inundating it with HTTP requests and recording metrics for latency and success.
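As a rough sketch of how an ab run could be scripted, the helper below builds and runs an ab command line from Python. It relies only on ab's `-n` (total requests) and `-c` (concurrent requests) options; the function names and the example URL are assumptions made for illustration, and the sketch assumes ab is installed and on your PATH.

```python
import shutil
import subprocess


def build_ab_command(url, total_requests=100, concurrency=10):
    """Return the argument list for an ab run.

    -n sets the total number of requests, -c the number sent concurrently.
    ab expects a full URL that includes a path, e.g. a trailing slash.
    """
    if concurrency > total_requests:
        raise ValueError("concurrency cannot exceed the total number of requests")
    return ["ab", "-n", str(total_requests), "-c", str(concurrency), url]


def run_ab(url, total_requests=100, concurrency=10):
    """Run ab, if it is installed, and return its plain-text report."""
    if shutil.which("ab") is None:
        raise RuntimeError("ab was not found on PATH")
    result = subprocess.run(
        build_ab_command(url, total_requests, concurrency),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `run_ab("http://localhost:8080/", total_requests=500, concurrency=20)` would send 500 requests, 20 at a time, to a hypothetical local server and return the report that ab prints.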

Consul monitoring tools

In Part 1, we looked at metrics and logs that can give you visibility into the health and performance of your Consul cluster. In this post, we’ll show you how to access this data—and other information that can help you troubleshoot your Consul cluster—in four ways: Consul provides a built-in CLI and API that you can use to query the most recent information about your cluster, giving you a high-level read into Consul’s health and performance.
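For a sense of what querying the API looks like, here is a minimal sketch that reads a few endpoints from Consul's HTTP API using only the Python standard library. It assumes a local agent listening on the default address `http://127.0.0.1:8500` with no ACL token required; the helper names are invented for this example and are not part of Consul.

```python
import json
import urllib.request

# Assumption: a local Consul agent on the default HTTP port, no ACL token.
CONSUL_ADDR = "http://127.0.0.1:8500"


def consul_url(path, base=CONSUL_ADDR):
    """Build a full URL for a path under Consul's /v1/ HTTP API."""
    return base.rstrip("/") + "/v1/" + path.lstrip("/")


def consul_get(path, base=CONSUL_ADDR, timeout=5):
    """GET an API path and decode the JSON response."""
    with urllib.request.urlopen(consul_url(path, base), timeout=timeout) as resp:
        return json.load(resp)


def cluster_snapshot(get=consul_get):
    """Collect a high-level read of cluster health.

    `get` is injectable so the function can be exercised without a live agent.
    """
    return {
        "leader": get("status/leader"),                   # address of the current Raft leader
        "peers": get("status/peers"),                     # Raft peers known to this agent
        "critical_checks": get("health/state/critical"),  # health checks in the critical state
    }
```

Calling `cluster_snapshot()` against a running agent returns the current leader, the peer list, and any checks in the critical state.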

Trigger an on-demand uptime & broken links check after a deploy

You can use our API to trigger an on-demand run of both the uptime check and the broken links checker. If you add this to, say, your deploy script, you can have near-instant validation that your deploy succeeded and didn't break any links or pages. Our API allows you to trigger an on-demand run for every check we do. But it's an API, so it requires a set of IDs. First, let's find the different checks your site has.
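The two-step flow described here (look up the check IDs, then trigger each check) could be sketched roughly as follows. The excerpt does not show the real API, so everything specific below is hypothetical: the base URL, the endpoint paths, the bearer-token header, and the `id` and `type` fields are placeholders to be replaced with the values from the service's API documentation.

```python
import json
import urllib.request

# Hypothetical placeholder, NOT the real service URL.
API_BASE = "https://api.example.com"


def api_request(method, path, token, base=API_BASE):
    """Send an authenticated request and decode a JSON body, if there is one."""
    req = urllib.request.Request(
        base + path,
        method=method,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        body = resp.read()
        return json.loads(body) if body else None


def trigger_checks(site_id, token, wanted=("uptime", "broken_links"), call=api_request):
    """Find the site's checks, then request an on-demand run for the wanted ones.

    The paths and field names are hypothetical. `call` is injectable so the
    flow can be tested without touching the network.
    """
    # Step 1: list the checks for this site to learn their IDs.
    checks = call("GET", f"/sites/{site_id}/checks", token)

    # Step 2: trigger a run for each check type we care about.
    triggered = []
    for check in checks:
        if check["type"] in wanted:
            call("POST", f"/checks/{check['id']}/run", token)
            triggered.append(check["id"])
    return triggered
```

A deploy script could call `trigger_checks(site_id, token)` as its final step, once the placeholders have been swapped for the real endpoints.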