Latest Blogs

PID Controllers and InfluxDB: Part 2 - Digital Twin

In a previous post, we described a CSTR and a PID controller. This post will cover the code and architecture of the digital twin from this project repo. The project leverages Kafka for data streaming, Faust for data processing, InfluxDB for storing the time series data, and Telegraf for writing data from the topic to InfluxDB. We’ll also cover the advantages and disadvantages of this stack.

Three Advanced Notification Features that Your Site Uptime Monitoring Vendor MUST Deliver

To say that site uptime vendors deliver notifications is about as insightful as saying that cars have steering wheels, planes have wings, or TikTok videos have cringe. It’s a given. But this doesn’t mean that all vendors use the same notification playbook. Some vendors offer basic (read: superficial) notification features, while others offer advanced notification features.

How Memory Usage Patterns Can Derail Real-time Performance

In this article, we will learn how memory usage patterns can affect the real-time performance of an embedded application, drawing from a recent experience tracing an audio DSP application running on an embedded Linux platform. First, I will introduce the product in question and the real-time audio software I developed for it. Then, I’ll describe the issues I encountered with audio callbacks and the strategy I followed to determine the cause of the issues, ending with my solution and lessons learned.

Visualize CockroachDB in Grafana: Introducing the CockroachDB Enterprise data source

We’re excited to announce the addition of CockroachDB as an Enterprise data source for Grafana. The data source, available now in private preview, enables secure and seamless access to the CockroachDB distributed SQL database, while leveraging Grafana’s powerful visualization capabilities.

Icinga Director: Cloning dictionary row entries for objects from import sources

Overuse of dictionaries in monitoring leads to complex and ugly configurations, which in turn makes monitoring complicated. It is therefore advisable to use them only when needed or in special cases, and even then it is worthwhile to keep things simple. On that note, in this blog post let me demonstrate how to clone dictionary row entries for objects from import sources to object properties in Icinga Director.

Splunk vs Prometheus: a Side-by-Side Comparison [2024 Guide]

When it comes to monitoring and observability, Splunk and Prometheus are two prominent tools with distinct strengths. Splunk excels in enterprise-level security and observability, while Prometheus is known for its efficient handling of time-series data. In this blog, I compare the two tools, focusing on their unique features and strengths. Some of the insights reflect personal preferences, but they should still help you find the best fit for your specific monitoring needs.

Day-0, Day-1, and Day-2 Operations: What Are the Differences?

Operations are the backbone of successful software delivery, but the specifics of each phase—Day-0, Day-1, and Day-2—often get overlooked. Understanding these phases can help you streamline deployments, reduce risks, and maintain robust, scalable systems. Let’s break down what each phase entails and explore their distinct activities, tools, and best practices.

How to verify, document, and prove compliance with Gremlin

Resilient and reliable IT systems have become a minimum requirement for modern businesses, a fact driven home by any number of high-profile outages over the past few years. Unfortunately, when those outages occur in the financial sector, they can have far-reaching and incredibly damaging results.