
Trust, understanding, and love

As Charles highlights in his Financial Services Predictions blog, operational resilience is critical. The regulatory drive to define, measure, and improve operational resilience is clear within Europe, outlined by EU DORA and UK FCA/PRA guidelines. The organisations that embrace this change can capitalise on real opportunities in the coming years; specifically, the opportunity to use data-driven insights to improve customer experience and proactively resolve issues before they impact customers.

InfluxDB, Flight SQL, Pandas, and Jupyter Notebooks Tutorial

InfluxDB Cloud, powered by IOx, is a versatile time series database built on top of the Apache ecosystem. You can query InfluxDB Cloud with the Apache Arrow Flight SQL interface, which provides SQL support for working with time series data. In this tutorial, we walk through querying InfluxDB Cloud with Flight SQL, using pandas and Jupyter Notebooks to explore and analyze the resulting data, and creating interactive plots and visualizations.
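The tutorial's workflow can be sketched roughly as follows. This is a minimal illustration, not the tutorial's exact code: the client class, bucket, measurement, and column names are assumptions, and since a live Flight SQL query needs credentials, the query call is shown as a comment and a small stand-in DataFrame is used so the pandas steps actually run.

```python
import pandas as pd

# In the tutorial, the DataFrame would come from a Flight SQL query against
# InfluxDB Cloud, along the lines of (illustrative, not verbatim):
#
#   from influxdb_client_3 import InfluxDBClient3
#   client = InfluxDBClient3(host="...", token="...", database="my-bucket")
#   df = client.query('SELECT time, temp FROM "sensors"', language="sql").to_pandas()
#
# A small stand-in frame so the analysis below is runnable:
df = pd.DataFrame({
    "time": pd.date_range("2023-01-01", periods=6, freq="10min"),
    "temp": [21.0, 21.4, 22.1, 21.8, 23.0, 22.6],
})

# Typical exploration step in a notebook: downsample to 30-minute means
# before plotting (e.g. with df.plot() or plotly in Jupyter).
resampled = df.set_index("time")["temp"].resample("30min").mean()
print(resampled)
```

From here, `resampled.plot()` in a Jupyter cell would give the kind of interactive visualization the tutorial builds.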

Upgrade Your IoT/OT Tech Stack: Replace Legacy Data Historians with InfluxDB

Manufacturing and industrial organizations are firmly in the era of Industry 4.0. The third industrial revolution, which saw the introduction of computers, robots, and automation in industrial processes, has given way to pervasive instrumentation and the use of advanced technologies, like machine learning (ML) and artificial intelligence (AI), that draw on both raw and trained data to enhance industrial processes.

Datadog on Data Engineering Pipelines: Apache Spark at Scale

Datadog is an observability and security platform that ingests and processes tens of trillions of data points per day from more than 22,000 customers. Processing that amount of data in a reasonable time stretches the limits of well-known data engines like Apache Spark. In addition to scale, Datadog's infrastructure runs multi-cloud on Kubernetes, and its data engineering platform serves many different engineering teams, so a good set of abstractions that make running Spark jobs easier is critical.

Compactor: A Hidden Engine of Database Performance

This article was originally published in InfoWorld and is reposted here with permission. The demand for high volumes of data has increased the need for databases that can handle both data ingestion and querying with the lowest possible latency (in other words, high performance). The compactor handles critical post-ingestion, pre-query workloads in the background on a separate server, enabling low latency for data ingestion and high performance for queries.
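The core idea behind a compactor is merging many small, sorted files produced by ingestion into fewer, larger sorted runs, resolving duplicates along the way. The toy sketch below illustrates that idea only; it is not InfluxDB's implementation, and the "batches win by arrival order" rule is an assumption for the example.

```python
import heapq

def compact(batches):
    """Merge small, time-sorted batches (lists of (timestamp, value))
    into one sorted run, keeping the value from the latest batch for
    any duplicated timestamp."""
    merged = {}
    # heapq.merge streams the inputs in timestamp order without
    # materialising everything at once -- the essence of a compaction pass.
    for ts, value in heapq.merge(*batches, key=lambda row: row[0]):
        merged[ts] = value  # a duplicate from a later batch overwrites
    return sorted(merged.items())

# Three small "files" as they might land during ingestion; timestamp 3
# appears twice, and the later batch's value wins.
b1 = [(1, "a"), (3, "c")]
b2 = [(2, "b"), (3, "c2")]
b3 = [(4, "d")]
print(compact([b1, b2, b3]))  # [(1, 'a'), (2, 'b'), (3, 'c2'), (4, 'd')]
```

Running this merge in the background, as the article describes, is what keeps both ingestion and queries fast: writers emit small files without blocking, and readers see a few large, sorted files instead of thousands of fragments.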

Streaming conversion of Apache Kafka topics from JSON to Avro with Apache Flink

Pushing data in JSON format to an Apache Kafka topic is very common. However, messages without a predefined structure can create problems, specifically when sinking the data via connectors, like the JDBC sink, that require knowledge of the message structure. Transforming the messages from JSON to Avro enforces a schema on the messages and enables the use of a wider variety of connectors.
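The enforcement step at the heart of that conversion can be illustrated in a few lines. This is a stand-in sketch, not Flink code: in the real pipeline, Flink would apply an actual Avro schema (typically from a schema registry), and the field names and types below are invented for the example.

```python
import json

# Hand-rolled stand-in for an Avro record schema: field name -> Python type.
SCHEMA = {"user_id": int, "event": str, "amount": float}

def to_structured(raw: bytes) -> dict:
    """Parse a JSON Kafka message and enforce the schema -- the same check
    a JSON->Avro conversion performs so that downstream connectors, like
    the JDBC sink, always receive well-formed records."""
    record = json.loads(raw)
    out = {}
    for field, ftype in SCHEMA.items():
        if field not in record:
            raise ValueError(f"missing required field: {field}")
        if not isinstance(record[field], ftype):
            raise TypeError(f"field {field!r} is not {ftype.__name__}")
        out[field] = record[field]
    return out

print(to_structured(b'{"user_id": 7, "event": "buy", "amount": 9.5}'))
```

A message missing `user_id` would be rejected here, just as a schemaless JSON message would break a connector expecting a fixed structure; the Avro route moves that failure to a single, well-defined point in the pipeline.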

Data Denormalization: Pros, Cons & Techniques for Denormalizing Data

The amount of data organizations handle has created the need for faster data access and processing. Data denormalization is a widely used technique to improve database query performance. This article discusses data denormalization, its importance, how it differs from data normalization, and techniques for denormalizing data. Importantly, I'll also look at the pros and cons of this approach.
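The trade-off the article weighs can be shown concretely with SQLite. The schema below is a made-up two-table example, not one from the article: copying the customer name onto each order row removes the join from the read path (the pro) at the cost of storing the name redundantly and keeping it in sync on update (the con).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized: orders reference customers by id, so reads need a join.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
cur.execute("INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace')")
cur.execute("INSERT INTO orders VALUES (10, 1, 25.0), (11, 2, 40.0)")

# Denormalized: materialise the join once, so later reads skip it.
cur.execute("""
    CREATE TABLE orders_denorm AS
    SELECT o.id, c.name AS customer_name, o.total
    FROM orders o JOIN customers c ON c.id = o.customer_id
""")

rows = cur.execute(
    "SELECT customer_name, total FROM orders_denorm ORDER BY id"
).fetchall()
print(rows)  # [('Ada', 25.0), ('Grace', 40.0)]
```

If Ada renames her account, the normalized design needs one UPDATE; the denormalized table needs every matching order row touched as well, which is the consistency cost the article's cons section covers.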

How Geometric Search Works for Hexagons in Elasticsearch

Geographic grid systems allow zooming into maps at progressively higher resolutions and finer grids. For rectangular grids, this is very simple, but for hexagonal grids, the situation is much more complex, since child hexagons are not fully contained within parent hexagons. This video demonstrates how we can still achieve efficient parent-child search in Elasticsearch using the H3 hexagonal grid.

Data lake vs. data mesh: Which one is right for you?

What’s the right way to manage growing volumes of enterprise data, while providing the consistency, data quality and governance required for analytics at scale? Is centralizing data management in a data lake the right approach? Or is a distributed data mesh architecture right for your organization? When it comes down to it, most organizations seeking these solutions are looking for a way to analyze data without having to move or transform it via complex extract, transform and load (ETL) pipelines.