A look at what Apache Arrow is, how it works, and some of the companies using it as a critical component in their architecture. Over the past few decades, leveraging big datasets has required businesses to perform increasingly complex analysis. Advancements in query performance, analytics, and data storage are largely a result of greater access to memory. Demand, manufacturing process improvements, and technological advances have all contributed to cheaper memory.
This article was originally published in The New Stack and is reposted here with permission. Arrow makes analytics workloads more efficient for modern CPU and GPU hardware, which makes working with large data sets easier and less costly. One of the biggest challenges of working with big data is the performance overhead involved with moving data between different tools and systems as part of your data processing pipeline.
If you are working with large amounts of data that will primarily be used for analytics, a column database might be a good option. There are a lot of different options when it comes to choosing a database for your application. A common discussion is the high-level SQL vs. NoSQL debate: whether data should be stored in a relational database or in a NoSQL alternative such as a key-value, document, or graph database.
With native SQL support coming to InfluxDB, we can broaden the scope of developer tools used to analyze and visualize our time series data. One of these tools is Apache Superset. So let’s break down the basics of what Superset is, look at its features and benefits, and run a quick demo of Superset in action.