A very efficient and standard way to store data, in particular for analytics and "columnar" data (with repeated values).
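As a toy sketch (not Parquet's actual on-disk encoding), this is the row- vs. column-oriented layout idea in TypeScript; storing a column's values contiguously is what makes runs of repeated values cheap to compress:

```typescript
// Toy illustration of row-oriented vs. column-oriented layout.
// This is NOT Parquet's real encoding, just the underlying idea.
const rows = [
  { country: 'FR', amount: 10 },
  { country: 'FR', amount: 25 },
  { country: 'DE', amount: 7 },
];

// Column-oriented: each column's values are stored contiguously,
// so repeated values (like 'FR', 'FR') sit next to each other and
// compress well with run-length or dictionary encoding.
const columns = {
  country: rows.map((r) => r.country), // ['FR', 'FR', 'DE']
  amount: rows.map((r) => r.amount),   // [10, 25, 7]
};

console.log(columns);
```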
Parquet
Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. It provides high performance compression and encoding schemes to handle complex data in bulk and is supported in many programming languages and analytics tools.
Apache Parquet is a free and open-source column-oriented data storage format in the Apache Hadoop ecosystem, inspired by the Google Dremel interactive ad-hoc query system for analysis of read-only nested data. It is similar to RCFile and ORC, the other columnar-storage file formats in Hadoop, and is compatible with most of the data processing frameworks around Hadoop. It provides data compression and encoding schemes with enhanced performance to handle complex data in bulk.
Reading and Writing the Apache Parquet Format — Apache Arrow v22.0.0
The Apache Parquet project provides a standardized open-source columnar storage format for use in data analysis systems. It was created originally for use in Apache Hadoop, with systems like Apache Drill, Apache Hive, Apache Impala, and Apache Spark adopting it as a shared standard for high performance data IO.

Apache Arrow is an ideal in-memory transport layer for data that is being read or written with Parquet files. We have been concurrently developing the C++ implementation of Apache Parquet, which includes a native, multithreaded C++ adapter to and from in-memory Arrow data. PyArrow includes Python bindings to this code, which thus enables reading and writing Parquet files with pandas as well.
TypeScript/JavaScript
the library itself is not large, but its dependencies are?
"a fully asynchronous, pure JavaScript implementation of the Parquet file format. The implementation conforms with the Parquet specification and is tested for compatibility with Apache's Java reference implementation.
What is Parquet? Parquet is a column-oriented file format; it allows you to write a large amount of structured data to a file, compress it and then read parts of it back out efficiently. The Parquet format is based on Google's Dremel paper.
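A minimal sketch of writing and then reading a file with parquetjs, following its README-style API; the schema fields and file name below are made-up examples:

```typescript
// Sketch of parquetjs usage: define a schema, write rows, read them back.
import parquet from 'parquetjs';

async function main() {
  // Declare a schema for the records we want to store (hypothetical fields).
  const schema = new parquet.ParquetSchema({
    name: { type: 'UTF8' },
    quantity: { type: 'INT64' },
    price: { type: 'DOUBLE' },
    inStock: { type: 'BOOLEAN' },
  });

  // Write a couple of rows to a new Parquet file.
  const writer = await parquet.ParquetWriter.openFile(schema, 'fruits.parquet');
  await writer.appendRow({ name: 'apples', quantity: 10, price: 2.5, inStock: true });
  await writer.appendRow({ name: 'oranges', quantity: 20, price: 2.0, inStock: false });
  await writer.close();

  // Read the rows back out, one record at a time.
  const reader = await parquet.ParquetReader.openFile('fruits.parquet');
  const cursor = reader.getCursor();
  let record: unknown;
  while ((record = await cursor.next())) {
    console.log(record);
  }
  await reader.close();
}

main();
```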
duckdb - npm
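duckdb from npm can also query Parquet files directly from Node with SQL. A minimal sketch using the package's callback API; the file name and query are made up:

```typescript
// Sketch of querying a Parquet file from Node via the duckdb npm package.
import duckdb from 'duckdb';

// An in-memory database is enough for scanning external Parquet files.
const db = new duckdb.Database(':memory:');

// DuckDB reads Parquet directly in SQL via read_parquet().
db.all(
  "SELECT name, SUM(quantity) AS total FROM read_parquet('fruits.parquet') GROUP BY name",
  (err, rows) => {
    if (err) throw err;
    console.log(rows);
  }
);
```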
