Wednesday, January 28, 2026

data: parquet js/ts libs

parquet-wasm - npm




From Claude.ai

JavaScript/TypeScript Parquet Libraries Comparison

Feature         | parquet-wasm   | hyparquet   | @dsnp/parquetjs | parquetjs
Version         | 0.7.1          | 1.23.3      | 1.8.7           | 0.11.2
Last Updated    | 2 months ago   | 1 week ago  | 6 months ago    | >1 year ago
Dependencies    | 0              | 0           | 13              | 8
Bundle Size     | ~1.2MB (WASM)  | ~189KB      | ~6.9MB          | ~219KB
Implementation  | Rust/WASM      | Pure JS     | Pure JS/TS      | Pure JS
TypeScript      | ✅ Native      | ✅ Native   | ✅ Native       | ❌ (types available)

Read/Write Support

                | parquet-wasm | hyparquet            | @dsnp/parquetjs | parquetjs
Read            | ✅           | ✅                   | ✅              | ✅
Write           | ✅           | ✅ (separate pkg)    | ✅              | ✅
Streaming Read  |              |                      |                 |
Remote/HTTP     |              | ✅ (range requests)  |                 |
S3 Direct       |              |                      | ✅              |

Compression Codecs

              | parquet-wasm | hyparquet   | @dsnp/parquetjs | parquetjs
Uncompressed  | ✅           | ✅          | ✅              | ✅
Snappy        | ✅           | ✅          | ✅              | ✅
GZIP          | ✅           | ➕ (addon)  | ✅              | ✅
Brotli        | ✅           | ➕ (addon)  |                 |
ZSTD          | ✅           | ➕ (addon)  |                 |
LZ4           | ✅           | ➕ (addon)  |                 |
LZO           |              | ➕ (addon)  |                 |

Environment Support

              | parquet-wasm | hyparquet | @dsnp/parquetjs | parquetjs
Node.js       | ✅           | ✅        | ✅              | ✅
Browser       | ✅           | ✅        |                 |
Edge/Workers  |              |           | ⚠️ Limited      |

Performance Characteristics

Library          | Best For
parquet-wasm     | Large files, high performance, Arrow integration
hyparquet        | Browser-first, small bundle, low latency
@dsnp/parquetjs  | Node.js apps needing full read/write, S3 integration
parquetjs        | Legacy projects (unmaintained)

Recommendations

For high-performance/large files → parquet-wasm

  • Uses Rust's battle-tested parquet crate compiled to WASM
  • Best compression codec support (including ZSTD, LZ4)
  • Outputs Apache Arrow format (great for analytics)
  • Larger initial bundle, but fastest for big data (see the read sketch below)
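
A minimal read sketch for parquet-wasm, assuming the Node build (the browser ESM build also needs an await of its init export) and apache-arrow to consume the result:

import { readFile } from 'node:fs/promises';
import { readParquet } from 'parquet-wasm';
import { tableFromIPC } from 'apache-arrow';

// Parse the whole file in WASM, then hand the result back to JS as an Arrow IPC stream.
const bytes = new Uint8Array(await readFile('data.parquet'));
const wasmTable = readParquet(bytes);
const arrowTable = tableFromIPC(wasmTable.intoIPCStream());
console.log(arrowTable.numRows, arrowTable.schema.fields.map((f) => f.name));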

For browser-first/lightweight → hyparquet

  • Zero dependencies, tiny bundle
  • Great for viewing/reading parquet in browser
  • HTTP range request support (partial file reads; see the sketch below)
  • Writing requires separate hyparquet-writer package
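
A minimal browser read sketch for hyparquet, assuming its asyncBufferFromUrl and parquetReadObjects helpers and a hypothetical file URL:

import { asyncBufferFromUrl, parquetReadObjects } from 'hyparquet';

// Wrap the remote file so hyparquet fetches only the byte ranges it needs,
// then read two columns from the first 1,000 rows as plain JS objects.
const file = await asyncBufferFromUrl({ url: 'https://example.com/data.parquet' });
const rows = await parquetReadObjects({ file, columns: ['id', 'name'], rowStart: 0, rowEnd: 1000 });
console.log(rows[0]);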

For Node.js full-featured → @dsnp/parquetjs

  • Most complete pure-JS implementation
  • Built-in S3 support
  • Actively maintained fork of original parquetjs
  • Good TypeScript support
  • ⚠️ Compression must be set per-column (as you discovered)

Avoid → parquetjs (original)

  • Unmaintained (>1 year without updates)
  • Use @dsnp/parquetjs instead

For your 268MB use case with compression needs, I'd suggest either parquet-wasm (best compression options, highest performance) or sticking with @dsnp/parquetjs but ensuring compression is set on each column in the schema.

dsnp/parquetjs compression

With 268MB and 100K row chunks, that should definitely compress. Something else is going on.

Let me check how the compression option actually gets applied... Found it. Let me check how the schema builds the column options... Now I see the issue clearly: the compression must be set on each column in the schema, NOT as a writer option. Let me verify there's no global compression option... This confirms it. @dsnp/parquetjs does NOT support a global compression option on the writer. You must specify compression on each column in the schema.

This won't work:

// ❌ This does nothing - compression option is ignored
const writer = await parquet.ParquetWriter.openFile(schema, 'output.parquet', {
  compression: 'GZIP'
});

This is how you must do it:

// ✅ Compression must be on each column
const schema = new parquet.ParquetSchema({
  id: { type: 'INT64', compression: 'GZIP' },
  name: { type: 'UTF8', compression: 'GZIP' },
  value: { type: 'DOUBLE', compression: 'GZIP' }
});

If you have many columns and want to apply compression to all of them programmatically:

const fields = {
  id: { type: 'INT64' },
  name: { type: 'UTF8' },
  value: { type: 'DOUBLE' }
};

// Add compression to all fields
for (const key of Object.keys(fields)) {
  fields[key].compression = 'GZIP';
}

const schema = new parquet.ParquetSchema(fields);
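
From there, a minimal write loop (a sketch, assuming rows is an iterable of objects matching the schema; rowGroupSize controls how many buffered rows go into each row group):

import parquet from '@dsnp/parquetjs';

const writer = await parquet.ParquetWriter.openFile(schema, 'output.parquet', {
  rowGroupSize: 100_000
});

// appendRow buffers rows in memory; a compressed row group is flushed
// each time the buffer reaches rowGroupSize.
for (const row of rows) {
  await writer.appendRow(row);
}
await writer.close();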

streaming / appending to parquet files

Let me check the parquet-wasm API for that capability... Based on my research, no: parquet-wasm does not support appending multiple row groups incrementally to a single file.

Looking at the API, parquet-wasm has a simple writeParquet(table, writerProperties) function that takes an entire Arrow Table and writes it out at once. There's no streaming writer or incremental row group API.
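
For contrast, a single-shot parquet-wasm write might look roughly like this (a sketch, assuming an existing apache-arrow Table in arrowTable and parquet-wasm's WriterPropertiesBuilder / Compression writer options):

import { Table, writeParquet, WriterPropertiesBuilder, Compression } from 'parquet-wasm';
import { tableToIPC } from 'apache-arrow';

// Move the Arrow table into WASM memory, then write the entire file in one call.
const wasmTable = Table.fromIPCStream(tableToIPC(arrowTable, 'stream'));
const props = new WriterPropertiesBuilder().setCompression(Compression.ZSTD).build();
const parquetBytes = writeParquet(wasmTable, props); // Uint8Array holding the whole file; no appending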

Here's the comparison for your use case:

Library           | Incremental Row Group Writing
@dsnp/parquetjs   | ✅ Yes - ParquetWriter buffers rows and writes row groups automatically based on rowGroupSize
parquet-wasm      | ❌ No - single-shot writeParquet() only
hyparquet-writer  | ❌ No - single-shot API
PyArrow (Python)  | ✅ Yes - ParquetWriter.write_table() can be called multiple times

Python vs TypeScript for Parquet

Yes, for Parquet writing, Python is significantly better. The JS ecosystem is pretty immature here.

PyArrow gives you:

  • Streaming writer with multiple row groups
  • Global compression settings (not per-column)
  • Append mode to existing files
  • All compression codecs (ZSTD, LZ4, etc.)
  • Battle-tested, production-grade
  • Much better performance

import pyarrow as pa
import pyarrow.parquet as pq

schema = pa.schema([
    ('id', pa.int64()),
    ('name', pa.string()),
])

with pq.ParquetWriter('output.parquet', schema, compression='zstd') as writer:
    for chunk in chunks:
        table = pa.Table.from_pydict(chunk, schema=schema)
        writer.write_table(table)  # Each call can create a new row group

That's it. Compression just works globally.


If you must stay in Node.js, your options are:

  1. Spawn Python - call a Python script from Node for the Parquet writing (see the sketch after this list)
  2. Use DuckDB - the duckdb npm package can write Parquet with proper compression
  3. Fix @dsnp/parquetjs - add compression to every column (tedious but works)
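
A sketch of option 1, assuming a hypothetical write_parquet.py helper that reads JSON lines from stdin and does the PyArrow writing:

import { spawn } from 'node:child_process';

// Stream rows to the Python helper as JSON lines; the script owns the Parquet writing.
// (No backpressure handling here; fine as a sketch, not for very large inputs.)
function writeParquetViaPython(rows, outPath) {
  return new Promise((resolve, reject) => {
    const py = spawn('python3', ['write_parquet.py', outPath], {
      stdio: ['pipe', 'inherit', 'inherit'],
    });
    py.on('error', reject);
    py.on('close', (code) => (code === 0 ? resolve() : reject(new Error(`python exited with ${code}`))));
    for (const row of rows) py.stdin.write(JSON.stringify(row) + '\n');
    py.stdin.end();
  });
}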

DuckDB in Node might actually be your best hybrid option:

import { Database } from 'duckdb-async';

const db = await Database.create(':memory:');
await db.run(`
  COPY (SELECT * FROM read_json_auto('data.json')) 
  TO 'output.parquet' (FORMAT PARQUET, COMPRESSION ZSTD)
`);

If Parquet I/O is a significant part of your workflow, Python is the right tool. The JS libraries are mostly community efforts while PyArrow is backed by the Arrow project itself.

From Gemini.ai

Quick Verdict

  • Best for Performance (Heavy Data): parquet-wasm (Fastest throughput, returns Arrow data).
  • Best for Web/Browser (Low Latency): hyparquet (Tiny size, fast startup, no WASM overhead).
  • Best for Analytics/SQL: duckdb-wasm (Full SQL engine, fast queries, heavy load).
  • Legacy/Node.js: parquetjs (Use the @dsnp fork).

Comparison Matrix

Library       | Type        | Speed                      | Bundle Size                        | Popularity                    | Support
parquet-wasm  | Rust (WASM) | 🚀 Excellent (Throughput)  | Heavy (~1.2MB / ~450KB read-only)  | Rising (Niche high-perf)      | ✅ Active (Kyle Barron)
hyparquet     | Pure JS     | ⚡ Good (Startup/Latency)  | 🪶 Tiny (~10KB min)                | Low but growing (~730 stars)  | ✅ Active (Single maintainer)
duckdb-wasm   | SQL Engine  | 🐢 Startup / 🚀 Query      | 🐘 Huge (20MB+)                    | 🔥 High (~400k weekly)        | ✅ Excellent (DuckDB Team)
apache-arrow  | JS / TS     | 🔸 Moderate (Reading)      | 📦 Large (~5MB unpacked)           | 👑 Massive (~1M weekly)       | ✅ Excellent (Apache)
parquetjs     | Pure JS     | 🐌 Slow                    | 🔸 Medium (~220KB)                 | 📉 Legacy (~370k monthly)     | ❌ Abandoned (Use forks)

Detailed Breakdown

1. parquet-wasm

  • Best For: Heavy data processing, integration with Apache Arrow, and scenarios where throughput matters more than initial load time.
  • Pros:
    • Speed: Uses Rust's high-performance parquet crate compiled to WebAssembly. It is significantly faster than pure JS libraries for parsing large files.
    • Arrow Integration: Reads data directly into Apache Arrow tables (efficient, zero-copy-like architecture).
    • Features: Supports all compression codecs (Snappy, Gzip, Brotli, ZSTD, etc.) and writing capabilities.
  • Cons:
    • Size: The WASM bundle is heavy. Even the "read-only" build is ~450KB, which can hurt "Time to Interactive" on slower networks.
    • Complexity: Requires handling async WASM initialization and memory management.

2. hyparquet

  • Best For: Browser-based viewers, dashboards, and apps where fast startup (low latency) is critical.
  • Pros:
    • Size: Extremely lightweight (~10KB). It loads instantly compared to WASM libraries.
    • Async/Streaming: Optimized for HTTP Range requests. It can read metadata and specific chunks without downloading the whole file, making it feel "instant" for large remote files.
    • Simplicity: Pure JavaScript. No WASM compile steps, no complex bundler configurations, and easy to debug.
  • Cons:
    • Throughput: Slower than parquet-wasm if you need to parse every row of a massive file into memory at once.
    • Output: Returns standard JS objects (or arrays), which use more memory than Arrow buffers.

3. duckdb-wasm

  • Best For: Complex analytics where you need to run SQL queries (Group By, Join, Filter) on Parquet files client-side.
  • Pros:
    • Query Performance: Once loaded, it is incredibly fast. You can run complex SQL queries directly on the Parquet file without loading it all into JS memory (see the sketch after this list).
    • Versatility: Handles CSV, JSON, and Arrow formats as well.
  • Cons:
    • Startup Cost: The "elephant" in the room. It downloads a massive WASM binary and data files. It is not suitable for quick "page load" interactions.
    • Overhead: Overkill if you just need to read a few rows or simple data.
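
A rough browser setup sketch for @duckdb/duckdb-wasm, following its documented jsDelivr bundle flow; the Parquet URL is hypothetical:

import * as duckdb from '@duckdb/duckdb-wasm';

// Pick a WASM bundle, start the worker, and open an in-memory database.
const bundle = await duckdb.selectBundle(duckdb.getJsDelivrBundles());
const workerUrl = URL.createObjectURL(
  new Blob([`importScripts("${bundle.mainWorker}");`], { type: 'text/javascript' })
);
const db = new duckdb.AsyncDuckDB(new duckdb.ConsoleLogger(), new Worker(workerUrl));
await db.instantiate(bundle.mainModule, bundle.pthreadWorker);

// Run SQL directly against a remote Parquet file; the result comes back as an Arrow table.
const conn = await db.connect();
const result = await conn.query(`
  SELECT category, count(*) AS n
  FROM read_parquet('https://example.com/data.parquet')
  GROUP BY category
`);
console.log(result.toArray());
await conn.close();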

4. parquetjs (and forks)

  • Best For: Legacy Node.js applications or simple scripts where WASM is difficult to deploy.
  • Status: The original ironSource/parquetjs is abandoned (6+ years old).
  • Recommendation: If you must use this API, use the maintained fork: @dsnp/parquetjs.
  • Pros:
    • Familiarity: Simple, row-based API (Iterators).
    • Pure JS: Easy to use in Node.js without WASM compatibility issues.
  • Cons:
    • Performance: Significantly slower than WASM-based solutions.
    • Maintenance: Even forks receive only sporadic updates compared to the Arrow/DuckDB ecosystem.

Summary Recommendation

  1. Use hyparquet if you are building a web UI and need the page to load fast. It's the modern, lightweight choice for browsers.
  2. Use parquet-wasm if you are building a heavy data tool (like a BI tool in the browser) and plan to use Apache Arrow for data manipulation.
  3. Use duckdb-wasm if you need SQL capabilities client-side.
  4. Use @dsnp/parquetjs only if you are in a Node.js environment where WASM is not an option and performance is not critical.

===

The relationship between parquet-wasm and apache-arrow is complementary.

They represent the two halves of a modern data pipeline: Storage vs. Memory.

You did not see apache-arrow in the detailed list of Parquet libraries because apache-arrow (the NPM package) cannot read Parquet files by itself.

Here is the specific relationship:

1. The Separation of Concerns

  • Parquet is a Storage Format (On Disk). It is highly compressed and encoded to save space on your hard drive or network.
  • Arrow is a Memory Format (In RAM). It is uncompressed and laid out specifically for the CPU to process extremely fast (Zero-Copy).

2. The Bridge (parquet-wasm)

Since the standard apache-arrow JavaScript library does not have a native Parquet parser, you need a "loader" to convert the compressed file into the Arrow memory format.

  • parquet-wasm acts as this bridge.
  • It takes the compressed Parquet bytes -> decompresses them in WebAssembly (Rust) -> and outputs Apache Arrow tables.

3. The Workflow

In a typical high-performance web application (2025), you use them together like this:

  1. Fetch: You download a file data.parquet.
  2. Decode (parquet-wasm): You pass the file to parquet-wasm. It decompresses the Snappy/Gzip data and organizes it into Arrow Buffers inside WASM memory.
  3. Analyze (apache-arrow): parquet-wasm hands back an Arrow Table. You then import the apache-arrow library to actually interact with that table (e.g., table.getChild('price'), table.filter(...), or iterators), as in the sketch after this list.
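
A sketch of that workflow end to end (hypothetical URL; assumes a 'price' column exists in the file):

import { readParquet } from 'parquet-wasm';
import { tableFromIPC } from 'apache-arrow';

// 1. Fetch the compressed file.
const resp = await fetch('https://example.com/data.parquet');
const bytes = new Uint8Array(await resp.arrayBuffer());

// 2. Decode in WASM, hand the Arrow IPC stream back to JS.
const table = tableFromIPC(readParquet(bytes).intoIPCStream());

// 3. Analyze with apache-arrow: columnar access without converting every row to a JS object.
const price = table.getChild('price'); // Arrow Vector, or null if the column is missing
console.log(table.numRows, price?.get(0));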

Summary Table

Library       | Role                             | Analogy
parquet-wasm  | The Chef (Prepares the food)     | Takes raw ingredients (Parquet file), cooks them (decompresses/parses), and puts them on a plate (Arrow Table).
apache-arrow  | The Cutlery (Consumes the food)  | The fork and knife you use to actually eat (read/manipulate) the data once it is on the plate.

Why parquet-wasm is the winner for speed: Because it outputs data directly into the Arrow format, parquet-wasm allows you to hand that data off to other tools (like DuckDB, Arquero, or WebGL visualization libraries like Deck.gl) with Zero-Copy. The data doesn't need to be converted into slow JavaScript objects (like hyparquet does); it stays as a binary blob that the computer reads instantly.

===

duckdb-wasm

1. Is duckdb-wasm Standalone?

Yes and No.

  • Yes, it is a standalone Engine: duckdb-wasm is a complete SQL database engine running in your browser. It does not rely on parquet-wasm or apache-arrow to function. It has its own internal C++ code (compiled to WASM) that handles Parquet parsing, query execution, and data management. You can drop it into a page, give it a Parquet URL, and run SQL without installing anything else.

  • No, it is part of the "Modern Stack": While it can work alone, it is designed to plug directly into the Apache Arrow ecosystem.

    • Input: It can read standard JS objects, but it is much faster if you feed it Apache Arrow tables.
    • Output: When you run a query (SELECT * FROM ...), the most efficient way to get results out is as an Apache Arrow table.

2. Relationship with other libraries

Library       | Relationship to duckdb-wasm
parquet-wasm  | Competitor (sort of). Both libraries can read Parquet files. Use duckdb-wasm if you need to run SQL queries (filter, join, group); use parquet-wasm if you just want to convert Parquet -> Arrow/JS as fast as possible without the overhead of a full SQL engine.
apache-arrow  | Best Friend. duckdb-wasm uses Arrow as its "data interchange" layer. You will almost always install the apache-arrow library alongside duckdb-wasm so you can actually read the results of your SQL queries efficiently.

3. License

It is extremely permissive.

  • License: MIT License
  • What this means: It is free for commercial use, modification, and distribution. You do not need to pay or open-source your own code to use it. This is the same license as React, Angular, and many other standard web tools.

Summary: The "Data Stack" Architecture

If you are building a serious data tool in 2025, your stack often looks like this:

  1. Storage: Parquet File (Server/S3)
  2. Engine: duckdb-wasm (Downloads file, runs SQL, outputs Arrow)
  3. Visualization: apache-arrow (Reads the Arrow output from DuckDB to render charts/tables)