Friday, April 24, 2026

DuckDB: in-process SQL

duckdb/duckdb: DuckDB is an analytical in-process SQL database management system @GitHub
C++, MIT license

DuckDB is a high-performance analytical database system. It is designed to be fast, reliable, portable, and easy to use. DuckDB provides a rich SQL dialect with support far beyond basic SQL. DuckDB supports arbitrary and nested correlated subqueries, window functions, collations, complex types (arrays, structs, maps), and several extensions designed to make SQL easier to use.

DuckDB is available as a standalone CLI application and has clients for Python, R, Java, Wasm, etc., with deep integrations with packages such as pandas and dplyr.

For more information on using DuckDB, please refer to the DuckDB documentation.

DuckDB — Embedded Analytical SQL Engine

What It Is

DuckDB is an embedded, in-process analytical database — "SQLite for analytics." No separate server process, no external dependencies. Runs inside the host application.

Key difference from SQLite: columnar-vectorized query execution (processes batches of values) optimized for OLAP (analytical workloads), not OLTP (transactional).

Official Language Bindings

Language	Status	Notes
C/C++	Official	Native API, primary interface
Python	Official	Most popular binding
Node.js/JS	Official	npm package
Java	Official	JDBC driver
Go	Official	`duckdb-go-bindings` repo
Rust	Official	`duckdb-rs` crate
R	Official
WASM	Official	`@duckdb/duckdb-wasm` — full DuckDB in browser

All maintained by the DuckDB Foundation.

Not Officially Supported (DS3 codegen targets)

C# — no official binding (community wrappers may exist, or use C API via P/Invoke)
Swift — no official binding (use C API via bridging header)
Kotlin — can use Java JDBC driver
Dart — no official binding (use C API via FFI)

Deployment Modes

1. Embedded / In-Process (primary)

Runs inside the host application. No server, no IPC overhead. The generated program links against DuckDB and queries locally.

import duckdb
conn = duckdb.connect()
conn.sql("SELECT * FROM read_csv_auto('data.csv') WHERE age > 30")

2. HTTP Server (experimental)

Community httpserver extension — not first-class:

SELECT httpserve_start('0.0.0.0', 9999, 'user:pass');

Includes embedded web UI. Experimental/unstable.

3. Stdio / Unix Pipeline Tool

Works as a Unix command-line tool for piping:

cat data.csv | duckdb :memory: "SELECT * FROM read_csv_auto('/dev/stdin')"
duckdb :memory: "COPY (SELECT ...) TO '/dev/stdout' WITH (FORMAT csv, HEADER)"

Not designed as a persistent stdio server — no request/response loop like UCI or MCP. It's query-execute-exit, not a protocol.

4. WASM (Browser)

Full DuckDB compiled to WebAssembly via @duckdb/duckdb-wasm:

Runs in Chrome, Firefox, Safari, and Node.js
Reads Parquet, CSV, JSON from Filesystem APIs or HTTP
Speaks Apache Arrow natively
Client-side analytics without sending data to a server
Live demo: shell.duckdb.org

Native File Format Support

DuckDB can query these directly (no import step):

CSV — read_csv_auto('file.csv')
Parquet — read_parquet('file.parquet') — columnar, compressed, very fast
JSON — read_json_auto('file.json')
Excel — via spatial extension
HTTP/S3 — remote files via httpfs extension

This is exactly why DuckDB exploded in popularity — people need to query flat files with SQL.

DuckDB vs AWS Athena

AWS Athena uses Trino (formerly PrestoSQL) under the hood — not DuckDB.

Athena v1 was built on Presto (Facebook's distributed SQL engine). Athena v2 migrated to Trino (the open-source fork after the Presto community split). Athena v3 (2023+) added Apache Spark integration for non-SQL workloads.

They solve the same problem (SQL over files) at vastly different scales:

Aspect	Athena (Trino)	DuckDB
Architecture	Distributed, multi-node	Single-process, embedded
Scale	Petabytes across S3	Gigabytes, local
Pricing	Pay per query (data scanned)	Free, open source
Setup	Managed service	Zero setup
File support	Parquet, CSV, JSON, ORC, Avro on S3	Same formats, local or HTTP
Latency	Seconds (cluster spin-up)	Milliseconds

Athena is the "enterprise cloud" answer; DuckDB is the "just query this file on my laptop" answer.

MotherDuck is the cloud/managed version of DuckDB — "DuckDB as a service," closer to what Athena does but with DuckDB's engine. Supports hybrid local+cloud queries.

DraganSr