DuckDB: SQL over a CSV in one shot
DuckDB reads CSV / JSON / Parquet directly — no schema, no load step, no server. The fastest way to put SQL into a pipeline.
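For data that arrives on stdin rather than in a file, one common pattern is to read /dev/stdin explicitly. This is a sketch assuming a POSIX shell and a recent DuckDB CLI; sales.csv here is just an example filename:

$ cat sales.csv | duckdb -c "
SELECT category, COUNT(*) AS n
FROM read_csv_auto('/dev/stdin')
GROUP BY 1
"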
Setup
- brew install duckdb
Cost per run
free
The one-liner
$ duckdb -c "
SELECT category,
AVG(price) AS avg_price,
QUANTILE_CONT(price, 0.9) AS p90,
COUNT(*) AS n
FROM 'sales.csv'
GROUP BY 1
ORDER BY n DESC
"What each stage does
- [01] duckdb -c "…"
  Runs one SQL statement against an in-memory database. No server, and no .duckdb file is needed when you don't pass one.
- [02] FROM 'sales.csv'
  DuckDB's killer feature: any single-quoted string ending in .csv / .json / .parquet is treated as a table, and the schema is inferred automatically.
- [03] QUANTILE_CONT(price, 0.9)
  Real SQL: proper percentiles, window functions, joins, and PostgreSQL-flavored syntax. Skip the manual stats math. A couple of variations on this pattern are sketched just below.
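As a rough sketch of the same idea against other inputs, the query can be pointed at a Parquet glob and the result persisted by passing a database file on the command line. The sales/*.parquet glob and the sales.duckdb filename are hypothetical; adjust them to your own layout:

$ duckdb sales.duckdb -c "
CREATE TABLE category_stats AS
SELECT category,
       AVG(price)                AS avg_price,
       QUANTILE_CONT(price, 0.9) AS p90,
       COUNT(*)                  AS n
FROM 'sales/*.parquet'
GROUP BY 1
"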
Expected output (sample)
┌─────────────┬───────────┬────────┬─────┐
│ category    │ avg_price │ p90    │ n   │
├─────────────┼───────────┼────────┼─────┤
│ electronics │    142.40 │ 280.00 │ 412 │
│ apparel     │     52.10 │ 110.00 │ 318 │
│ books       │     18.20 │  29.95 │ 287 │
└─────────────┴───────────┴────────┴─────┘
Caveats & tips
- The same query works on .json and .parquet files; DuckDB detects the format from the extension and infers the schema.
- For one-shot scripts, pick an output format with the CLI flags `-csv`, `-json`, `-markdown`, `-list`, or `-line` (see the sketch below).
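A minimal sketch of how those flags compose with other tools; the jq step and the sample.csv filename are only illustrative:

$ duckdb -json -c "SELECT category, COUNT(*) AS n FROM 'sales.csv' GROUP BY 1" | jq '.'
$ duckdb -csv -c "SELECT * FROM 'sales.csv' LIMIT 100" > sample.csv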