DuckDB: SQL over a CSV in one shot
DuckDB reads CSV / JSON / Parquet directly — no schema, no load step, no server. The fastest way to put SQL into a pipeline.
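For data that arrives on stdin rather than in a file, one common pattern is to read /dev/stdin explicitly. This is a sketch assuming a POSIX shell and a recent DuckDB CLI; sales.csv here is just an example filename:

$ cat sales.csv | duckdb -c "
SELECT category, COUNT(*) AS n
FROM read_csv_auto('/dev/stdin')
GROUP BY 1
"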
Setup
- brew install duckdb
Cost per run
free
The one-liner
$ duckdb -c "
SELECT category,
AVG(price) AS avg_price,
QUANTILE_CONT(price, 0.9) AS p90,
COUNT(*) AS n
FROM 'sales.csv'
GROUP BY 1
ORDER BY n DESC
"What each stage does
- [01] duckdb -c "…"
  Runs one SQL statement against an in-memory database. No server, and no .duckdb file is needed when you don't pass one.
- [02] FROM 'sales.csv'
  DuckDB's killer feature: any single-quoted string ending in .csv / .json / .parquet is treated as a table, and the schema is inferred automatically.
- [03] QUANTILE_CONT(price, 0.9)
  Real SQL: proper percentiles, window functions, joins, and PostgreSQL-flavored syntax. Skip the manual stats math. A couple of variations on this pattern are sketched just below.
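As a rough sketch of the same idea against other inputs, the query can be pointed at a Parquet glob and the result persisted by passing a database file on the command line. The sales/*.parquet glob and the sales.duckdb filename are hypothetical; adjust them to your own layout:

$ duckdb sales.duckdb -c "
CREATE TABLE category_stats AS
SELECT category,
       AVG(price)                AS avg_price,
       QUANTILE_CONT(price, 0.9) AS p90,
       COUNT(*)                  AS n
FROM 'sales/*.parquet'
GROUP BY 1
"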
Expected output (sample)
┌─────────────┬───────────┬────────┬─────┐
│ category    │ avg_price │ p90    │ n   │
├─────────────┼───────────┼────────┼─────┤
│ electronics │    142.40 │ 280.00 │ 412 │
│ apparel     │     52.10 │ 110.00 │ 318 │
│ books       │     18.20 │  29.95 │ 287 │
└─────────────┴───────────┴────────┴─────┘
Caveats & tips
- The same query works on .json and .parquet files; DuckDB detects the format from the extension and infers the schema.
- For one-shot scripts, pick an output format with the CLI flags `-csv`, `-json`, `-markdown`, `-list`, or `-line` (see the sketch below).
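A minimal sketch of how those flags compose with other tools; the jq step and the sample.csv filename are only illustrative:

$ duckdb -json -c "SELECT category, COUNT(*) AS n FROM 'sales.csv' GROUP BY 1" | jq '.'
$ duckdb -csv -c "SELECT * FROM 'sales.csv' LIMIT 100" > sample.csv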