
DuckDB: SQL over a CSV in one shot

DuckDB reads CSV / JSON / Parquet files directly: no schema declaration, no load step, no server. It is one of the quickest ways to drop real SQL into a shell pipeline.

Setup
  • brew install duckdb
Cost per run
free
The one-liner
$ duckdb -c "
  SELECT category,
         AVG(price)             AS avg_price,
         QUANTILE_CONT(price, 0.9) AS p90,
         COUNT(*)               AS n
  FROM 'sales.csv'
  GROUP BY 1
  ORDER BY n DESC
"
What each stage does
  1. [01] duckdb -c "…"
    Run one SQL statement against an in-memory database. No server, no .duckdb file needed when you don't pass one.
  2. [02] FROM 'sales.csv'
    DuckDB's killer feature: any string in single-quotes that ends in .csv / .json / .parquet is treated as a table. Schema is inferred.
  3. [03] QUANTILE_CONT(price, 0.9)
    Real SQL: proper percentiles, window functions, joins, largely PostgreSQL-compatible syntax. Skip the manual stats math.
Expected output (sample)
┌────────────┬───────────┬────────┬─────┐
│  category  │ avg_price │  p90   │  n  │
├────────────┼───────────┼────────┼─────┤
│ electronics│   142.40  │ 280.00 │ 412 │
│ apparel    │    52.10  │ 110.00 │ 318 │
│ books      │    18.20  │  29.95 │ 287 │
└────────────┴───────────┴────────┴─────┘
Caveats & tips
  • Same query works on .json and .parquet — DuckDB auto-detects.
  • For scripting, pick an output format with `-csv`, `-json`, `-markdown`, `-list`, or `-line`.