
JSON from curl → DuckDB analysis via /dev/stdin

Pipe Reddit / HN / GitHub JSON straight into DuckDB's read_json_auto — SQL aggregations on live API data, no temp files.

Setup
  • brew install duckdb jq
Cost per run
free
The one-liner
$ curl -s "https://hn.algolia.com/api/v1/search?query=duckdb&hitsPerPage=100" \
  | jq '.hits' \
  | duckdb -markdown -c "
      SELECT author,
             COUNT(*)        AS posts,
             SUM(points)     AS total_score,
             ROUND(AVG(points), 1) AS avg_score
      FROM read_json_auto('/dev/stdin')
      WHERE points IS NOT NULL
      GROUP BY 1
      ORDER BY total_score DESC
      LIMIT 10
    "
What each stage does
  1. [01] curl -s "https://hn.algolia.com/api/v1/search?query=duckdb&hitsPerPage=100"
    Fetches 100 hits from HN's Algolia search index.
  2. [02] jq '.hits'
    Strips the envelope: DuckDB's read_json_auto wants a flat array, not the {hits:[...], page:0, ...} wrapper.
  3. [03] duckdb -markdown -c "…"
    -markdown formats the result as a GitHub-style markdown table, handy for pasting into a doc or piping further.
  4. [04] FROM read_json_auto('/dev/stdin')
    Reads JSON from stdin as if it were a table. The schema is inferred from a sample of the input (20,480 rows by default; tunable via the sample_size parameter). This works because DuckDB treats /dev/stdin like any other file path on Unix.
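The envelope-stripping stage can be checked offline. This sketch feeds a tiny stand-in for the Algolia response (made-up data, same shape) through the same jq filter; it assumes jq is installed:

```shell
# Stand-in for the Algolia response: a 'hits' array inside a wrapper object.
echo '{"hits":[{"author":"a","points":5}],"page":0,"nbHits":1}' \
  | jq -c '.hits'
# → [{"author":"a","points":5}]
```

Whatever jq emits here is exactly what read_json_auto('/dev/stdin') sees in the next stage: a flat JSON array, one object per row.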
Expected output (sample)
| author      | posts | total_score | avg_score |
|-------------|-------|-------------|-----------|
| jhanschoo   |    7  |        842  |    120.3  |
| simonw      |    5  |        618  |    123.6  |
| ...         |  ...  |        ...  |     ...   |
Caveats & tips
  • If stdin is small (<100 rows), DuckDB might infer types wrong — add explicit `columns := {col: 'INTEGER', …}` to read_json_auto.
  • For Reddit JSON (deeply nested), use `jq '.data.children | map(.data)'` to flatten first.
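A minimal sketch of that Reddit flattening step, again on an inline stand-in (the .data.children nesting mirrors Reddit's listing shape; the values are made up):

```shell
# Stand-in for a Reddit listing: each child wraps its payload in .data.
echo '{"data":{"children":[{"data":{"title":"t1","ups":3}},{"data":{"title":"t2","ups":9}}]}}' \
  | jq -c '.data.children | map(.data)'
# → [{"title":"t1","ups":3},{"title":"t2","ups":9}]
```

After the map(.data), the payload objects form a flat array, so the same DuckDB query pattern applies with the column names swapped (e.g. ups instead of points).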