
JSON from curl → DuckDB analysis via /dev/stdin

Pipe Reddit / HN / GitHub JSON straight into DuckDB's read_json_auto — SQL aggregations on live API data, no temp files.

Setup
  • brew install duckdb jq
Cost per run
free
The one-liner
$ curl -s "https://hn.algolia.com/api/v1/search?query=duckdb&hitsPerPage=100" \
  | jq '.hits' \
  | duckdb -markdown -c "
      SELECT author,
             COUNT(*)        AS posts,
             SUM(points)     AS total_score,
             ROUND(AVG(points), 1) AS avg_score
      FROM read_json_auto('/dev/stdin')
      WHERE points IS NOT NULL
      GROUP BY 1
      ORDER BY total_score DESC
      LIMIT 10
    "
What each stage does
  1. [01] curl -s "https://hn.algolia.com/api/v1/search?query=duckdb&hitsPerPage=100"
    Fetches 100 hits from HN's Algolia search index.
  2. [02] jq '.hits'
    Strips the envelope: DuckDB's read_json_auto wants a flat array, not the {hits:[...], page:0, ...} wrapper.
  3. [03] duckdb -markdown -c "…"
    -markdown formats the result as a GitHub-style markdown table, handy for pasting into a doc or piping further.
  4. [04] FROM read_json_auto('/dev/stdin')
    Reads JSON from stdin as if it were a table. The schema is inferred from a sample of the input (20,480 rows by default; tunable via the sample_size parameter). This works because DuckDB treats /dev/stdin like any other file path on Unix.
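The envelope-stripping stage can be checked offline. This sketch feeds a tiny stand-in for the Algolia response (made-up data, same shape) through the same jq filter; it assumes jq is installed:

```shell
# Stand-in for the Algolia response: a 'hits' array inside a wrapper object.
echo '{"hits":[{"author":"a","points":5}],"page":0,"nbHits":1}' \
  | jq -c '.hits'
# → [{"author":"a","points":5}]
```

Whatever jq emits here is exactly what read_json_auto('/dev/stdin') sees in the next stage: a flat JSON array, one object per row.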
Expected output (sample)
| author      | posts | total_score | avg_score |
|-------------|-------|-------------|-----------|
| jhanschoo   |    7  |        842  |    120.3  |
| simonw      |    5  |        618  |    123.6  |
| ...         |  ...  |        ...  |     ...   |
Caveats & tips
  • If stdin is small (<100 rows), DuckDB might infer types wrong — add explicit `columns := {col: 'INTEGER', …}` to read_json_auto.
  • For Reddit JSON (deeply nested), use `jq '.data.children | map(.data)'` to flatten first.
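A minimal sketch of that Reddit flattening step, again on an inline stand-in (the .data.children nesting mirrors Reddit's listing shape; the values are made up):

```shell
# Stand-in for a Reddit listing: each child wraps its payload in .data.
echo '{"data":{"children":[{"data":{"title":"t1","ups":3}},{"data":{"title":"t2","ups":9}}]}}' \
  | jq -c '.data.children | map(.data)'
# → [{"title":"t1","ups":3},{"title":"t2","ups":9}]
```

After the map(.data), the payload objects form a flat array, so the same DuckDB query pattern applies with the column names swapped (e.g. ups instead of points).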