JSON from curl → DuckDB analysis via /dev/stdin
Pipe Reddit / HN / GitHub JSON straight into DuckDB's read_json_auto — SQL aggregations on live API data, no temp files.
Setup
- brew install duckdb
Cost per run
free
The one-liner
$ curl -s "https://hn.algolia.com/api/v1/search?query=duckdb&hitsPerPage=100" \
| jq '.hits' \
| duckdb -markdown -c "
SELECT author,
COUNT(*) AS posts,
SUM(points) AS total_score,
ROUND(AVG(points), 1) AS avg_score
FROM read_json_auto('/dev/stdin')
WHERE points IS NOT NULL
GROUP BY 1
ORDER BY total_score DESC
LIMIT 10
"
What each stage does
- [01] curl
  `curl -s "https://hn.algolia.com/api/v1/search?query=duckdb&hitsPerPage=100"`
  Fetches 100 hits from HN's Algolia search index.
- [02] jq
  `jq '.hits'`
  Strips the envelope: DuckDB's read_json_auto wants a flat array, not the {hits:[...], page:0, ...} wrapper.
- [03] duckdb
  `duckdb -markdown -c "…"`
  -markdown formats the result as a GitHub-style markdown table, ready for pasting into a doc or piping further.
- [04] duckdb
  `FROM read_json_auto('/dev/stdin')`
  Reads JSON from stdin as if it were a table. The schema is inferred from an initial sample of rows (~20K by default, tunable via the `sample_size` parameter). This works because DuckDB opens /dev/stdin like any other file path on Unix.
Expected output (sample)
| author    | posts | total_score | avg_score |
|-----------|-------|-------------|-----------|
| jhanschoo | 7     | 842         | 120.3     |
| simonw    | 5     | 618         | 123.6     |
| ...       | ...   | ...         | ...       |
Caveats & tips
- If stdin is small (<100 rows), DuckDB might infer types incorrectly; pass explicit `columns = {col: 'INTEGER', …}` to read_json_auto.
- For Reddit JSON (deeply nested), use `jq '.data.children | map(.data)'` to flatten first.