Production-grade reasoning that doesn't stall — --fallback-model
Run Opus at high effort, but fail over to Sonnet automatically when Opus is overloaded. The flag that keeps cron jobs alive.
Setup
- Run `claude /login`, or `export ANTHROPIC_API_KEY=sk-…`
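Cron jobs don't inherit your interactive login environment, so a preflight check can save a silent failure; a minimal sketch (the warning text is my own, not from the CLI):

```shell
# Preflight for cron: warn when no API key is exported, since an
# interactive `claude /login` session may still cover auth.
if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
  echo "note: ANTHROPIC_API_KEY unset; relying on claude /login session" >&2
fi
```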
Cost per run
$0.01-0.10
The one-liner
$ curl -s "https://hacker-news.firebaseio.com/v0/topstories.json" \
| jq -r '.[0:30][]' \
| xargs -P 10 -I {} curl -s "https://hacker-news.firebaseio.com/v0/item/{}.json" \
| jq -s -r 'sort_by(-.score) | .[0:10] | .[] | "- [\(.score)] \(.title) — \(.url // "discussion")"' \
| claude -p \
--model opus \
--effort high \
--fallback-model sonnet \
"Today's top HN stories. Identify the one quietly important but under-discussed. 5 sentences for an outsider."

What each stage does
- [01] `curl … topstories.json …`: the same HN front-page fetch as recipe #1.
- [02] `claude -p`: headless print mode. Required for `--max-budget-usd`, `--fallback-model`, and `--no-session-persistence`.
- [03] `--model opus`: pick the strongest model, by alias (`opus`, `sonnet`, `haiku`) or by full ID like `claude-sonnet-4-6`.
- [04] `--effort high`: reasoning effort, one of low / medium / high / xhigh / max. High buys real extended thinking; latency rises with effort.
- [05] `--fallback-model sonnet`: when opus returns 529 (overloaded), claude automatically retries with sonnet. Production-grade: the pipe never stalls. Only works with `-p`.
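The jq sort-and-format stage is easy to verify offline before wiring up the full pipe. A minimal sketch with two hand-written items standing in for the HN API responses (scores and titles are made up):

```shell
# Feed two fake HN items through the exact jq stage from the one-liner.
printf '%s\n' \
  '{"score": 42, "title": "A Rust vector DB", "url": "https://example.com/db"}' \
  '{"score": 388, "title": "Ask HN: batching costs?"}' \
| jq -s -r 'sort_by(-.score) | .[0:10] | .[] | "- [\(.score)] \(.title) — \(.url // "discussion")"'
# → - [388] Ask HN: batching costs? — discussion
# → - [42] A Rust vector DB — https://example.com/db
```

Items without a `url` (Ask HN threads, which link to their own comments) fall through jq's `//` alternative operator to the literal `discussion`.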
Expected output (sample)
The under-discussed story is "The hidden cost of LLM batching" (score 388). While the M5 Pro benchmarks and the Rust vector DB get the upvotes, this one answers a question every team running production LLMs hits within their first month: how aggressive batching changes the latency-vs-cost tradeoff in a way that breaks SLA monitoring...
Caveats & tips
- `--fallback-model` only fires on overload errors (HTTP 529), not on other failures such as rate limits (429).
- To sanity-check the flag, run with `--model haiku --fallback-model nonsense-model`: you should still get a haiku response, because haiku is rarely overloaded, so the (invalid) fallback never fires.
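The fallback covers overload, but in cron the whole pipe can still die (network down, jq error). A minimal wrapper sketch, assuming nothing beyond POSIX sh; the `run_with_log` name and log path are placeholders of my own:

```shell
#!/bin/sh
# Run a command string, append its output to a log, and exit nonzero on
# failure so cron's mail (or your monitoring) picks it up.
run_with_log() {
  log=${2:-/tmp/hn-digest.log}   # log path is just an example
  if ! sh -c "$1" >>"$log" 2>&1; then
    echo "hn-digest failed at $(date -u +%Y-%m-%dT%H:%M:%SZ)" >&2
    return 1
  fi
}

# In crontab you would wrap the full one-liner, e.g.:
# run_with_log 'curl -s … | claude -p --model opus --fallback-model sonnet "…"'
```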