#024 · synthesis · llm cli · power

Production-grade reasoning that doesn't stall — --fallback-model

Run Opus at max effort, but auto-failover to Sonnet when Opus is overloaded. The flag that keeps cron jobs alive.

Setup
  • claude /login OR export ANTHROPIC_API_KEY=sk-…
Cost per run
$0.01-0.10
The one-liner
$ curl -s "https://hacker-news.firebaseio.com/v0/topstories.json" \
  | jq -r '.[0:30][]' \
  | xargs -P 10 -I {} curl -s "https://hacker-news.firebaseio.com/v0/item/{}.json" \
  | jq -s -r 'sort_by(-.score) | .[0:10] | .[] | "- [\(.score)] \(.title) — \(.url // "discussion")"' \
  | claude -p \
      --model opus \
      --effort high \
      --fallback-model sonnet \
      "Today's top HN stories. Identify the one quietly important but under-discussed. 5 sentences for an outsider."
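Before wiring the whole pipe to claude, the jq sort/format stage can be dry-run on canned data. A minimal sketch — the two HN items below are fabricated for illustration:

```shell
# Feed two fake HN item objects through the same jq program the
# one-liner uses: sort by score descending, take the top 10, and
# fall back to "discussion" when an item has no url.
printf '%s\n' \
  '{"score": 120, "title": "A", "url": "https://a.example"}' \
  '{"score": 388, "title": "B"}' \
  | jq -s -r 'sort_by(-.score) | .[0:10] | .[] | "- [\(.score)] \(.title) — \(.url // "discussion")"'
```

The `//` alternative operator is what turns the Ask-HN-style items (no `url` field) into "discussion" links.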
What each stage does
  1. curl … topstories.json …
    Same HN front-page fetch as recipe #1.
  2. claude -p
    Headless print mode. Required for --fallback-model and other print-only flags like --max-budget-usd and --no-session-persistence.
  3. --model opus
    Pick the strongest model by alias (opus, sonnet, haiku) or by full ID like `claude-sonnet-4-6`.
  4. --effort high
    Reasoning effort: low / medium / high / xhigh / max. High and above buy real extended thinking; latency rises with effort.
  5. --fallback-model sonnet
    When opus returns 529 (overloaded), claude automatically retries the request with sonnet, so the pipe keeps moving instead of stalling. Only works with -p.
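The fallback logic is easy to picture in plain sh. A sketch of the retry-on-failure shape — here `ask` is a hypothetical stub standing in for `claude -p --model "$1" "$2"`, with opus hard-coded to pretend it is overloaded:

```shell
# Simulated model call: opus "returns 529", everything else answers.
ask() {
  [ "$1" = "opus" ] && return 1   # pretend overload
  echo "[$1] answer"
}

# Try the primary model; on nonzero exit, retry the same prompt
# once with the backup — the same shape as --fallback-model.
run_with_fallback() {
  primary=$1; backup=$2; prompt=$3
  ask "$primary" "$prompt" && return 0
  echo "fallback: $primary unavailable, retrying with $backup" >&2
  ask "$backup" "$prompt"
}

run_with_fallback opus sonnet "summarize HN"   # prints: [sonnet] answer
```

The flag saves you from maintaining exactly this wrapper, and from deciding which exit codes count as "overloaded".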
Expected output (sample)
The under-discussed story is "The hidden cost of LLM batching" (score 388). While the M5 Pro benchmarks and the Rust vector DB get the upvotes, this one answers a question every team running production LLMs hits within their first month: how aggressive batching changes the latency-vs-cost tradeoff in a way that breaks SLA monitoring...
Caveats & tips
  • --fallback-model only fires on overload (5xx, like 529), not on 4xx errors such as rate limits (429).
  • Test it with `--model haiku --fallback-model nonsense-model`: you should still get a haiku response, since haiku is rarely overloaded and the fallback never fires.
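For cron, fallback covers overloads but not a hung network call. A sketch of a wall-clock cap using coreutils `timeout` — the 120s budget, log path, and `run_capped` helper are arbitrary choices, not claude CLI features:

```shell
# Cap any job's wall-clock time and log outcomes, so a stuck call
# dies instead of piling up cron mail. Defaults are illustrative.
LOG=${LOG:-/tmp/hn-digest.log}
CAP=${CAP:-120}

run_capped() {
  if timeout "$CAP" "$@" >>"$LOG" 2>&1; then
    return 0
  fi
  echo "$(date -u +%FT%TZ) job failed (timeout or API error): $*" >>"$LOG"
  return 1
}

# Usage with the one-liner (assumes the claude CLI is on PATH):
# run_capped sh -c 'curl -s … | claude -p --model opus --fallback-model sonnet "…"'
```

In crontab, point the job at this wrapper and check the log instead of relying on cron's mail delivery.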