Lambda: tail recent errors → claude clusters them

One hour of Lambda errors, clustered by root cause via claude. Faster than reading CloudWatch by hand.

Setup

→ brew install awscli
→ aws configure
→ claude /login OR export ANTHROPIC_API_KEY=sk-…
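Before the first run, a quick preflight can save you a confusing failure mid-pipe. This is a minimal sketch. `require_cmds` is a hypothetical helper name. `aws sts get-caller-identity` is just a cheap way to confirm that `aws configure` left you with working credentials.

```shell
# Hypothetical preflight helper: report every CLI in the pipeline that is
# not on PATH, and return non-zero if any are missing.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage before the first run:
#   require_cmds aws claude && aws sts get-caller-identity >/dev/null
```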

Cost per run

<$0.01

The one-liner

$ aws logs tail /aws/lambda/my-function \
     --since 1h --filter-pattern '?ERROR ?Exception ?Traceback' \
     --format short \
  | claude -p \
      --append-system-prompt "You are an SRE. Be specific. Prefer fix-it-this-week over architectural rewrites." \
      "Cluster these Lambda errors by root cause. For each cluster: count, one-line cause, one-line fix. Markdown table."
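To reuse the one-liner across functions, you can wrap it in a shell function that takes the function name and the time window as arguments. This is a sketch. `cluster_errors` is a hypothetical name, and the prompts are copied verbatim from above.

```shell
# Hypothetical wrapper around the one-liner.
# Usage: cluster_errors <function-name> [since]   (since defaults to 1h)
cluster_errors() {
  local fn="$1" since="${2:-1h}"
  if [ -z "$fn" ]; then
    echo "usage: cluster_errors <function-name> [since]" >&2
    return 2
  fi
  aws logs tail "/aws/lambda/$fn" \
      --since "$since" --filter-pattern '?ERROR ?Exception ?Traceback' \
      --format short \
    | claude -p \
        --append-system-prompt "You are an SRE. Be specific. Prefer fix-it-this-week over architectural rewrites." \
        "Cluster these Lambda errors by root cause. For each cluster: count, one-line cause, one-line fix. Markdown table."
}
```

Then `cluster_errors my-function 30m` runs the same pipeline against a different function and window.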

What each stage does

[01] aws logs tail /aws/lambda/my-function --since 1h
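Prints the last hour of events from the function's CloudWatch log group, then exits (add --follow to keep streaming). --since accepts a relative time, which is a number plus s, m, h, d or w (e.g. 30m, 1d), or an ISO 8601 timestamp. Without it, tail starts 10 minutes back.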
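If you script the window, a rough client-side check can catch typos like `1hour` before the AWS call. This is a sketch only. `valid_since` is hypothetical and deliberately loose about timestamps, so the CLI still does the real parsing.

```shell
# Hypothetical check for --since values: accepts <number><unit> with unit
# s, m, h, d or w, or anything that starts like an ISO 8601 date (YYYY-MM-DD...).
valid_since() {
  printf '%s\n' "$1" | grep -Eq '^([0-9]+[smhdw]|[0-9]{4}-[0-9]{2}-[0-9]{2}.*)$'
}

# Usage: valid_since "$WINDOW" || { echo "bad --since: $WINDOW" >&2; exit 2; }
```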
[02] --filter-pattern '?ERROR ?Exception ?Traceback'
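CloudWatch filter syntax: an event matches if it contains any of the `?`-prefixed terms, so this pattern means ERROR or Exception or Traceback. Plain space-separated terms without `?` are ANDed instead: `ERROR Exception` only matches events containing both. Filtering happens server-side, so only matching events reach your terminal. That is far less data than grepping the full stream locally.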
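You can also build the same OR pattern from a list of terms. Adding a term is then one more argument instead of hand-editing the quoted string. This is a sketch, and `or_pattern` is a hypothetical helper.

```shell
# Hypothetical helper: prefix each term with '?' and join with spaces,
# producing a CloudWatch "match any of these terms" pattern.
or_pattern() {
  local out="" term
  for term in "$@"; do
    out="${out:+$out }?$term"
  done
  printf '%s\n' "$out"
}

# Usage: aws logs tail /aws/lambda/my-function --since 1h \
#          --filter-pattern "$(or_pattern ERROR Exception Traceback)"
```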
[03] --format short
Drops the log stream name that the default detailed format prints and shortens the timestamp, leaving a compact timestamp plus the message. That is less metadata for an LLM to wade through.
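If you want only the message text, you can strip the remaining short timestamp before the pipe. This is a sketch that assumes each event line starts with a `YYYY-MM-DDThh:mm:ss` token followed by a space, so check it against your own output first. `strip_ts` is a hypothetical name. The trade-off is that you lose timing information.

```shell
# Hypothetical post-filter: remove a leading "YYYY-MM-DDThh:mm:ss " token.
# Lines without that prefix (e.g. traceback continuation lines) pass through.
strip_ts() {
  sed -E 's/^[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9:.]+ //'
}

# Usage: aws logs tail ... --format short | strip_ts | claude -p ...
```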
[04] claude -p --append-system-prompt "You are an SRE. …"
The persona goes in via --append-system-prompt, which adds it to Claude Code's default system prompt and keeps the user prompt focused on the data. The 'fix-it-this-week' framing steers claude away from suggesting a microservices rewrite.

Expected output (sample)

| Count | Root cause | Fix |
|-------|------------|-----|
| 142 | DynamoDB ProvisionedThroughputExceededException | Switch to on-demand or raise WCU |
| 38 | JSON.parse on truncated SQS body | Check SNS→SQS subscription's RawMessageDelivery setting |
| 7 | Lambda timeout at 30s on cold start | Raise timeout to 60s or move to provisioned concurrency |
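The counts in that table come from the model, so treat them as estimates. To sanity-check them, sum the Count column and compare the total with the number of lines you piped in. This is a sketch. `table_total` is hypothetical and assumes the count is the first column, as in the sample above.

```shell
# Hypothetical check: sum the first (Count) column of a markdown table.
# Header and separator rows are skipped because field 2 isn't a bare number.
table_total() {
  awk -F'|' '
    # data rows look like "| 142 | cause | fix |"; field 2 holds the count
    $2 ~ /^[ \t]*[0-9]+[ \t]*$/ { sum += $2 }
    END { print sum + 0 }
  '
}

# Usage: save both sides, then compare:
#   aws logs tail ... --format short > errors.log
#   claude -p ... < errors.log > clusters.md
#   table_total < clusters.md; wc -l < errors.log
```

Multi-line events such as tracebacks inflate `wc -l`, so expect the line count to run higher than the table total.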

Caveats & tips

If log volume is huge, narrow the window with `--since 15m` first. claude has a finite context window, and a full hour from a noisy function can overflow it.
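Another way to fit a busy hour into context is to collapse repeated messages locally before claude sees them, so each distinct error appears once with a count. This is a sketch that assumes `--format short` lines (leading timestamp) and UUID-style request IDs. `collapse_dupes` and its 200-line default are hypothetical choices.

```shell
# Hypothetical pre-aggregation: drop the leading timestamp, mask UUIDs so
# otherwise-identical errors compare equal, then count and keep the top N.
collapse_dupes() {
  local max="${1:-200}"   # maximum number of distinct lines to keep
  sed -E \
      -e 's/^[0-9]{4}-[0-9]{2}-[0-9]{2}T[^ ]* //' \
      -e 's/[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}/<uuid>/g' \
    | sort | uniq -c | sort -rn | head -n "$max"
}

# Usage: aws logs tail ... --format short | collapse_dupes | claude -p ...
```

The counts from `uniq -c` are exact. The model can carry them into the Count column instead of estimating from raw lines.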
Swap `claude` for `gemini -m gemini-3.1-pro-preview -p "…"` if you have free-tier Gemini credits and prefer to spend those.
