RDS audit (CPU + connections, 7d) → claude downsize plan
Pulls every RDS instance with its class and storage, batches 7-day average CPU and DatabaseConnections via a single `cloudwatch get-metric-data` call, and asks claude to flag idle/oversized DBs with a concrete next-size-down recommendation.
Setup
- → brew install awscli jq
- → aws configure
- → IAM: rds:DescribeDBInstances, cloudwatch:GetMetricData
- → claude /login OR export ANTHROPIC_API_KEY=sk-…
Cost per run
<$0.01
The one-liner
$ REGION=${AWS_REGION:-eu-central-1}; END=$(date -u +%Y-%m-%dT%H:%M:%SZ); START=$(date -u -v-7d +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || date -u -d '-7 days' +%Y-%m-%dT%H:%M:%SZ); INSTANCES=$(aws rds describe-db-instances --region $REGION --query 'DBInstances[?DBInstanceStatus==`available`].{id:DBInstanceIdentifier,class:DBInstanceClass,engine:Engine,storage:AllocatedStorage,multiaz:MultiAZ}' --output json); QUERIES=$(echo "$INSTANCES" | jq -c '[ .[] | .id as $i | ({Id: ("cpu_" + ($i|gsub("-";"_"))), MetricStat: {Metric: {Namespace: "AWS/RDS", MetricName: "CPUUtilization", Dimensions: [{Name:"DBInstanceIdentifier",Value:$i}]}, Period: 3600, Stat: "Average"}}, {Id: ("con_" + ($i|gsub("-";"_"))), MetricStat: {Metric: {Namespace: "AWS/RDS", MetricName: "DatabaseConnections", Dimensions: [{Name:"DBInstanceIdentifier",Value:$i}]}, Period: 3600, Stat: "Average"}}) ]'); METRICS=$(aws cloudwatch get-metric-data --region $REGION --start-time $START --end-time $END --metric-data-queries "$QUERIES" --output json); echo "$INSTANCES" | jq --argjson m "$METRICS" -r '. as $inst | $m.MetricDataResults | (map({(.Id): (.Values | if length == 0 then 0 else add/length end)}) | add) as $avg | $inst[] | [.id, .class, .engine, .storage, (($avg["cpu_" + (.id|gsub("-";"_"))] // 0) * 100 | floor / 100), (($avg["con_" + (.id|gsub("-";"_"))] // 0) | floor), .multiaz] | @tsv' | sort -k5 -n | claude -p "You are a FinOps DBA. Columns: db_id, class, engine, storage_gib, cpu_avg_pct_7d, connections_avg_7d, multi_az. Flag any instance with cpu<20% AND connections<5 as IDLE; cpu<40% as OVERSIZED. For each flagged row recommend the next-smaller class in the same family (e.g. db.r6g.2xlarge → db.r6g.xlarge halves cost). End with a ranked table of monthly savings. Be conservative on Multi-AZ prod DBs."
What each stage does
- [01] aws
  `INSTANCES=$(aws rds describe-db-instances --query 'DBInstances[?DBInstanceStatus…`
  The JMESPath filter keeps only `available` instances, dropping anything mid-failover or backing up so the CloudWatch batch doesn't waste slots on transients.
- [02] jq
  `QUERIES=$(echo "$INSTANCES" | jq -c '[ .[] | .id as $i | ({Id: ..., MetricStat: …`
  Builds the `--metric-data-queries` array entirely in jq: two metrics per instance, with hyphens replaced by underscores because GetMetricData Ids must match `^[a-z][a-zA-Z0-9_]*$`.
- [03] aws
  `aws cloudwatch get-metric-data --start-time $START --end-time $END --metric-data…`
  ONE API call returns 7d of CPU + connections for up to 250 RDS instances (500-metric limit ÷ 2 metrics each). Cheaper and faster than per-instance get-metric-statistics loops.
- [04] jq
  `jq --argjson m "$METRICS" -r '... $m.MetricDataResults | (map({(.Id): (.Values|a…`
  Folds the metric results into an `{id: avg}` lookup, joins it back onto each instance, and flattens to TSV with both metrics on one row — the layout claude needs for its IDLE/OVERSIZED rules.
- [05] claude
  `claude -p "You are a FinOps DBA. ... cpu<20% AND connections<5 as IDLE; cpu<40%…`
  Explicit numeric thresholds prevent vague answers; the Multi-AZ caveat stops claude from confidently recommending you halve a prod replica.
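Stages [02] and [04] are the easiest to break and the easiest to sanity-check offline. A minimal sketch with canned JSON — the instance names, metric Ids, and values below are made up for illustration:

```shell
# Stage [02] in miniature: hyphens -> underscores so each Id
# matches the required ^[a-z][a-zA-Z0-9_]*$ pattern.
INSTANCES='[{"id":"db-a"},{"id":"db-b"}]'
echo "$INSTANCES" | jq -c '[ .[] | {Id: ("cpu_" + (.id|gsub("-";"_")))} ]'
# [{"Id":"cpu_db_a"},{"Id":"cpu_db_b"}]

# Stage [04] in miniature: fold MetricDataResults into an {Id: avg}
# lookup, guarding against an empty Values array (no datapoints yet).
METRICS='{"MetricDataResults":[{"Id":"cpu_db_a","Values":[10,20,30]},{"Id":"cpu_db_b","Values":[]}]}'
echo "$METRICS" | jq -c '.MetricDataResults
  | map({(.Id): (.Values | if length == 0 then 0 else add/length end)}) | add'
# {"cpu_db_a":20,"cpu_db_b":0}
```

Feeding fake JSON through the jq stages like this lets you verify the TSV layout before spending any CloudWatch API calls.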
Expected output (sample)
analytics-replica-2  db.r6g.2xlarge  postgres  500   3.1   2    false
reporting-pg         db.m6i.xlarge   postgres  200   8.4   4    false
prod-orders          db.m6i.2xlarge  mysql     1000  34.2  180  true
# claude memo:
# IDLE      analytics-replica-2 → db.r6g.large (4x downsize, ~$420/mo saved)
# IDLE      reporting-pg        → db.m6i.large (2x downsize, ~$130/mo saved)
# OVERSIZED prod-orders         → db.m6i.xlarge (halve, ~$280/mo) — Multi-AZ, do in maintenance window
# Total: ~$830/mo, ~$9,960/yr
Caveats & tips
- `date -u -v-7d` is BSD/macOS; `date -u -d '-7 days'` is GNU/Linux — the `2>/dev/null || …` fallback covers both.
- CloudWatch `get-metric-data` allows 500 metrics and 100,800 data points per request — this recipe sends 2 metrics × 168 hourly points per instance, so the practical cap is ~250 RDS instances before you must page.
- Dimension is `DBInstanceIdentifier` (not `DBInstanceId`) — using the wrong one returns empty results without erroring.
- Aurora cluster CPU lives under `DBClusterIdentifier`; this recipe targets standard RDS instances. Add a separate query block for clusters.
- Connection counts can be artificially low if your app uses a connection pool (pgbouncer/RDS Proxy) — review pool stats before downsizing.
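The date-portability fallback from the first caveat can be tested on its own — whichever form the local `date(1)` understands wins, and the other fails silently:

```shell
# GNU date rejects -v-7d (stderr suppressed), so the second form runs on Linux;
# BSD/macOS date accepts the first form and the fallback never fires.
START=$(date -u -v-7d +%Y-%m-%dT%H:%M:%SZ 2>/dev/null || date -u -d '-7 days' +%Y-%m-%dT%H:%M:%SZ)
echo "$START"   # ISO-8601 UTC timestamp, 7 days ago
```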
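Past the ~250-instance cap you have to page. One way — a sketch, not part of the recipe above — is to slice the `QUERIES` array into chunks and issue one `get-metric-data` call per chunk (chunk size 2 here stands in for the real limit of 500 metrics):

```shell
# Stand-in QUERIES array; the real one comes from stage [02].
QUERIES='[{"Id":"m1"},{"Id":"m2"},{"Id":"m3"}]'
# Emit one compact JSON array per chunk, then call CloudWatch per line.
echo "$QUERIES" | jq -c 'range(0; length; 2) as $i | .[$i:$i+2]' |
while read -r CHUNK; do
  echo "chunk: $CHUNK"
  # aws cloudwatch get-metric-data ... --metric-data-queries "$CHUNK"
done
```

You would then merge each call's `MetricDataResults` before the stage [04] fold.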
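For the Aurora caveat, the separate query block would use the `DBClusterIdentifier` dimension instead. A minimal sketch — the cluster names are made up, and in practice the list would come from `aws rds describe-db-clusters`:

```shell
# Hypothetical cluster list (real source: aws rds describe-db-clusters).
CLUSTERS='["prod-aurora","analytics-aurora"]'
# Same Id-sanitization trick, but the dimension is DBClusterIdentifier.
CQUERIES=$(echo "$CLUSTERS" | jq -c '[ .[] |
  {Id: ("ccpu_" + gsub("-";"_")),
   MetricStat: {Metric: {Namespace: "AWS/RDS", MetricName: "CPUUtilization",
     Dimensions: [{Name: "DBClusterIdentifier", Value: .}]},
   Period: 3600, Stat: "Average"}} ]')
echo "$CQUERIES" | jq -r '.[0].MetricStat.Metric.Dimensions[0].Name'
# DBClusterIdentifier
```

The resulting array can be appended to the instance-level `QUERIES` (it counts against the same 500-metric limit) or sent in its own `get-metric-data` call.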