feat(rotation): add multi-account rotation for API rate limit handling (#81)#195
feat(rotation): add multi-account rotation for API rate limit handling (#81)#195Andymendez100 wants to merge 3 commits intofrankbria:mainfrom
Conversation
…g (Issue frankbria#81) When Claude hits the 5-hour API rate limit, Ralph can now rotate to the next configured account instead of waiting 60 minutes. Fully opt-in via ACCOUNT_ROTATION=true in .ralphrc. Core library (lib/account_rotation.sh): - Two switching mechanisms: ANTHROPIC_API_KEY swap and CLAUDE_CONFIG_DIR swap - 60-minute cooldown per rate-limited account with automatic recovery - Cross-env-var cleanup: unsets the inactive mechanism when switching types - Default ~/.claude dir handled by unsetting CLAUDE_CONFIG_DIR (keychain auth) - Defensive parent-dir creation in reset_account_rotation - API keys stored in ~/.ralph/accounts.conf (never in repo) Loop integration (ralph_loop.sh): - Exit code 2 handler tries rotation before falling back to wait/exit - Logs active account at each loop start when rotation is enabled - WARN log when ACCOUNT_ROTATION=true but init fails (no accounts configured) - --reset-accounts CLI flag to clear all rate-limit cooldowns Documentation: - README: step-by-step Quick Start guides for OAuth and API key accounts - CLAUDE.md: library docs, CLI flag, test counts (623 total) - templates/accounts.conf.example with chmod 600 security guidance Tests: 57 unit tests covering config loading, rotation order, cooldown tracking, cross-type transitions, wrap-around, env var switching, CLI flag, and edge cases.
WalkthroughAdds an opt-in multi-account rotation subsystem that loads configured Claude accounts, persists per-account cooldown state (.ralph/.account_rotation_state), switches ANTHROPIC_API_KEY or CLAUDE_CONFIG_DIR on 5-hour/API-limit signals, exposes a CLI reset, installs templates, and wires rotation into the main loop and tests. Changes
Sequence DiagramsequenceDiagram
participant User
participant RalphLoop as Ralph Loop
participant Rotation as Account Rotation
participant State as State File
participant Claude as Claude API
User->>RalphLoop: start (ACCOUNT_ROTATION=true)
RalphLoop->>Rotation: init_account_rotation()
Rotation->>State: load .ralph/.account_rotation_state
State-->>Rotation: return state / active account
Rotation-->>RalphLoop: init complete
loop each iteration
RalphLoop->>Claude: call with current credentials
alt Claude indicates rate-limit (exit code 2 / "hit your limit")
Claude-->>RalphLoop: rate-limited
RalphLoop->>Rotation: try_rotate_account()
Rotation->>State: find next available account (skip cooldown)
alt account available
Rotation->>State: mark prev account rate-limited (set cooldown)
Rotation->>Rotation: switch_account() (set ANTHROPIC_API_KEY or CLAUDE_CONFIG_DIR)
Rotation-->>RalphLoop: switched -> retry
RalphLoop->>Claude: retry with new credentials
else all exhausted
Rotation-->>RalphLoop: no accounts available
RalphLoop->>User: fallback wait/exit
end
else success
Claude-->>RalphLoop: success -> continue
end
end
User->>RalphLoop: ralph --reset-accounts
RalphLoop->>Rotation: reset_account_rotation()
Rotation->>State: clear cooldowns, reset active index
State-->>RalphLoop: reset complete
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (5)
ralph_loop.sh (2)
1923-1931: Redundant re-sourcing ofdate_utils.shandaccount_rotation.sh.Both libraries are already sourced at the top of the file (lines 15, 20). The existing
--reset-circuithandler on lines 1907-1913 follows the same redundant pattern, so this is consistent. Not a bug, just unnecessary.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ralph_loop.sh` around lines 1923 - 1931, The handler for the --reset-accounts case redundantly re-sources date_utils.sh and account_rotation.sh even though they are already sourced at the top; remove the duplicate sourcing (the SCRIPT_DIR/source lines) from the --reset-accounts block and keep only the call to reset_account_rotation "Manual reset via command line" and the success echo so the function reset_account_rotation remains used but libraries aren't re-sourced.
1630-1636: Consider logging the account type/index in a more user-friendly format.
get_active_accountreturns raw"0 key"or"2 config_dir"which is logged directly. For production users, a friendlier format like"Account#1(API key)"or"Account#3(config dir: ~/.claude-account2)"would be more actionable when troubleshooting rotation behavior.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ralph_loop.sh` around lines 1630 - 1636, The current log prints the raw output of get_active_account (e.g., "0 key") which is not user-friendly; update the rotation logging in the ACCOUNT_ROTATION branch to parse get_active_account's output (split into index and type token) and map the token to readable text (e.g., "key" -> "API key", "config_dir" -> "config dir: ~/.claude-accountN") then call log_status with a formatted string like "Account #<index+1> (<readable type/details>)". Locate this change around the ACCOUNT_ROTATION check where get_active_account and log_status are used and ensure the formatted message handles unknown tokens gracefully.lib/account_rotation.sh (3)
407-419:export -fis unnecessary for functions used only within the current shell and command substitutions.Bash command substitutions (
$(...)) already inherit function definitions from the parent shell.export -fis only needed for functions that must survive anexecorenv/bash -cboundary. Unless there's a downstream usage pattern not shown here, these exports add clutter and can cause issues with non-bash child processes that don't understand exported functions.That said, removing them is low risk / low priority — they're harmless in the current setup.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/account_rotation.sh` around lines 407 - 419, Remove the unnecessary function exports: delete the export -f lines for load_account_config, get_total_accounts, get_active_account, mark_account_rate_limited, get_next_available_account, switch_account, all_accounts_exhausted, reset_account_rotation, init_account_rotation, try_rotate_account and the private helpers _ensure_state_file and _is_account_in_cooldown; these functions are used within the current shell/subshells and do not require exported function environment inheritance, so simply remove those export -f statements and keep the plain function definitions intact.
81-86:_ensure_state_fileis called on nearly every public function entry — consider caching the validity check.
_ensure_state_filerunsjq '.'to validate JSON on every call toget_active_account,mark_account_rate_limited,_is_account_in_cooldown,get_next_available_account, andswitch_account. During a singletry_rotate_accountinvocation, this results in multiple redundantjqvalidations of the same file. In the normal (non-corrupt) case, this is pure overhead.A lightweight optimization: set a flag after the first successful validation within the same shell session and skip re-validation unless the file is rewritten.
Also applies to: 103-110, 114-117, 137-139, 163-164
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/account_rotation.sh` around lines 81 - 86, The repeated jq validation in _ensure_state_file for ACCOUNT_ROTATION_STATE_FILE causes redundant overhead because it runs on every entry of get_active_account, mark_account_rate_limited, _is_account_in_cooldown, get_next_available_account, and switch_account during a single try_rotate_account flow; fix by adding a shell-session cached flag (e.g., STATE_FILE_VALIDATED=true) set after the first successful jq '.' check in _ensure_state_file and skip subsequent jq calls if the flag is present, and ensure you clear/unset that flag whenever the state file is rewritten (places that modify the file such as functions that write or rm the state file) so future validations still occur after changes.
81-99: Orphan temp file onjqfailure in_ensure_state_file(and similar functions).If
jq -n ... > "$tmpfile"fails, the temp file created bymktempon line 90 is left behind because the&&preventsmvbut nothing removes$tmpfile. The same pattern appears inmark_account_rate_limited(line 126),switch_account(line 254), andreset_account_rotation(line 308).Consider adding a trap or explicit cleanup on failure:
♻️ Suggested pattern
if [[ ! -f "$ACCOUNT_ROTATION_STATE_FILE" ]]; then local tmpfile tmpfile=$(mktemp "${ACCOUNT_ROTATION_STATE_FILE}.XXXXXX") + # shellcheck disable=SC2064 + trap "rm -f '$tmpfile'" RETURN jq -n '{ active_index: 0, active_type: "none", rate_limited: {}, last_rotation: "", total_rotations: 0 }' > "$tmpfile" && mv "$tmpfile" "$ACCOUNT_ROTATION_STATE_FILE" + trap - RETURN fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/account_rotation.sh` around lines 81 - 99, The temp file created by mktemp in _ensure_state_file (and similarly in mark_account_rate_limited, switch_account, reset_account_rotation) can be left orphaned if the jq write fails because the code only moves the tmpfile on success; change the pattern to ensure cleanup on failure by either installing a local trap (e.g., trap 'rm -f "$tmpfile"; return 1' EXIT) before running jq and removing the trap after a successful mv, or by checking jq's exit status and explicitly rm -f "$tmpfile" and return a non-zero status when jq fails so the temporary file is never left behind.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/unit/test_account_rotation.bats`:
- Around line 701-712: Update the test so it verifies the state file was
actually cleared, not just the output: after calling run bash "$ralph_script"
--reset-accounts (in tests/unit/test_account_rotation.bats) read the state file
created by _ensure_state_file and assert it no longer contains the entries added
by mark_limited_now ("key:0" and "key:1") or is empty; keep the existing
run/assert_success/assert on output but add an assertion that inspects the state
file contents to confirm the reset mutated state (use the same state file
variable/helper the test suite uses to locate/read the state created by
_ensure_state_file).
- Around line 19-23: The setup creates TEST_HOME and sets ACCOUNT_KEYS_FILE but
never points HOME at TEST_HOME, so CLI calls may touch the real user dirs;
update the test setup to export HOME="$TEST_HOME" (alongside TEST_HOME and
ACCOUNT_KEYS_FILE) so all $HOME-based lookups and CLI invocations use the
temporary directory; ensure subsequent code that relies on ACCOUNT_KEYS_FILE and
any cleanup still works with HOME redirected.
---
Nitpick comments:
In `@lib/account_rotation.sh`:
- Around line 407-419: Remove the unnecessary function exports: delete the
export -f lines for load_account_config, get_total_accounts, get_active_account,
mark_account_rate_limited, get_next_available_account, switch_account,
all_accounts_exhausted, reset_account_rotation, init_account_rotation,
try_rotate_account and the private helpers _ensure_state_file and
_is_account_in_cooldown; these functions are used within the current
shell/subshells and do not require exported function environment inheritance, so
simply remove those export -f statements and keep the plain function definitions
intact.
- Around line 81-86: The repeated jq validation in _ensure_state_file for
ACCOUNT_ROTATION_STATE_FILE causes redundant overhead because it runs on every
entry of get_active_account, mark_account_rate_limited, _is_account_in_cooldown,
get_next_available_account, and switch_account during a single
try_rotate_account flow; fix by adding a shell-session cached flag (e.g.,
STATE_FILE_VALIDATED=true) set after the first successful jq '.' check in
_ensure_state_file and skip subsequent jq calls if the flag is present, and
ensure you clear/unset that flag whenever the state file is rewritten (places
that modify the file such as functions that write or rm the state file) so
future validations still occur after changes.
- Around line 81-99: The temp file created by mktemp in _ensure_state_file (and
similarly in mark_account_rate_limited, switch_account, reset_account_rotation)
can be left orphaned if the jq write fails because the code only moves the
tmpfile on success; change the pattern to ensure cleanup on failure by either
installing a local trap (e.g., trap 'rm -f "$tmpfile"; return 1' EXIT) before
running jq and removing the trap after a successful mv, or by checking jq's exit
status and explicitly rm -f "$tmpfile" and return a non-zero status when jq
fails so the temporary file is never left behind.
In `@ralph_loop.sh`:
- Around line 1923-1931: The handler for the --reset-accounts case redundantly
re-sources date_utils.sh and account_rotation.sh even though they are already
sourced at the top; remove the duplicate sourcing (the SCRIPT_DIR/source lines)
from the --reset-accounts block and keep only the call to reset_account_rotation
"Manual reset via command line" and the success echo so the function
reset_account_rotation remains used but libraries aren't re-sourced.
- Around line 1630-1636: The current log prints the raw output of
get_active_account (e.g., "0 key") which is not user-friendly; update the
rotation logging in the ACCOUNT_ROTATION branch to parse get_active_account's
output (split into index and type token) and map the token to readable text
(e.g., "key" -> "API key", "config_dir" -> "config dir: ~/.claude-accountN")
then call log_status with a formatted string like "Account #<index+1> (<readable
type/details>)". Locate this change around the ACCOUNT_ROTATION check where
get_active_account and log_status are used and ensure the formatted message
handles unknown tokens gracefully.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
.gitignoreCLAUDE.mdREADME.mdinstall.shlib/account_rotation.shralph_loop.shtemplates/.gitignoretemplates/accounts.conf.exampletests/unit/test_account_rotation.bats
| # Create a temp home for accounts.conf (avoid touching real ~/.ralph) | ||
| export TEST_HOME="$(mktemp -d /tmp/ralph-home.XXXXXX)" | ||
| mkdir -p "$TEST_HOME/.ralph" | ||
| export ACCOUNT_KEYS_FILE="$TEST_HOME/.ralph/accounts.conf" | ||
|
|
There was a problem hiding this comment.
Isolate HOME in setup to avoid touching real user state.
Line 20 creates TEST_HOME, but HOME is never pointed to it. Tests that rely on $HOME (and CLI invocations) can accidentally interact with real ~/.ralph/~/.claude on local runs.
Proposed fix
export TEST_HOME="$(mktemp -d /tmp/ralph-home.XXXXXX)"
mkdir -p "$TEST_HOME/.ralph"
+ export HOME="$TEST_HOME"
export ACCOUNT_KEYS_FILE="$TEST_HOME/.ralph/accounts.conf"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Create a temp home for accounts.conf (avoid touching real ~/.ralph) | |
| export TEST_HOME="$(mktemp -d /tmp/ralph-home.XXXXXX)" | |
| mkdir -p "$TEST_HOME/.ralph" | |
| export ACCOUNT_KEYS_FILE="$TEST_HOME/.ralph/accounts.conf" | |
| # Create a temp home for accounts.conf (avoid touching real ~/.ralph) | |
| export TEST_HOME="$(mktemp -d /tmp/ralph-home.XXXXXX)" | |
| mkdir -p "$TEST_HOME/.ralph" | |
| export HOME="$TEST_HOME" | |
| export ACCOUNT_KEYS_FILE="$TEST_HOME/.ralph/accounts.conf" |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/test_account_rotation.bats` around lines 19 - 23, The setup
creates TEST_HOME and sets ACCOUNT_KEYS_FILE but never points HOME at TEST_HOME,
so CLI calls may touch the real user dirs; update the test setup to export
HOME="$TEST_HOME" (alongside TEST_HOME and ACCOUNT_KEYS_FILE) so all $HOME-based
lookups and CLI invocations use the temporary directory; ensure subsequent code
that relies on ACCOUNT_KEYS_FILE and any cleanup still works with HOME
redirected.
| @test "--reset-accounts: resets account rotation state" { | ||
| local ralph_script="${BATS_TEST_DIRNAME}/../../ralph_loop.sh" | ||
|
|
||
| # Create state file with some rate-limited accounts | ||
| _ensure_state_file | ||
| mark_limited_now "key:0" | ||
| mark_limited_now "key:1" | ||
|
|
||
| run bash "$ralph_script" --reset-accounts | ||
| assert_success | ||
| [[ "$output" == *"reset successfully"* ]] | ||
| } |
There was a problem hiding this comment.
Validate state mutation, not just success text, for --reset-accounts.
Line 711 only checks output text. The test can pass even if reset messaging is printed but state is not actually cleared. Assert the state file contents after running the command.
Proposed fix
`@test` "--reset-accounts: resets account rotation state" {
local ralph_script="${BATS_TEST_DIRNAME}/../../ralph_loop.sh"
@@
run bash "$ralph_script" --reset-accounts
assert_success
[[ "$output" == *"reset successfully"* ]]
+
+ local limited_count active_index
+ limited_count=$(jq '.rate_limited | length' "$ACCOUNT_ROTATION_STATE_FILE")
+ active_index=$(jq -r '.active_index' "$ACCOUNT_ROTATION_STATE_FILE")
+ [[ "$limited_count" == "0" ]]
+ [[ "$active_index" == "0" ]]
}Based on learnings, "Test suite must achieve 100% test pass rate with comprehensive coverage..."
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @test "--reset-accounts: resets account rotation state" { | |
| local ralph_script="${BATS_TEST_DIRNAME}/../../ralph_loop.sh" | |
| # Create state file with some rate-limited accounts | |
| _ensure_state_file | |
| mark_limited_now "key:0" | |
| mark_limited_now "key:1" | |
| run bash "$ralph_script" --reset-accounts | |
| assert_success | |
| [[ "$output" == *"reset successfully"* ]] | |
| } | |
| `@test` "--reset-accounts: resets account rotation state" { | |
| local ralph_script="${BATS_TEST_DIRNAME}/../../ralph_loop.sh" | |
| # Create state file with some rate-limited accounts | |
| _ensure_state_file | |
| mark_limited_now "key:0" | |
| mark_limited_now "key:1" | |
| run bash "$ralph_script" --reset-accounts | |
| assert_success | |
| [[ "$output" == *"reset successfully"* ]] | |
| local limited_count active_index | |
| limited_count=$(jq '.rate_limited | length' "$ACCOUNT_ROTATION_STATE_FILE") | |
| active_index=$(jq -r '.active_index' "$ACCOUNT_ROTATION_STATE_FILE") | |
| [[ "$limited_count" == "0" ]] | |
| [[ "$active_index" == "0" ]] | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/test_account_rotation.bats` around lines 701 - 712, Update the
test so it verifies the state file was actually cleared, not just the output:
after calling run bash "$ralph_script" --reset-accounts (in
tests/unit/test_account_rotation.bats) read the state file created by
_ensure_state_file and assert it no longer contains the entries added by
mark_limited_now ("key:0" and "key:1") or is empty; keep the existing
run/assert_success/assert on output but add an assertion that inspects the state
file contents to confirm the reset mutated state (use the same state file
variable/helper the test suite uses to locate/read the state created by
_ensure_state_file).
…d-in detection - Fix set -e killing script when execute_claude_code returns non-zero: use `execute_claude_code "$loop_count" || exec_result=$?` pattern - Fix monitor call counter display: use increment_call_counter() to persist immediately instead of deferring to successful execution only - Add 'hit your limit' to Layer 3 text fallback pattern (frankbria#198) - Add Layer 4 detection: 'not logged in' triggers rotation when ACCOUNT_ROTATION=true - Add daily cap fixture, 'hit your limit' test, and two 'not logged in' tests - Update CLAUDE.md test counts (630 total) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
tests/unit/test_cli_modern.bats (1)
1110-1129: Add a Layer 4 false-positive regression test.Current new tests validate true/false rotation config behavior, but not false positives from echoed
type:user/tool_resulttext containing/login. Add one negative test so Layer 4 doesn’t over-trigger rotations.Based on learnings: "Test suite must achieve 100% test pass rate with comprehensive coverage of ... edge cases ...".Suggested test addition
+@test "behavioral: echoed '/login' text in tool_result does not trigger rotation" { + local output_file="$TEST_DIR/claude_output_login_echo.log" + cat > "$output_file" << 'EOF' +{"type":"user","message":{"role":"user","content":[{"type":"tool_result","tool_use_id":"toolu_abc","content":"Docs: If auth expires, please run /login manually."}]}} +{"type":"result","subtype":"success","is_error":true,"result":"Build failed for unrelated reason"} +EOF + + ACCOUNT_ROTATION=true run _detect_api_limit 1 "$output_file" + [[ "$status" -eq 1 ]] +}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_cli_modern.bats` around lines 1110 - 1129, Add a negative regression test in tests/unit/test_cli_modern.bats that ensures _detect_api_limit does NOT treat echoed "/login" text in unrelated fields as a "not logged in" trigger: create a sample output file containing lines like 'type:user' or 'tool_result' that include "/login" but do not indicate an actual login prompt, then run ACCOUNT_ROTATION=true run _detect_api_limit 1 "$output_file" and assert that "$status" equals 1 (no rotation). Use the existing test structure and helpers (e.g., similar to create_sample_not_logged_in_result) and reference the _detect_api_limit invocation so the test guards against Layer 4 false positives.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ralph_loop.sh`:
- Around line 1746-1749: The log and status update for exec_result == 2
currently hardcodes "Claude API 5-hour limit reached" which is inaccurate for
other paths (daily-cap, not-logged-in); change the handling in the block that
calls update_status and log_status (the code referencing update_status
"$loop_count" "$(cat "$CALL_COUNT_FILE")" "api_limit" "paused" and log_status
"WARN") to use a generic message such as "API limit or auth issue" (or similar)
and avoid mentioning "5-hour limit" so the update_status and log_status calls
reflect a generic exec_result=2 condition across update_status, log_status and
any surrounding variables like loop_count and CALL_COUNT_FILE.
- Around line 1519-1523: The current Layer 4 check greps the whole output_file
and can falsely trigger rotation on echoed "/login" text; change it to search
only the recent, filtered output (e.g., tail -30 of "$output_file" piped through
the same field/filtering used in Layer 3) before grepping for "not logged in" or
"please run /login"; update the conditional that references ACCOUNT_ROTATION and
the grep to use that filtered recent output and keep the same return 2 and
log_status "🔐 Claude account not logged in — triggering rotation" behavior so
rotation logic stays unchanged.
- Around line 1929-1936: The current --reset-accounts branch calls
reset_account_rotation and always prints success/exit 0; modify it to capture
the exit status of reset_account_rotation (e.g., via a local variable or
checking "$?") and only print the green success message and exit 0 when the
function returns success; on failure print a red error message including the
function's exit code or stderr and exit with a non-zero status. Ensure you
reference the reset_account_rotation invocation in ralph_loop.sh and propagate
any errors from lib/account_rotation.sh appropriately.
---
Nitpick comments:
In `@tests/unit/test_cli_modern.bats`:
- Around line 1110-1129: Add a negative regression test in
tests/unit/test_cli_modern.bats that ensures _detect_api_limit does NOT treat
echoed "/login" text in unrelated fields as a "not logged in" trigger: create a
sample output file containing lines like 'type:user' or 'tool_result' that
include "/login" but do not indicate an actual login prompt, then run
ACCOUNT_ROTATION=true run _detect_api_limit 1 "$output_file" and assert that
"$status" equals 1 (no rotation). Use the existing test structure and helpers
(e.g., similar to create_sample_not_logged_in_result) and reference the
_detect_api_limit invocation so the test guards against Layer 4 false positives.
| # Layer 4: Account not logged in — treat as rotatable error when rotation is configured | ||
| if [[ "${ACCOUNT_ROTATION:-false}" == "true" ]] && grep -qi "not logged in\|please run /login" "$output_file" 2>/dev/null; then | ||
| log_status "ERROR" "🔐 Claude account not logged in — triggering rotation" | ||
| return 2 # Use same code as API limit so rotation logic fires | ||
| fi |
There was a problem hiding this comment.
Scope Layer 4 login detection to filtered recent output to avoid false rotations.
Line 1520 greps the entire output file. This can rotate accounts on echoed /login text from user/tool payloads, not actual auth failures.
Proposed fix
- if [[ "${ACCOUNT_ROTATION:-false}" == "true" ]] && grep -qi "not logged in\|please run /login" "$output_file" 2>/dev/null; then
+ if [[ "${ACCOUNT_ROTATION:-false}" == "true" ]] && \
+ tail -30 "$output_file" 2>/dev/null | \
+ grep -vE '"type"\s*:\s*"user"' | \
+ grep -v '"tool_result"' | \
+ grep -v '"tool_use_id"' | \
+ grep -qiE 'not logged in|please run /login'; then
log_status "ERROR" "🔐 Claude account not logged in — triggering rotation"
return 2 # Use same code as API limit so rotation logic fires
fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Layer 4: Account not logged in — treat as rotatable error when rotation is configured | |
| if [[ "${ACCOUNT_ROTATION:-false}" == "true" ]] && grep -qi "not logged in\|please run /login" "$output_file" 2>/dev/null; then | |
| log_status "ERROR" "🔐 Claude account not logged in — triggering rotation" | |
| return 2 # Use same code as API limit so rotation logic fires | |
| fi | |
| # Layer 4: Account not logged in — treat as rotatable error when rotation is configured | |
| if [[ "${ACCOUNT_ROTATION:-false}" == "true" ]] && \ | |
| tail -30 "$output_file" 2>/dev/null | \ | |
| grep -vE '"type"\s*:\s*"user"' | \ | |
| grep -v '"tool_result"' | \ | |
| grep -v '"tool_use_id"' | \ | |
| grep -qiE 'not logged in|please run /login'; then | |
| log_status "ERROR" "🔐 Claude account not logged in — triggering rotation" | |
| return 2 # Use same code as API limit so rotation logic fires | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ralph_loop.sh` around lines 1519 - 1523, The current Layer 4 check greps the
whole output_file and can falsely trigger rotation on echoed "/login" text;
change it to search only the recent, filtered output (e.g., tail -30 of
"$output_file" piped through the same field/filtering used in Layer 3) before
grepping for "not logged in" or "please run /login"; update the conditional that
references ACCOUNT_ROTATION and the grep to use that filtered recent output and
keep the same return 2 and log_status "🔐 Claude account not logged in —
triggering rotation" behavior so rotation logic stays unchanged.
| # API 5-hour limit reached — try account rotation first (Issue #81) | ||
| update_status "$loop_count" "$(cat "$CALL_COUNT_FILE")" "api_limit" "paused" | ||
| log_status "WARN" "🛑 Claude API 5-hour limit reached!" | ||
|
|
||
| # Ask user whether to wait or exit | ||
| echo -e "\n${YELLOW}The Claude API 5-hour usage limit has been reached.${NC}" | ||
| echo -e "${YELLOW}You can either:${NC}" | ||
| echo -e " ${GREEN}1)${NC} Wait for the limit to reset (usually within an hour)" | ||
| echo -e " ${GREEN}2)${NC} Exit the loop and try again later" | ||
| echo -e "\n${BLUE}Choose an option (1 or 2):${NC} " | ||
|
|
||
| # Read user input with timeout | ||
| read -t 30 -n 1 user_choice || true | ||
| echo # New line after input | ||
|
|
||
| if [[ "$user_choice" == "2" ]]; then | ||
| log_status "INFO" "User chose to exit. Exiting loop..." | ||
| update_status "$loop_count" "$(cat "$CALL_COUNT_FILE")" "api_limit_exit" "stopped" "api_5hour_limit" | ||
| break | ||
|
|
There was a problem hiding this comment.
Use a generic message for exec_result=2 instead of hardcoded “5-hour limit”.
Line 1748 is now inaccurate for daily-cap and not-logged-in paths that also return code 2, which can mislead troubleshooting.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ralph_loop.sh` around lines 1746 - 1749, The log and status update for
exec_result == 2 currently hardcodes "Claude API 5-hour limit reached" which is
inaccurate for other paths (daily-cap, not-logged-in); change the handling in
the block that calls update_status and log_status (the code referencing
update_status "$loop_count" "$(cat "$CALL_COUNT_FILE")" "api_limit" "paused" and
log_status "WARN") to use a generic message such as "API limit or auth issue"
(or similar) and avoid mentioning "5-hour limit" so the update_status and
log_status calls reflect a generic exec_result=2 condition across update_status,
log_status and any surrounding variables like loop_count and CALL_COUNT_FILE.
| --reset-accounts) | ||
| # Reset account rotation state (Issue #81) | ||
| SCRIPT_DIR="$(dirname "${BASH_SOURCE[0]}")" | ||
| source "$SCRIPT_DIR/lib/date_utils.sh" | ||
| source "$SCRIPT_DIR/lib/account_rotation.sh" | ||
| reset_account_rotation "Manual reset via command line" | ||
| echo -e "\033[0;32m✅ Account rotation state reset successfully\033[0m" | ||
| exit 0 |
There was a problem hiding this comment.
Check reset_account_rotation result before reporting success.
Line 1934 can fail (e.g., state-file write/jq issues), but the command still prints success and exits 0.
Proposed fix
--reset-accounts)
# Reset account rotation state (Issue `#81`)
SCRIPT_DIR="$(dirname "${BASH_SOURCE[0]}")"
source "$SCRIPT_DIR/lib/date_utils.sh"
source "$SCRIPT_DIR/lib/account_rotation.sh"
- reset_account_rotation "Manual reset via command line"
- echo -e "\033[0;32m✅ Account rotation state reset successfully\033[0m"
- exit 0
+ if reset_account_rotation "Manual reset via command line"; then
+ echo -e "\033[0;32m✅ Account rotation state reset successfully\033[0m"
+ exit 0
+ else
+ echo -e "\033[0;31m❌ Failed to reset account rotation state\033[0m"
+ exit 1
+ fi
;;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| --reset-accounts) | |
| # Reset account rotation state (Issue #81) | |
| SCRIPT_DIR="$(dirname "${BASH_SOURCE[0]}")" | |
| source "$SCRIPT_DIR/lib/date_utils.sh" | |
| source "$SCRIPT_DIR/lib/account_rotation.sh" | |
| reset_account_rotation "Manual reset via command line" | |
| echo -e "\033[0;32m✅ Account rotation state reset successfully\033[0m" | |
| exit 0 | |
| --reset-accounts) | |
| # Reset account rotation state (Issue `#81`) | |
| SCRIPT_DIR="$(dirname "${BASH_SOURCE[0]}")" | |
| source "$SCRIPT_DIR/lib/date_utils.sh" | |
| source "$SCRIPT_DIR/lib/account_rotation.sh" | |
| if reset_account_rotation "Manual reset via command line"; then | |
| echo -e "\033[0;32m✅ Account rotation state reset successfully\033[0m" | |
| exit 0 | |
| else | |
| echo -e "\033[0;31m❌ Failed to reset account rotation state\033[0m" | |
| exit 1 | |
| fi | |
| ;; |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ralph_loop.sh` around lines 1929 - 1936, The current --reset-accounts branch
calls reset_account_rotation and always prints success/exit 0; modify it to
capture the exit status of reset_account_rotation (e.g., via a local variable or
checking "$?") and only print the green success message and exit 0 when the
function returns success; on failure print a red error message including the
function's exit code or stderr and exit with a non-zero status. Ensure you
reference the reset_account_rotation invocation in ralph_loop.sh and propagate
any errors from lib/account_rotation.sh appropriately.
…eset on rotate
- Remove ~/.claude special case in switch_account() — CLAUDE_CONFIG_DIR is
now always set explicitly for all rotation accounts. Using the default
~/.claude dir caused both accounts to share the same rate limit pool
when credentials resolved to the same org.
- Each rotation account must use a dedicated config dir (e.g.,
~/.claude-account1, ~/.claude-account2) with separate credentials.
- Add reset_session("account_rotation") on successful rotation to prevent
"No conversation found" errors from resuming account 1's session on
account 2.
- Update tests to reflect new always-set behavior (61/61 pass).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (3)
ralph_loop.sh (2)
1746-1749:⚠️ Potential issue | 🟡 MinorUse generic code-2 messaging/reason (not only “5-hour limit”).
Lines 1746-1749 and Line 1777 still hardcode 5-hour semantics, but code 2 now also covers daily-cap and auth-rotation paths. This can mislead troubleshooting and status consumers.
✏️ Suggested wording update
- # API 5-hour limit reached — try account rotation first (Issue `#81`) + # Rotatable API/auth condition detected — try account rotation first (Issue `#81`) update_status "$loop_count" "$(cat "$CALL_COUNT_FILE")" "api_limit" "paused" - log_status "WARN" "🛑 Claude API 5-hour limit reached!" + log_status "WARN" "🛑 Claude API limit/auth condition detected (code 2)" @@ - echo -e "\n${YELLOW}The Claude API 5-hour usage limit has been reached.${NC}" + echo -e "\n${YELLOW}Claude reported a limit/auth condition.${NC}" @@ - update_status "$loop_count" "$(cat "$CALL_COUNT_FILE")" "api_limit_exit" "stopped" "api_5hour_limit" + update_status "$loop_count" "$(cat "$CALL_COUNT_FILE")" "api_limit_exit" "stopped" "api_limit_or_auth"Also applies to: 1762-1778
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ralph_loop.sh` around lines 1746 - 1749, Update the hardcoded "5-hour" messaging and status reason to a generic code-2 / API limit message in the calls to update_status and log_status (e.g., in the block that currently calls update_status "$loop_count" "$(cat "$CALL_COUNT_FILE")" "api_limit" "paused" and log_status "WARN" "🛑 Claude API 5-hour limit reached!"); change the status reason to a generic token such as "code_2" or "api_limit_generic" and revise the log text to remove "5-hour" wording so it covers 5-hour, daily-cap and auth-rotation cases; apply the same change to the similar block around lines 1762-1778 to keep messaging consistent across update_status and log_status usages.
1519-1523:⚠️ Potential issue | 🟠 MajorScope Layer 4 login detection to filtered recent output.
Line 1520 scans the full output log and can rotate on echoed
/logintext from payload/tool content. Reuse the same filtered tail pattern as Layer 3 before matching auth strings.As per coding guidelines: "Implement three-layer API 5-hour limit detection in ralph_loop.sh: Layer 1 timeout guard (code 124 → code 1), Layer 2 structural JSON detection (primary), Layer 3 filtered text fallback (tail -30 with field filtering)".🎯 Proposed fix
- if [[ "${ACCOUNT_ROTATION:-false}" == "true" ]] && grep -qi "not logged in\|please run /login" "$output_file" 2>/dev/null; then + if [[ "${ACCOUNT_ROTATION:-false}" == "true" ]] && \ + tail -30 "$output_file" 2>/dev/null | \ + grep -vE '"type"\s*:\s*"user"' | \ + grep -v '"tool_result"' | \ + grep -v '"tool_use_id"' | \ + grep -qiE 'not logged in|please run /login'; then log_status "ERROR" "🔐 Claude account not logged in — triggering rotation" return 2 # Use same code as API limit so rotation logic fires fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ralph_loop.sh` around lines 1519 - 1523, The Layer 4 login-detection block in ralph_loop.sh currently greps the full "$output_file" and may match echoed "/login" from payloads; change it to reuse the same filtered recent-output pattern used by Layer 3 (i.e., tail the most recent lines and apply the same field filtering) before running the case-insensitive grep for "not logged in" or "please run /login"; specifically update the if condition that checks ACCOUNT_ROTATION and the grep to operate on the Layer 3 filtered/tail output (same pipeline used by the Layer 3 fallback) so rotation only triggers on recent, filtered assistant output.tests/unit/test_account_rotation.bats (1)
709-712:⚠️ Potential issue | 🟠 MajorVerify
--reset-accountsmutates state, not just output text.This test can pass if the command prints success but leaves stale cooldown entries in
.account_rotation_state. Assert post-reset JSON fields directly.Based on learnings, "Test suite must achieve 100% test pass rate with comprehensive coverage of: CLI parsing, JSON parsing, session management, exit detection, rate limiting, loop execution, edge cases, installation, project setup, PRD import, enable core, task sources, Ralph enable integration, wizard utilities, and file protection".✅ Add state assertions
`@test` "--reset-accounts: resets account rotation state" { @@ run bash "$ralph_script" --reset-accounts assert_success [[ "$output" == *"reset successfully"* ]] + + local limited_count active_index active_type + limited_count=$(jq '.rate_limited | length' "$ACCOUNT_ROTATION_STATE_FILE") + active_index=$(jq -r '.active_index' "$ACCOUNT_ROTATION_STATE_FILE") + active_type=$(jq -r '.active_type' "$ACCOUNT_ROTATION_STATE_FILE") + [[ "$limited_count" == "0" ]] + [[ "$active_index" == "0" ]] + [[ "$active_type" == "none" ]] }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_account_rotation.bats` around lines 709 - 712, The test for --reset-accounts only checks output text and must also verify the on-disk state: after running run bash "$ralph_script" --reset-accounts and assert_success, open and parse the .account_rotation_state JSON and assert that cooldown-related fields (e.g., any keys like "cooldowns", "last_rotated", or whatever the implementation stores) are cleared or set to their zero/initial values; update the test to read the file path used by the CLI ('.account_rotation_state') and add explicit JSON assertions to ensure stale cooldown entries are removed rather than relying solely on the "reset successfully" message.
🧹 Nitpick comments (1)
lib/account_rotation.sh (1)
50-50: Document intentional dynamic source for ShellCheck.Line 50 currently triggers SC1090. Add
# shellcheck source=/dev/nullimmediately beforesource "$ACCOUNT_KEYS_FILE"to keep lint output clean.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@lib/account_rotation.sh` at line 50, Add a ShellCheck directive immediately above the dynamic source to suppress SC1090: insert a comment `# shellcheck source=/dev/null` directly before the `source "$ACCOUNT_KEYS_FILE"` line in lib/account_rotation.sh so the linter knows this dynamic sourcing is intentional and won’t raise SC1090; keep the rest of the `source` invocation unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lib/account_rotation.sh`:
- Around line 45-50: Before sourcing ACCOUNT_KEYS_FILE in the account rotation
logic, add a strict permission and ownership check: verify ACCOUNT_KEYS_FILE
exists and is a regular file, that its owner is the current user (UID/username),
and that group/other have no read or write (and no execute) bits set (i.e. perms
like 600 or 400); if any check fails, log an error and exit non-zero (fail
closed) instead of sourcing. Apply this check immediately before the existing
source "$ACCOUNT_KEYS_FILE" line and ensure it protects the CLAUDE_ACCOUNT_KEYS
loading path.
---
Duplicate comments:
In `@ralph_loop.sh`:
- Around line 1746-1749: Update the hardcoded "5-hour" messaging and status
reason to a generic code-2 / API limit message in the calls to update_status and
log_status (e.g., in the block that currently calls update_status "$loop_count"
"$(cat "$CALL_COUNT_FILE")" "api_limit" "paused" and log_status "WARN" "🛑
Claude API 5-hour limit reached!"); change the status reason to a generic token
such as "code_2" or "api_limit_generic" and revise the log text to remove
"5-hour" wording so it covers 5-hour, daily-cap and auth-rotation cases; apply
the same change to the similar block around lines 1762-1778 to keep messaging
consistent across update_status and log_status usages.
- Around line 1519-1523: The Layer 4 login-detection block in ralph_loop.sh
currently greps the full "$output_file" and may match echoed "/login" from
payloads; change it to reuse the same filtered recent-output pattern used by
Layer 3 (i.e., tail the most recent lines and apply the same field filtering)
before running the case-insensitive grep for "not logged in" or "please run
/login"; specifically update the if condition that checks ACCOUNT_ROTATION and
the grep to operate on the Layer 3 filtered/tail output (same pipeline used by
the Layer 3 fallback) so rotation only triggers on recent, filtered assistant
output.
In `@tests/unit/test_account_rotation.bats`:
- Around line 709-712: The test for --reset-accounts only checks output text and
must also verify the on-disk state: after running run bash "$ralph_script"
--reset-accounts and assert_success, open and parse the .account_rotation_state
JSON and assert that cooldown-related fields (e.g., any keys like "cooldowns",
"last_rotated", or whatever the implementation stores) are cleared or set to
their zero/initial values; update the test to read the file path used by the CLI
('.account_rotation_state') and add explicit JSON assertions to ensure stale
cooldown entries are removed rather than relying solely on the "reset
successfully" message.
---
Nitpick comments:
In `@lib/account_rotation.sh`:
- Line 50: Add a ShellCheck directive immediately above the dynamic source to
suppress SC1090: insert a comment `# shellcheck source=/dev/null` directly
before the `source "$ACCOUNT_KEYS_FILE"` line in lib/account_rotation.sh so the
linter knows this dynamic sourcing is intentional and won’t raise SC1090; keep
the rest of the `source` invocation unchanged.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
CLAUDE.mdlib/account_rotation.shralph_loop.shtests/unit/test_account_rotation.bats
| if [[ -f "$ACCOUNT_KEYS_FILE" ]]; then | ||
| # SECURITY: This sources arbitrary bash from the user's home directory. | ||
| # The file is user-created and user-owned (~/.ralph/accounts.conf), | ||
| # so we trust it the same way we trust .bashrc or .profile. | ||
| local CLAUDE_ACCOUNT_KEYS=() | ||
| source "$ACCOUNT_KEYS_FILE" |
There was a problem hiding this comment.
Harden accounts.conf before sourcing secrets.
Line 50 executes ACCOUNT_KEYS_FILE as shell code and loads API keys. Add a strict permission check first (and fail closed) so group/world-readable or writable files are rejected.
🔒 Proposed hardening
if [[ -f "$ACCOUNT_KEYS_FILE" ]]; then
+ # Enforce restrictive permissions for secrets file
+ local perms=""
+ if perms=$(stat -c '%a' "$ACCOUNT_KEYS_FILE" 2>/dev/null); then
+ :
+ elif perms=$(stat -f '%Lp' "$ACCOUNT_KEYS_FILE" 2>/dev/null); then
+ :
+ fi
+ if [[ -n "$perms" && "$perms" -gt 600 ]]; then
+ echo "ERROR: $ACCOUNT_KEYS_FILE must have permissions 600 or stricter" >&2
+ return 1
+ fi
+
# SECURITY: This sources arbitrary bash from the user's home directory.
# The file is user-created and user-owned (~/.ralph/accounts.conf),
# so we trust it the same way we trust .bashrc or .profile.
local CLAUDE_ACCOUNT_KEYS=()
+ # shellcheck source=/dev/null
source "$ACCOUNT_KEYS_FILE"🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 50-50: ShellCheck can't follow non-constant source. Use a directive to specify location.
(SC1090)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lib/account_rotation.sh` around lines 45 - 50, Before sourcing
ACCOUNT_KEYS_FILE in the account rotation logic, add a strict permission and
ownership check: verify ACCOUNT_KEYS_FILE exists and is a regular file, that its
owner is the current user (UID/username), and that group/other have no read or
write (and no execute) bits set (i.e. perms like 600 or 400); if any check
fails, log an error and exit non-zero (fail closed) instead of sourcing. Apply
this check immediately before the existing source "$ACCOUNT_KEYS_FILE" line and
ensure it protects the CLAUDE_ACCOUNT_KEYS loading path.
Summary
lib/account_rotation.shwith full state management: sequential rotation, per-account cooldowns, config dir and API key switchingset -ecrash:execute_claude_codereturning non-zero (API limit, circuit breaker) was silently killing the script before rotation logic could runACCOUNT_ROTATION=trueCLAUDE_CONFIG_DIRexplicitly — the default~/.claudedir must never be used for rotation accounts, as its keychain credentials can share the same org rate limit pool as other accountsConfiguration
Important: Each rotation account must use a dedicated config dir (e.g.,
~/.claude-account1,~/.claude-account2). Do NOT use~/.claude— its keychain credentials are tied to the default/unset state and may share the same org rate limit pool as other accounts, making rotation ineffective.Acceptance Criteria
ACCOUNT_ROTATION=true— zero behavior change when not configuredCLAUDE_CONFIG_DIRalways set explicitly for all rotation accounts (no~/.claudefallback)ACCOUNT_ROTATION=true(Layer 4)--reset-accountsCLI flag to clear all rate-limit cooldownsset -ecrash on non-zeroexecute_claude_codereturnTest Plan
tests/unit/test_account_rotation.bats(config loading, rotation order, cooldowns, env var switching, CLI flag)test_cli_modern.bats: daily cap 'hit your limit', 'not logged in' with rotation on/offRelated Issues
Closes #81
Fixes #198 (daily usage cap not detected)
Breaking Changes
None. Rotation is fully opt-in. Existing behavior unchanged when
ACCOUNT_ROTATIONis not set orfalse.🤖 Generated with Claude Code