fix(loop): Foundry credential isolation, model override, and clean Ctrl+C shutdown#186
Conversation
Add Foundry credential isolation, model override, Chrome flag, tmux env forwarding, circuit-breaker recovery wait, and reliable Ctrl+C shutdown to
|
ralph_loop.sh
Outdated
| if [[ -n "$CLAUDE_CHILD_PID" ]]; then | ||
| # Kill the entire process group rooted at CLAUDE_CHILD_PID | ||
| # This catches: timeout -> stdbuf -> claude (and pipeline peers in live mode) | ||
| kill -- -"$CLAUDE_CHILD_PID" 2>/dev/null || true |
There was a problem hiding this comment.
🟠 High
ralph_loop.sh:1579 kill -- -"$CLAUDE_CHILD_PID" assumes the PID equals a PGID, but backgrounded subshells in non-interactive scripts inherit the parent's process group. This silently fails (ESRCH) or kills unrelated processes. Consider using pkill -P to kill by parent PID, or setsid when spawning to create a new session/process group.
🚀 Want me to fix this? Reply ex: "fix it for me".
🤖 Prompt for AI
In file ralph_loop.sh around line 1579:
`kill -- -"$CLAUDE_CHILD_PID"` assumes the PID equals a PGID, but backgrounded subshells in non-interactive scripts inherit the parent's process group. This silently fails (ESRCH) or kills unrelated processes. Consider using `pkill -P` to kill by parent PID, or `setsid` when spawning to create a new session/process group.
|
|
||
| [[ -z "$patterns" ]] && return 0 | ||
|
|
||
| for pattern in $patterns; do |
There was a problem hiding this comment.
🟡 Medium
ralph_loop.sh:203 Unquoted $patterns causes glob expansion—if files match ANTHROPIC_*, they replace the pattern. Consider wrapping the loop with set -f / set +f to disable globbing.
🚀 Want me to fix this? Reply ex: "fix it for me".
🤖 Prompt for AI
In file ralph_loop.sh around line 203:
Unquoted `$patterns` causes glob expansion—if files match `ANTHROPIC_*`, they replace the pattern. Consider wrapping the loop with `set -f` / `set +f` to disable globbing.
| [[ -n "$_env_CB_COOLDOWN_MINUTES" ]] && CB_COOLDOWN_MINUTES="$_env_CB_COOLDOWN_MINUTES" | ||
| [[ -n "$_env_CB_AUTO_RESET" ]] && CB_AUTO_RESET="$_env_CB_AUTO_RESET" | ||
| [[ -n "$_env_RALPH_TMUX_ENV" ]] && RALPH_TMUX_ENV="$_env_RALPH_TMUX_ENV" | ||
| [[ -n "$_env_CLAUDE_MODEL" ]] && CLAUDE_MODEL="$_env_CLAUDE_MODEL" |
There was a problem hiding this comment.
🟡 Medium
ralph_loop.sh:177 Missing restoration of _env_CLAUDE_CODE_USE_FOUNDRY and _env_ANTHROPIC_* variables. These are captured earlier but not restored here, so .ralphrc values can override user environment settings, violating the documented precedence. Consider adding restoration lines for these Foundry-related variables.
- [[ -n "$_env_CLAUDE_MODEL" ]] && CLAUDE_MODEL="$_env_CLAUDE_MODEL"
+ [[ -n "$_env_CLAUDE_MODEL" ]] && CLAUDE_MODEL="$_env_CLAUDE_MODEL"
+ [[ -n "$_env_CLAUDE_CODE_USE_FOUNDRY" ]] && CLAUDE_CODE_USE_FOUNDRY="$_env_CLAUDE_CODE_USE_FOUNDRY"
+ [[ -n "$_env_ANTHROPIC_FOUNDRY_RESOURCE" ]] && ANTHROPIC_FOUNDRY_RESOURCE="$_env_ANTHROPIC_FOUNDRY_RESOURCE"
+ [[ -n "$_env_ANTHROPIC_FOUNDRY_API_KEY" ]] && ANTHROPIC_FOUNDRY_API_KEY="$_env_ANTHROPIC_FOUNDRY_API_KEY"🚀 Want me to fix this? Reply ex: "fix it for me".
🤖 Prompt for AI
In file ralph_loop.sh around line 177:
Missing restoration of `_env_CLAUDE_CODE_USE_FOUNDRY` and `_env_ANTHROPIC_*` variables. These are captured earlier but not restored here, so `.ralphrc` values can override user environment settings, violating the documented precedence. Consider adding restoration lines for these Foundry-related variables.
e85290f to
3a20578
Compare
WalkthroughAdds tmux environment forwarding, a circuit-breaker cooldown polling flow, Foundry/Anthropic model/auth wiring, subshell-based live Claude execution with CLAUDE_CHILD_PID tracking and improved cleanup, plus corresponding unit-test updates to assert background/subshell and wait-based exit handling. Changes
Sequence Diagram(s)sequenceDiagram
participant Main as ralph_loop
participant CB as CircuitBreaker
participant Recovery as wait_for_circuit_recovery
participant ClaudeCLI as Claude CLI
Main->>CB: query circuit state
CB-->>Main: OPEN
Main->>Recovery: start polling loop
Recovery->>CB: poll state (repeat)
CB-->>Recovery: OPEN (repeat)
CB-->>Recovery: HALF_OPEN/CLOSED
Recovery-->>Main: recovered
Main->>ClaudeCLI: resume/execute request (may include --model/--chrome)
ClaudeCLI-->>Main: stream output / exit
sequenceDiagram
participant Main as ralph_loop
participant Builder as build_claude_command()
participant Subshell as backgrounded subshell () &
participant Claude as Claude CLI
participant Reaper as cleanup / wait
Main->>Builder: build command (includes CLAUDE_MODEL/CLAUDE_CHROME if set)
Main->>Subshell: launch subshell with command
Subshell->>Claude: exec Claude CLI (child processes)
Subshell-->>Main: export CLAUDE_CHILD_PID (background)
Main->>Reaper: wait $CLAUDE_CHILD_PID
Reaper-->>Main: exit_code=$?
Main->>Reaper: cleanup process tree using CLAUDE_CHILD_PID
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ralph_loop.sh (1)
1800-1804:⚠️ Potential issue | 🟠 MajorIssue
#185(read -t 30+set -e) appears unfixed here.Line 1801
read -t 30 -n 1 user_choicereturns non-zero on timeout, which underset -eexits the script — exactly the bug described in Issue#185. This path is reached when the API 5-hour limit is hit (exec_result -eq 2). While this line isn't changed in this PR, the PR references Issue#185as the motivating problem.Suggested fix
- read -t 30 -n 1 user_choice + read -t 30 -n 1 user_choice || true🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ralph_loop.sh` around lines 1800 - 1804, The read with timeout (`read -t 30 -n 1 user_choice`) can return non-zero on timeout and cause the script to exit under set -e; change the call so its non-zero exit doesn't abort the script (for example, wrap it to ignore failures or capture its status), e.g. perform the timed read in a conditional and set user_choice="" on timeout or use `|| true` after the read so subsequent logic that checks user_choice (the existing if [[ "$user_choice" == "2" ]] || [[ -z "$user_choice" ]]) continues to run; update the read invocation near where user_choice is used to ensure timeouts don't trigger script termination.
🧹 Nitpick comments (4)
ralph_loop.sh (3)
1404-1406: LGTM —CLAUDE_CHILD_PIDtracking in background mode enables Ctrl+C cleanup.Note: the
wait $claude_pidon line 1451 has the sameset -eexposure as the live-mode path flagged above. Consider applying the samewait $claude_pid || exit_code=$?fix here as well for consistency, though this pattern may have been pre-existing.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ralph_loop.sh` around lines 1404 - 1406, The background-mode path captures the child PID into claude_pid and CLAUDE_CHILD_PID but still performs a bare wait "wait $claude_pid" which will trigger set -e on failure; change the wait logic used for the live-mode path to a safe form here as well (e.g., replace the direct wait with "wait $claude_pid || exit_code=$?" style handling and propagate/handle exit_code consistently), updating any downstream cleanup or exit logic that inspects exit_code so background-mode exit handling mirrors live-mode; reference variables/functions: claude_pid, CLAUDE_CHILD_PID, wait $claude_pid, and exit_code.
1707-1709: Behavioral change: circuit breaker OPEN now blocks indefinitely instead of exiting.Previously the loop would
breakon circuit breaker OPEN. Now it callswait_for_circuit_recovery(unbounded) andcontinue. This is intentional per the PR but worth confirming this is the desired UX — operators may prefer a configurable max-wait or an exit after N failed recoveries.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ralph_loop.sh` around lines 1707 - 1709, The change makes the circuit-breaker OPEN path call wait_for_circuit_recovery("$loop_count") and continue, which can block indefinitely; change this to honor a configurable max wait or max recovery attempts by adding a check around the log_status/wait_for_circuit_recovery block (referencing log_status, wait_for_circuit_recovery, and loop_count) so that if a new config variable (e.g., MAX_RECOVERY_ATTEMPTS or MAX_RECOVERY_WAIT) is exceeded you log a final error and break/exit instead of continuing; implement incrementing/tracking of recovery attempts (or elapsed wait) and use that to decide between calling wait_for_circuit_recovery and breaking out of the loop.
1465-1474: Non-portable\|alternation ingrep— usegrep -Efor reliability.Line 1468 uses
\|for alternation in basic regex, which is a GNU extension not supported by BSD grep (macOS). Since this is informational logging, the impact is low, but it's trivially fixable.Proposed fix
- if grep -q '"usage"\|"billing"\|"credits_remaining"' "$output_file" 2>/dev/null; then + if grep -qE '"usage"|"billing"|"credits_remaining"' "$output_file" 2>/dev/null; then🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ralph_loop.sh` around lines 1465 - 1474, The grep alternation using '\|' is non-portable; update the check that reads "$output_file" (the grep command that currently tests for '"usage"\|"billing"\|"credits_remaining"') to use extended regex mode (e.g., grep -E or grep -P if available) with a single quoted alternation like '("usage"|"billing"|"credits_remaining")', keep the -q and 2>/dev/null redirect, and leave the surrounding conditional logic unchanged (the block referencing output_file, CLAUDE_CODE_USE_FOUNDRY, and log_status should remain the same).tests/unit/test_cli_modern.bats (1)
838-869: Add unit tests forforward_tmux_env()andwait_for_circuit_recovery()functions.Both functions are in
ralph_loop.shbut have no corresponding tests in this file.forward_tmux_env()handles glob pattern matching, prefix expansion, and conditional Foundry credential unsetting.wait_for_circuit_recovery()implements a polling loop with state checking and timestamp calculations. Both warrant unit test coverage per coding guidelines.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_cli_modern.bats` around lines 838 - 869, Add unit tests covering forward_tmux_env() and wait_for_circuit_recovery(): for forward_tmux_env() write cases that verify glob pattern matching and prefix expansion produce expected environment variable exports and that when a variable name matches the Foundry credential pattern it unsets FOUNDRY_USERNAME/FOUNDRY_PASSWORD appropriately (including negative cases); for wait_for_circuit_recovery() write tests that stub the dependencies/state checker to simulate state transitions, verify the polling loop calls sleep as expected, that timestamp/calculation logic yields correct backoff/timeout behavior, and that success and timeout exit conditions set/return the correct codes—place these tests alongside the existing live-mode tests and use the same sed/grep pattern extraction approach to invoke and assert on the ralph_loop.sh functions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ralph_loop.sh`:
- Around line 1207-1209: The current code logs the first 10 characters of
ANTHROPIC_FOUNDRY_API_KEY via the log_status call, exposing sensitive secret
material; change the behavior in the block that checks ANTHROPIC_FOUNDRY_API_KEY
so it does not emit partial keys to $LOG_DIR/ralph.log — instead log only
presence/absence (e.g., "ANTHROPIC_FOUNDRY_API_KEY: set" or "not set") or, if
you must show a hint, mask to a minimal safe slice such as a single character or
last 4 chars; update the log_status invocation that references
ANTHROPIC_FOUNDRY_API_KEY accordingly to avoid writing key material to logs.
- Around line 50-61: The comment for the Foundry flag is misleading: it says to
set CLAUDE_CODE_USE_FOUNDRY=true but the script defaults
CLAUDE_CODE_USE_FOUNDRY="${CLAUDE_CODE_USE_FOUNDRY:-0}" and later checks for ==
"1"; update the comment to instruct users to set CLAUDE_CODE_USE_FOUNDRY=1 (or
alternatively modify the runtime check to accept "true" as well) so the behavior
of CLAUDE_CODE_USE_FOUNDRY, its default, and the check (referenced by
CLAUDE_CODE_USE_FOUNDRY and the runtime comparison around the Foundry check) are
consistent.
- Around line 509-543: The wait_for_circuit_recovery function lacks an upper
bound and documentation; update it to accept or use a MAX_WAIT_MINUTES (or
MAX_ITERATIONS) constant and enforce a maximum loop duration/iteration count
inside wait_for_circuit_recovery so that if get_circuit_state remains
"$CB_STATE_OPEN" beyond the limit it logs an error, updates status (e.g.,
"circuit_breaker_failed" or "timeout"), and returns a non-zero value; also add a
bash function comment block above wait_for_circuit_recovery describing purpose,
parameters (loop_count, optional max wait), and return values (0 on recovered,
non-zero on timeout/error), and reference functions/vars like
init_circuit_breaker, get_circuit_state, CB_COOLDOWN_MINUTES, CB_STATE_FILE,
parse_iso_to_epoch, and CALL_COUNT_FILE when implementing the timeout handling
and status updates.
- Around line 200-242: The function forward_tmux_env disables globbing with set
-f but can exit early (e.g., if tmux set-environment fails) leaving globbing
disabled; ensure set +f always runs by scoping noglob in a subshell or
installing a short-lived trap: either wrap the pattern-processing loop in a
subshell so set -f is local to that subshell, or add a trap (e.g., trap 'set +f'
RETURN or EXIT) immediately before set -f and clear it after set +f; apply this
to the forward_tmux_env function and ensure every tmux set-environment
invocation (the tmux set-environment calls) either is protected with || true or
is covered by the trap/subshell so set +f is restored on all exit paths.
- Around line 1319-1332: The wait on CLAUDE_CHILD_PID can trigger set -e when
the child exits non‑zero (e.g., timeout 124), so change the wait invocation to
capture the child's exit status without causing an immediate script exit: run
wait $CLAUDE_CHILD_PID with a fallthrough that assigns exit_code (e.g., use
"wait $CLAUDE_CHILD_PID || exit_code=$?") so the script records the actual exit
code for downstream timeout/session/circuit-breaker handling (references:
CLAUDE_CHILD_PID, wait, exit_code, portable_timeout).
---
Outside diff comments:
In `@ralph_loop.sh`:
- Around line 1800-1804: The read with timeout (`read -t 30 -n 1 user_choice`)
can return non-zero on timeout and cause the script to exit under set -e; change
the call so its non-zero exit doesn't abort the script (for example, wrap it to
ignore failures or capture its status), e.g. perform the timed read in a
conditional and set user_choice="" on timeout or use `|| true` after the read so
subsequent logic that checks user_choice (the existing if [[ "$user_choice" ==
"2" ]] || [[ -z "$user_choice" ]]) continues to run; update the read invocation
near where user_choice is used to ensure timeouts don't trigger script
termination.
---
Nitpick comments:
In `@ralph_loop.sh`:
- Around line 1404-1406: The background-mode path captures the child PID into
claude_pid and CLAUDE_CHILD_PID but still performs a bare wait "wait
$claude_pid" which will trigger set -e on failure; change the wait logic used
for the live-mode path to a safe form here as well (e.g., replace the direct
wait with "wait $claude_pid || exit_code=$?" style handling and propagate/handle
exit_code consistently), updating any downstream cleanup or exit logic that
inspects exit_code so background-mode exit handling mirrors live-mode; reference
variables/functions: claude_pid, CLAUDE_CHILD_PID, wait $claude_pid, and
exit_code.
- Around line 1707-1709: The change makes the circuit-breaker OPEN path call
wait_for_circuit_recovery("$loop_count") and continue, which can block
indefinitely; change this to honor a configurable max wait or max recovery
attempts by adding a check around the log_status/wait_for_circuit_recovery block
(referencing log_status, wait_for_circuit_recovery, and loop_count) so that if a
new config variable (e.g., MAX_RECOVERY_ATTEMPTS or MAX_RECOVERY_WAIT) is
exceeded you log a final error and break/exit instead of continuing; implement
incrementing/tracking of recovery attempts (or elapsed wait) and use that to
decide between calling wait_for_circuit_recovery and breaking out of the loop.
- Around line 1465-1474: The grep alternation using '\|' is non-portable; update
the check that reads "$output_file" (the grep command that currently tests for
'"usage"\|"billing"\|"credits_remaining"') to use extended regex mode (e.g.,
grep -E or grep -P if available) with a single quoted alternation like
'("usage"|"billing"|"credits_remaining")', keep the -q and 2>/dev/null redirect,
and leave the surrounding conditional logic unchanged (the block referencing
output_file, CLAUDE_CODE_USE_FOUNDRY, and log_status should remain the same).
In `@tests/unit/test_cli_modern.bats`:
- Around line 838-869: Add unit tests covering forward_tmux_env() and
wait_for_circuit_recovery(): for forward_tmux_env() write cases that verify glob
pattern matching and prefix expansion produce expected environment variable
exports and that when a variable name matches the Foundry credential pattern it
unsets FOUNDRY_USERNAME/FOUNDRY_PASSWORD appropriately (including negative
cases); for wait_for_circuit_recovery() write tests that stub the
dependencies/state checker to simulate state transitions, verify the polling
loop calls sleep as expected, that timestamp/calculation logic yields correct
backoff/timeout behavior, and that success and timeout exit conditions
set/return the correct codes—place these tests alongside the existing live-mode
tests and use the same sed/grep pattern extraction approach to invoke and assert
on the ralph_loop.sh functions.
| # Azure Foundry / Anthropic API configuration | ||
| # Set CLAUDE_CODE_USE_FOUNDRY=true in .ralphrc to use Azure Foundry for this project | ||
| _env_CLAUDE_CODE_USE_FOUNDRY="${CLAUDE_CODE_USE_FOUNDRY:-}" | ||
| _env_ANTHROPIC_FOUNDRY_RESOURCE="${ANTHROPIC_FOUNDRY_RESOURCE:-}" | ||
| _env_ANTHROPIC_FOUNDRY_API_KEY="${ANTHROPIC_FOUNDRY_API_KEY:-}" | ||
| _env_ANTHROPIC_DEFAULT_SONNET_MODEL="${ANTHROPIC_DEFAULT_SONNET_MODEL:-}" | ||
| _env_ANTHROPIC_DEFAULT_HAIKU_MODEL="${ANTHROPIC_DEFAULT_HAIKU_MODEL:-}" | ||
| _env_ANTHROPIC_DEFAULT_OPUS_MODEL="${ANTHROPIC_DEFAULT_OPUS_MODEL:-}" | ||
| _env_CLAUDE_MODEL="${CLAUDE_MODEL:-}" | ||
|
|
||
| # Default: don't use Foundry (use personal account) | ||
| CLAUDE_CODE_USE_FOUNDRY="${CLAUDE_CODE_USE_FOUNDRY:-0}" |
There was a problem hiding this comment.
Inconsistent documentation: comment says true but code expects 1.
Line 51 says "Set CLAUDE_CODE_USE_FOUNDRY=true in .ralphrc" but line 61 defaults to "0" and the runtime check on line 1201 compares == "1". A user following the comment would set true, which would not match "1".
Proposed fix
# Azure Foundry / Anthropic API configuration
-# Set CLAUDE_CODE_USE_FOUNDRY=true in .ralphrc to use Azure Foundry for this project
+# Set CLAUDE_CODE_USE_FOUNDRY=1 in .ralphrc to use Azure Foundry for this project🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ralph_loop.sh` around lines 50 - 61, The comment for the Foundry flag is
misleading: it says to set CLAUDE_CODE_USE_FOUNDRY=true but the script defaults
CLAUDE_CODE_USE_FOUNDRY="${CLAUDE_CODE_USE_FOUNDRY:-0}" and later checks for ==
"1"; update the comment to instruct users to set CLAUDE_CODE_USE_FOUNDRY=1 (or
alternatively modify the runtime check to accept "true" as well) so the behavior
of CLAUDE_CODE_USE_FOUNDRY, its default, and the check (referenced by
CLAUDE_CODE_USE_FOUNDRY and the runtime comparison around the Foundry check) are
consistent.
| # Forward environment variables into a tmux session | ||
| # Uses RALPH_TMUX_ENV to determine which vars to forward | ||
| # Supports exact names (e.g., "FOO") and prefix patterns (e.g., "ANTHROPIC_*") | ||
| forward_tmux_env() { | ||
| local session_name="$1" | ||
| local patterns="$RALPH_TMUX_ENV" | ||
|
|
||
| [[ -z "$patterns" ]] && return 0 | ||
|
|
||
| # Disable globbing so patterns like "ANTHROPIC_*" are not expanded | ||
| # against filenames in the current directory | ||
| set -f | ||
| for pattern in $patterns; do | ||
| if [[ "$pattern" == *'*' ]]; then | ||
| # Prefix pattern: match all env vars starting with the prefix | ||
| local prefix="${pattern%\*}" | ||
| while IFS= read -r line; do | ||
| local name="${line%%=*}" | ||
| local value="${line#*=}" | ||
| if [[ -n "$name" && -n "$value" ]]; then | ||
| tmux set-environment -t "$session_name" "$name" "$value" | ||
| fi | ||
| done < <(env | grep "^${prefix}" || true) | ||
| else | ||
| # Exact match: forward if set in current environment | ||
| local value="${!pattern:-}" | ||
| if [[ -n "$value" ]]; then | ||
| tmux set-environment -t "$session_name" "$pattern" "$value" | ||
| fi | ||
| fi | ||
| done | ||
| set +f | ||
|
|
||
| # When Foundry is disabled, actively unset Foundry vars at session level | ||
| # to prevent leakage from tmux global environment (concurrent session isolation) | ||
| if [[ "${CLAUDE_CODE_USE_FOUNDRY:-0}" != "1" ]]; then | ||
| tmux set-environment -t "$session_name" -u ANTHROPIC_FOUNDRY_RESOURCE 2>/dev/null || true | ||
| tmux set-environment -t "$session_name" -u ANTHROPIC_FOUNDRY_API_KEY 2>/dev/null || true | ||
| tmux set-environment -t "$session_name" -u ANTHROPIC_DEFAULT_SONNET_MODEL 2>/dev/null || true | ||
| tmux set-environment -t "$session_name" -u ANTHROPIC_DEFAULT_HAIKU_MODEL 2>/dev/null || true | ||
| tmux set-environment -t "$session_name" -u ANTHROPIC_DEFAULT_OPUS_MODEL 2>/dev/null || true | ||
| fi | ||
| } |
There was a problem hiding this comment.
set -f is not restored if an early exit occurs between Lines 211 and 231.
Under set -e, if tmux set-environment (lines 220, 227) fails, the script exits with globbing still disabled. Consider using a trap or restructuring to ensure set +f runs on all exit paths.
Proposed fix — use a subshell to scope noglob
- set -f
- for pattern in $patterns; do
+ # Run in subshell so set -f (noglob) is automatically scoped
+ (
+ set -f
+ for pattern in $patterns; do
if [[ "$pattern" == *'*' ]]; then
...
fi
done
- set +f
+ )Alternatively, add || true to the tmux set-environment calls on lines 220 and 227 to match the defensive style already used on lines 236–240.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Forward environment variables into a tmux session | |
| # Uses RALPH_TMUX_ENV to determine which vars to forward | |
| # Supports exact names (e.g., "FOO") and prefix patterns (e.g., "ANTHROPIC_*") | |
| forward_tmux_env() { | |
| local session_name="$1" | |
| local patterns="$RALPH_TMUX_ENV" | |
| [[ -z "$patterns" ]] && return 0 | |
| # Disable globbing so patterns like "ANTHROPIC_*" are not expanded | |
| # against filenames in the current directory | |
| set -f | |
| for pattern in $patterns; do | |
| if [[ "$pattern" == *'*' ]]; then | |
| # Prefix pattern: match all env vars starting with the prefix | |
| local prefix="${pattern%\*}" | |
| while IFS= read -r line; do | |
| local name="${line%%=*}" | |
| local value="${line#*=}" | |
| if [[ -n "$name" && -n "$value" ]]; then | |
| tmux set-environment -t "$session_name" "$name" "$value" | |
| fi | |
| done < <(env | grep "^${prefix}" || true) | |
| else | |
| # Exact match: forward if set in current environment | |
| local value="${!pattern:-}" | |
| if [[ -n "$value" ]]; then | |
| tmux set-environment -t "$session_name" "$pattern" "$value" | |
| fi | |
| fi | |
| done | |
| set +f | |
| # When Foundry is disabled, actively unset Foundry vars at session level | |
| # to prevent leakage from tmux global environment (concurrent session isolation) | |
| if [[ "${CLAUDE_CODE_USE_FOUNDRY:-0}" != "1" ]]; then | |
| tmux set-environment -t "$session_name" -u ANTHROPIC_FOUNDRY_RESOURCE 2>/dev/null || true | |
| tmux set-environment -t "$session_name" -u ANTHROPIC_FOUNDRY_API_KEY 2>/dev/null || true | |
| tmux set-environment -t "$session_name" -u ANTHROPIC_DEFAULT_SONNET_MODEL 2>/dev/null || true | |
| tmux set-environment -t "$session_name" -u ANTHROPIC_DEFAULT_HAIKU_MODEL 2>/dev/null || true | |
| tmux set-environment -t "$session_name" -u ANTHROPIC_DEFAULT_OPUS_MODEL 2>/dev/null || true | |
| fi | |
| } | |
| # Forward environment variables into a tmux session | |
| # Uses RALPH_TMUX_ENV to determine which vars to forward | |
| # Supports exact names (e.g., "FOO") and prefix patterns (e.g., "ANTHROPIC_*") | |
| forward_tmux_env() { | |
| local session_name="$1" | |
| local patterns="$RALPH_TMUX_ENV" | |
| [[ -z "$patterns" ]] && return 0 | |
| # Run in subshell so set -f (noglob) is automatically scoped | |
| ( | |
| set -f | |
| for pattern in $patterns; do | |
| if [[ "$pattern" == *'*' ]]; then | |
| # Prefix pattern: match all env vars starting with the prefix | |
| local prefix="${pattern%\*}" | |
| while IFS= read -r line; do | |
| local name="${line%%=*}" | |
| local value="${line#*=}" | |
| if [[ -n "$name" && -n "$value" ]]; then | |
| tmux set-environment -t "$session_name" "$name" "$value" | |
| fi | |
| done < <(env | grep "^${prefix}" || true) | |
| else | |
| # Exact match: forward if set in current environment | |
| local value="${!pattern:-}" | |
| if [[ -n "$value" ]]; then | |
| tmux set-environment -t "$session_name" "$pattern" "$value" | |
| fi | |
| fi | |
| done | |
| ) | |
| # When Foundry is disabled, actively unset Foundry vars at session level | |
| # to prevent leakage from tmux global environment (concurrent session isolation) | |
| if [[ "${CLAUDE_CODE_USE_FOUNDRY:-0}" != "1" ]]; then | |
| tmux set-environment -t "$session_name" -u ANTHROPIC_FOUNDRY_RESOURCE 2>/dev/null || true | |
| tmux set-environment -t "$session_name" -u ANTHROPIC_FOUNDRY_API_KEY 2>/dev/null || true | |
| tmux set-environment -t "$session_name" -u ANTHROPIC_DEFAULT_SONNET_MODEL 2>/dev/null || true | |
| tmux set-environment -t "$session_name" -u ANTHROPIC_DEFAULT_HAIKU_MODEL 2>/dev/null || true | |
| tmux set-environment -t "$session_name" -u ANTHROPIC_DEFAULT_OPUS_MODEL 2>/dev/null || true | |
| fi | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ralph_loop.sh` around lines 200 - 242, The function forward_tmux_env disables
globbing with set -f but can exit early (e.g., if tmux set-environment fails)
leaving globbing disabled; ensure set +f always runs by scoping noglob in a
subshell or installing a short-lived trap: either wrap the pattern-processing
loop in a subshell so set -f is local to that subshell, or add a trap (e.g.,
trap 'set +f' RETURN or EXIT) immediately before set -f and clear it after set
+f; apply this to the forward_tmux_env function and ensure every tmux
set-environment invocation (the tmux set-environment calls) either is protected
with || true or is covered by the trap/subshell so set +f is restored on all
exit paths.
| # Wait for circuit breaker cooldown recovery instead of exiting | ||
| wait_for_circuit_recovery() { | ||
| local loop_count=$1 | ||
| local check_interval=60 # Check every 60 seconds | ||
|
|
||
| log_status "WARN" "Circuit breaker OPEN - waiting for cooldown recovery (${CB_COOLDOWN_MINUTES}m)..." | ||
| update_status "$loop_count" "$(cat "$CALL_COUNT_FILE" 2>/dev/null || echo "0")" "waiting_for_recovery" "paused" "circuit_breaker_cooldown" | ||
|
|
||
| while true; do | ||
| sleep "$check_interval" | ||
|
|
||
| # Re-init circuit breaker (handles OPEN→HALF_OPEN/CLOSED transitions) | ||
| init_circuit_breaker | ||
|
|
||
| local state | ||
| state=$(get_circuit_state) | ||
| if [[ "$state" != "$CB_STATE_OPEN" ]]; then | ||
| log_status "SUCCESS" "Circuit breaker recovered to $state - resuming loop" | ||
| return 0 | ||
| fi | ||
|
|
||
| # Show remaining cooldown time | ||
| local opened_at | ||
| opened_at=$(jq -r '.opened_at // .last_change // ""' "$CB_STATE_FILE" 2>/dev/null) | ||
| if [[ -n "$opened_at" && "$opened_at" != "null" ]]; then | ||
| local opened_epoch elapsed_minutes remaining | ||
| opened_epoch=$(parse_iso_to_epoch "$opened_at") | ||
| elapsed_minutes=$(( ($(date +%s) - opened_epoch) / 60 )) | ||
| remaining=$((CB_COOLDOWN_MINUTES - elapsed_minutes)) | ||
| if [[ $remaining -gt 0 ]]; then | ||
| log_status "INFO" "Cooldown: ${elapsed_minutes}m elapsed, ~${remaining}m remaining" | ||
| fi | ||
| fi | ||
| done | ||
| } |
There was a problem hiding this comment.
wait_for_circuit_recovery has no upper bound — a corrupted state file will loop forever.
If get_circuit_state always returns OPEN (e.g., corrupted state file, clock skew preventing cooldown expiry), this function never returns. Consider adding a maximum wait time or iteration count as a safety net.
Proposed safeguard
wait_for_circuit_recovery() {
local loop_count=$1
local check_interval=60 # Check every 60 seconds
+ local max_wait_seconds=$(( (CB_COOLDOWN_MINUTES + 10) * 60 )) # cooldown + 10min buffer
+ local waited=0
log_status "WARN" "Circuit breaker OPEN - waiting for cooldown recovery (${CB_COOLDOWN_MINUTES}m)..."
...
while true; do
sleep "$check_interval"
+ waited=$((waited + check_interval))
+
+ if [[ $waited -ge $max_wait_seconds ]]; then
+ log_status "ERROR" "Recovery wait exceeded max time (${max_wait_seconds}s) - forcing circuit reset"
+ reset_circuit_breaker "recovery_timeout"
+ return 0
+ fi
# Re-init circuit breaker ...As per coding guidelines, **/*.sh: "Use bash function comments documenting purpose, parameters, and return values for all functions" — this function is also missing parameter/return documentation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ralph_loop.sh` around lines 509 - 543, The wait_for_circuit_recovery function
lacks an upper bound and documentation; update it to accept or use a
MAX_WAIT_MINUTES (or MAX_ITERATIONS) constant and enforce a maximum loop
duration/iteration count inside wait_for_circuit_recovery so that if
get_circuit_state remains "$CB_STATE_OPEN" beyond the limit it logs an error,
updates status (e.g., "circuit_breaker_failed" or "timeout"), and returns a
non-zero value; also add a bash function comment block above
wait_for_circuit_recovery describing purpose, parameters (loop_count, optional
max wait), and return values (0 on recovered, non-zero on timeout/error), and
reference functions/vars like init_circuit_breaker, get_circuit_state,
CB_COOLDOWN_MINUTES, CB_STATE_FILE, parse_iso_to_epoch, and CALL_COUNT_FILE when
implementing the timeout handling and status updates.
| if [[ -n "${ANTHROPIC_FOUNDRY_API_KEY:-}" ]]; then | ||
| export ANTHROPIC_FOUNDRY_API_KEY | ||
| log_status "INFO" " API Key: ${ANTHROPIC_FOUNDRY_API_KEY:0:10}..." |
There was a problem hiding this comment.
Logging partial API key is a security risk.
Line 1209 logs the first 10 characters of ANTHROPIC_FOUNDRY_API_KEY. Even a partial key in log files (which persist on disk at $LOG_DIR/ralph.log) increases exposure surface. Consider masking to just a few characters or logging only presence/absence.
Proposed fix
if [[ -n "${ANTHROPIC_FOUNDRY_API_KEY:-}" ]]; then
export ANTHROPIC_FOUNDRY_API_KEY
- log_status "INFO" " API Key: ${ANTHROPIC_FOUNDRY_API_KEY:0:10}..."
+ log_status "INFO" " API Key: ****${ANTHROPIC_FOUNDRY_API_KEY: -4}..."
fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if [[ -n "${ANTHROPIC_FOUNDRY_API_KEY:-}" ]]; then | |
| export ANTHROPIC_FOUNDRY_API_KEY | |
| log_status "INFO" " API Key: ${ANTHROPIC_FOUNDRY_API_KEY:0:10}..." | |
| if [[ -n "${ANTHROPIC_FOUNDRY_API_KEY:-}" ]]; then | |
| export ANTHROPIC_FOUNDRY_API_KEY | |
| log_status "INFO" " API Key: ****${ANTHROPIC_FOUNDRY_API_KEY: -4}..." | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ralph_loop.sh` around lines 1207 - 1209, The current code logs the first 10
characters of ANTHROPIC_FOUNDRY_API_KEY via the log_status call, exposing
sensitive secret material; change the behavior in the block that checks
ANTHROPIC_FOUNDRY_API_KEY so it does not emit partial keys to $LOG_DIR/ralph.log
— instead log only presence/absence (e.g., "ANTHROPIC_FOUNDRY_API_KEY: set" or
"not set") or, if you must show a hint, mask to a minimal safe slice such as a
single character or last 4 chars; update the log_status invocation that
references ANTHROPIC_FOUNDRY_API_KEY accordingly to avoid writing key material
to logs.
| # | ||
| # Run in a subshell so cleanup() can kill the entire pipeline | ||
| # (timeout + stdbuf + claude + tee + jq) with a single process group | ||
| # kill on Ctrl+C. The subshell also avoids the errexit issue (Issue #175) | ||
| # where timeout's exit code 124 would silently kill the script under set -e. | ||
| ( | ||
| set -o pipefail | ||
| portable_timeout ${timeout_seconds}s stdbuf -oL "${LIVE_CMD_ARGS[@]}" \ | ||
| < /dev/null 2>&1 | stdbuf -oL tee "$output_file" | stdbuf -oL jq --unbuffered -j "$jq_filter" 2>/dev/null | tee "$LIVE_LOG_FILE" | ||
| ) & | ||
| CLAUDE_CHILD_PID=$! | ||
| wait $CLAUDE_CHILD_PID | ||
| exit_code=$? | ||
| CLAUDE_CHILD_PID="" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Test 1: Verify bash set -e behavior with wait returning non-zero
echo "=== Test 1: set -e + wait returning 124 ==="
bash -c '
set -e
( exit 124 ) &
pid=$!
echo "STARTING"
wait $pid
echo "REACHED exit_code=$?"
' 2>&1 || echo "(script exited with code $?)"
echo ""
echo "=== Test 2: set -e + wait || exit_code=$? (proposed fix) ==="
bash -c '
set -e
exit_code=0
( exit 124 ) &
pid=$!
echo "STARTING"
wait $pid || exit_code=$?
echo "REACHED exit_code=$exit_code"
' 2>&1
echo ""
echo "=== Test 3: Testing inside a function with set -e ==="
bash -c '
set -e
test_func() {
( exit 124 ) &
pid=$!
echo "IN FUNC STARTING"
wait $pid
echo "IN FUNC REACHED"
return 0
}
test_func
echo "CALLER REACHED"
' 2>&1 || echo "(function/script exited with code $?)"Repository: frankbria/ralph-claude-code
Length of output: 356
🏁 Script executed:
# Now read the actual code context
head -c 50000 ralph_loop.sh | tail -c 5000 | cat -nRepository: frankbria/ralph-claude-code
Length of output: 5859
🏁 Script executed:
# Get broader context around lines 1319-1332
sed -n '1300,1360p' ralph_loop.sh | cat -nRepository: frankbria/ralph-claude-code
Length of output: 3826
🏁 Script executed:
# Check how execute_claude_code is called and what the function returns
sed -n '1770,1800p' ralph_loop.sh | cat -nRepository: frankbria/ralph-claude-code
Length of output: 1834
🏁 Script executed:
# Check the start of execute_claude_code to see set -e status
sed -n '1200,1250p' ralph_loop.sh | cat -nRepository: frankbria/ralph-claude-code
Length of output: 2713
🏁 Script executed:
# Check line 6 to confirm set -e is enabled
sed -n '1,20p' ralph_loop.sh | cat -nRepository: frankbria/ralph-claude-code
Length of output: 984
wait $CLAUDE_CHILD_PID is subject to set -e — timeout (exit 124) terminates the script before timeout handling.
Under set -e (line 6), when wait $CLAUDE_CHILD_PID returns non-zero (timeout exit code 124), the script exits immediately at line 1331, never reaching exit_code=$? on line 1331. This bypasses all downstream timeout handling (line 1336), session extraction (line 1337+), and circuit breaker logic, defeating the fix for Issue #175.
Proposed fix — capture non-zero wait status without triggering errexit
) &
CLAUDE_CHILD_PID=$!
- wait $CLAUDE_CHILD_PID
- exit_code=$?
+ wait $CLAUDE_CHILD_PID || exit_code=$?
CLAUDE_CHILD_PID=""With || exit_code=$?, the || operator ensures the compound command succeeds (preventing set -e exit), while capturing the actual exit code for proper handling downstream.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ralph_loop.sh` around lines 1319 - 1332, The wait on CLAUDE_CHILD_PID can
trigger set -e when the child exits non‑zero (e.g., timeout 124), so change the
wait invocation to capture the child's exit status without causing an immediate
script exit: run wait $CLAUDE_CHILD_PID with a fallthrough that assigns
exit_code (e.g., use "wait $CLAUDE_CHILD_PID || exit_code=$?") so the script
records the actual exit code for downstream timeout/session/circuit-breaker
handling (references: CLAUDE_CHILD_PID, wait, exit_code, portable_timeout).
3a20578 to
c60b5a9
Compare
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (2)
ralph_loop.sh (1)
200-203:forward_tmux_envis missing a bash function documentation block.Per coding guidelines, all
*.shfunctions must have a comment block documenting purpose, parameters, and return values. As per coding guidelines, "Use bash function comments documenting purpose, parameters, and return values for all functions".📄 Suggested doc comment
+# forward_tmux_env - Forward configured environment variables into a tmux session. +# +# Reads RALPH_TMUX_ENV for a space-separated list of env var names or prefix +# patterns (trailing '*'). Forwards matching variables via `tmux set-environment`. +# When Foundry is disabled, actively unsets Foundry vars to prevent leakage +# across concurrent sessions. +# +# Parameters: +# $1 - session_name: tmux session target name +# +# Returns: 0 always (individual tmux failures are suppressed with || true) forward_tmux_env() {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ralph_loop.sh` around lines 200 - 203, Add a bash function documentation block above the forward_tmux_env function that states the function's purpose (forward specified environment variables into a tmux session), parameters (reads RALPH_TMUX_ENV containing exact names and prefix patterns like "ANTHROPIC_*"), and return behavior/exit codes (e.g., returns 0 on success, non‑zero on error). Ensure the comment follows the project's bash doc style (purpose, parameters, returns) and briefly notes behavior for exact names vs. wildcard prefixes and any side effects (modifies tmux-environment using tmux set-environment).tests/unit/test_cli_modern.bats (1)
838-869: Missing unit tests forforward_tmux_env,wait_for_circuit_recovery, andCLAUDE_MODELpassthrough.This PR introduces two new public functions (
forward_tmux_env,wait_for_circuit_recovery) and adds--modelinjection tobuild_claude_command. None of these have corresponding unit tests. Additionally, the inlinebuild_claude_commandcopy in thesetup()block (lines ~451–499) omits the new--modelbranch, so the existingbuild_claude_commandtests do not exercise the model-override path at all.As per coding guidelines, "Add unit tests for all new bash script functions using bats test framework."
For
forward_tmux_env, tmux can be mocked; forwait_for_circuit_recovery,sleepandget_circuit_statecan be stubbed. At minimum, add a test such as:`@test` "build_claude_command includes --model when CLAUDE_MODEL is set" { export CLAUDE_CODE_CMD="claude" export CLAUDE_OUTPUT_FORMAT="json" export CLAUDE_ALLOWED_TOOLS="" export CLAUDE_USE_CONTINUE="false" export CLAUDE_MODEL="claude-opus-4-6" echo "Test prompt" > "$PROMPT_FILE" # Call the real implementation (not the stale inline copy) source "${BATS_TEST_DIRNAME}/../../ralph_loop.sh" --help 2>/dev/null || true build_claude_command "$PROMPT_FILE" "" "" local cmd_string="${CLAUDE_CMD_ARGS[*]}" [[ "$cmd_string" == *"--model"* ]] [[ "$cmd_string" == *"claude-opus-4-6"* ]] } `@test` "build_claude_command omits --model when CLAUDE_MODEL is empty" { export CLAUDE_MODEL="" echo "Test prompt" > "$PROMPT_FILE" build_claude_command "$PROMPT_FILE" "" "" local cmd_string="${CLAUDE_CMD_ARGS[*]}" [[ "$cmd_string" != *"--model"* ]] }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_cli_modern.bats` around lines 838 - 869, Add bats unit tests covering the new public functions and the CLAUDE_MODEL passthrough: write tests for forward_tmux_env (mock tmux environment variables and assert they are forwarded), for wait_for_circuit_recovery (stub sleep and get_circuit_state to simulate transient states and assert correct retry/exit behavior), and for build_claude_command to assert it includes "--model <value>" when CLAUDE_MODEL is set and omits it when empty; also update the stale inline copy of build_claude_command in the setup() block so it either sources/uses the real build_claude_command implementation (rather than the outdated copy) or add the missing --model branch so tests exercise the model-override path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ralph_loop.sh`:
- Around line 1449-1451: In the background execution path where we call wait on
the Claude PID (wait $claude_pid), prevent set -e from aborting before we record
the child's exit status: temporarily disable error-exit or absorb wait's failure
so we can assign exit_code=$? after wait (e.g., run wait with an || true or wrap
in set +e / set -e), then use that exit_code to detect portable_timeout (124)
and run the same timeout handling, circuit breaker and session extraction logic;
update the block that currently contains wait $claude_pid and
CLAUDE_CHILD_PID="" to ensure wait's exit code is captured into exit_code
regardless of set -e.
- Around line 1463-1472: The grep pattern uses GNU-only BRE alternation (\|) and
fails on BSD/macOS; update the authentication detection in the block that checks
"$output_file" by replacing the grep call in that if-statement (the line that
currently uses grep -q '"usage"\|"billing"\|"credits_remaining"') with a
portable ERE invocation (e.g. grep -E or egrep) or equivalent multiple -e
patterns, preserve the quiet flag and redirection (2>/dev/null), so the
condition correctly matches "usage", "billing" or "credits_remaining" across
platforms; keep the surrounding logic referencing output_file,
CLAUDE_CODE_USE_FOUNDRY and log_status unchanged.
- Line 1459: The echo that writes to PROGRESS_FILE uses an unquoted command
substitution which can be split on the space in the timestamp; update the write
so the timestamp substitution is quoted (e.g. use "$(date '+%Y-%m-%d
%H:%M:%S')") or switch to printf to build the JSON safely before redirecting to
"$PROGRESS_FILE" — target the echo line that writes to PROGRESS_FILE in
ralph_loop.sh and ensure the substitution is quoted to prevent word splitting.
In `@tests/unit/test_cli_modern.bats`:
- Around line 838-854: In the test function test "live mode pipeline runs in
subshell to isolate errexit (Issue `#175`)" replace GNU-only '\s' usage in the
grep that checks for an indented opening parenthesis so it becomes
POSIX-portable: update the grep against the live_block (the line echo
"$live_block" | grep -q '^\s*(') to use [[:space:]] (e.g. '^[[:space:]]*(') so
macOS/BSD grep behaves correctly; leave the other checks (') &' and
'CLAUDE_CHILD_PID=\$!') unchanged.
- Around line 856-869: The test "live mode pipeline captures exit code via wait"
currently checks for separate lines matching wait \$CLAUDE_CHILD_PID and
exit_code=\$? which allows the buggy two-line form; change the assertions to
require the compound form using the OR operator so the wait and exit capture
appear together (e.g. assert that the live_block contains a pattern where wait
\$CLAUDE_CHILD_PID and \|\| and exit_code=\$? occur in the same command); keep
the existing check that CLAUDE_CHILD_PID is cleared (CLAUDE_CHILD_PID="") and
update the grep/regex used to match the live_block accordingly so only the safe
"wait ... || exit_code=$?" form passes.
---
Duplicate comments:
In `@ralph_loop.sh`:
- Around line 209-231: The loop that sets tmux environment vars disables
globbing with set -f but can exit early if tmux set-environment fails (under set
-e), leaving globbing disabled; update the block that calls tmux set-environment
(both the prefix branch and the exact-match branch referencing variables
patterns and session_name and using tmux set-environment) to either append ||
true to the tmux commands or ensure set +f always runs (e.g., via a
trap/finally), so failures in tmux set-environment do not prevent restoring
globbing.
- Around line 1205-1209: The script is leaking a partial
ANTHROPIC_FOUNDRY_API_KEY value to logs via the log_status call; update the
block that checks ANTHROPIC_FOUNDRY_API_KEY so it does not print the key
contents (in ralph_loop.sh where ANTHROPIC_FOUNDRY_API_KEY and log_status are
used). Replace the current log_status that prints
${ANTHROPIC_FOUNDRY_API_KEY:0:10} with a non-secret message (e.g., "API key set"
or a fully masked representation like "*******" or only show a fixed-length
masked suffix/prefix) so no actual key material is emitted to logs.
- Around line 507-541: Add a timeout guard and docstring to
wait_for_circuit_recovery: document the function and its parameters, then add an
optional max_wait_seconds parameter (or default like CB_COOLDOWN_MINUTES*60*2)
and track elapsed time inside the while loop; after each sleep/update/re-init
(calls to init_circuit_breaker and get_circuit_state) check elapsed against
max_wait_seconds and if exceeded log_status an ERROR, update_status to a timeout
state, and return non-zero; keep existing cooldown logging (opened_at parsing
via parse_iso_to_epoch and CB_STATE_FILE) but ensure the loop breaks on timeout
to avoid infinite hang.
- Around line 50-51: The comment and runtime check for the environment flag
CLAUDE_CODE_USE_FOUNDRY are inconsistent (comment says set to "true" but the
script checks for "1"); make them consistent by either updating the script's
runtime conditional to accept "true" (e.g., case-insensitive check for
"true"/"1"/"yes") or changing the comment to instruct setting
CLAUDE_CODE_USE_FOUNDRY="1"; update the CLAUDE_CODE_USE_FOUNDRY check and the
comment in ralph_loop.sh (and any related README/.ralphrc note) so both use the
same canonical value and document accepted truthy forms.
- Around line 1322-1330: The wait invocation is still vulnerable to set -e:
change the line using wait on CLAUDE_CHILD_PID so the script captures the
child's exit status even if wait returns non-zero (e.g., timeout). Replace the
bare "wait $CLAUDE_CHILD_PID" with a guarded form such as "wait
$CLAUDE_CHILD_PID || exit_code=$?" (or an equivalent if-block) to ensure
exit_code is assigned when wait fails, keeping the existing CLAUDE_CHILD_PID and
exit_code variables intact.
---
Nitpick comments:
In `@ralph_loop.sh`:
- Around line 200-203: Add a bash function documentation block above the
forward_tmux_env function that states the function's purpose (forward specified
environment variables into a tmux session), parameters (reads RALPH_TMUX_ENV
containing exact names and prefix patterns like "ANTHROPIC_*"), and return
behavior/exit codes (e.g., returns 0 on success, non‑zero on error). Ensure the
comment follows the project's bash doc style (purpose, parameters, returns) and
briefly notes behavior for exact names vs. wildcard prefixes and any side
effects (modifies tmux-environment using tmux set-environment).
In `@tests/unit/test_cli_modern.bats`:
- Around line 838-869: Add bats unit tests covering the new public functions and
the CLAUDE_MODEL passthrough: write tests for forward_tmux_env (mock tmux
environment variables and assert they are forwarded), for
wait_for_circuit_recovery (stub sleep and get_circuit_state to simulate
transient states and assert correct retry/exit behavior), and for
build_claude_command to assert it includes "--model <value>" when CLAUDE_MODEL
is set and omits it when empty; also update the stale inline copy of
build_claude_command in the setup() block so it either sources/uses the real
build_claude_command implementation (rather than the outdated copy) or add the
missing --model branch so tests exercise the model-override path.
| echo "$calls_made" > "$CALL_COUNT_FILE" | ||
|
|
||
| # Clear progress file | ||
| echo '{"status": "completed", "timestamp": "'$(date '+%Y-%m-%d %H:%M:%S')'"}' > "$PROGRESS_FILE" |
There was a problem hiding this comment.
SC2046: unquoted $(date) in echo — progress file write.
Shellcheck warns that the command substitution $(date '+%Y-%m-%d %H:%M:%S') is unquoted; word splitting on the space in the timestamp can produce two echo arguments. While echo concatenates them with a space and the JSON output is still valid today, it is brittle and flagged by the static analysis tool.
🔧 Proposed fix
- echo '{"status": "completed", "timestamp": "'$(date '+%Y-%m-%d %H:%M:%S')'"}' > "$PROGRESS_FILE"
+ echo '{"status": "completed", "timestamp": "'"$(date '+%Y-%m-%d %H:%M:%S')"'"}' > "$PROGRESS_FILE"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| echo '{"status": "completed", "timestamp": "'$(date '+%Y-%m-%d %H:%M:%S')'"}' > "$PROGRESS_FILE" | |
| echo '{"status": "completed", "timestamp": "'"$(date '+%Y-%m-%d %H:%M:%S')"'"}' > "$PROGRESS_FILE" |
🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 1459-1459: Quote this to prevent word splitting.
(SC2046)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ralph_loop.sh` at line 1459, The echo that writes to PROGRESS_FILE uses an
unquoted command substitution which can be split on the space in the timestamp;
update the write so the timestamp substitution is quoted (e.g. use "$(date
'+%Y-%m-%d %H:%M:%S')") or switch to printf to build the JSON safely before
redirecting to "$PROGRESS_FILE" — target the echo line that writes to
PROGRESS_FILE in ralph_loop.sh and ensure the substitution is quoted to prevent
word splitting.
| # Verify actual authentication method used by checking response metadata | ||
| if [[ -f "$output_file" ]]; then | ||
| # Check for usage/billing fields (indicates personal account) | ||
| if grep -q '"usage"\|"billing"\|"credits_remaining"' "$output_file" 2>/dev/null; then | ||
| log_status "INFO" "🔍 Detected: Personal Account (usage/billing info in response)" | ||
| elif [[ "${CLAUDE_CODE_USE_FOUNDRY:-0}" == "1" ]]; then | ||
| # Foundry responses typically don't have usage fields | ||
| log_status "INFO" "🔍 Detected: Azure Foundry (no usage info in response)" | ||
| fi | ||
| fi |
There was a problem hiding this comment.
grep -q '"usage"\|...' uses GNU BRE alternation — not portable to macOS/BSD grep.
\| as alternation is a GNU grep extension to BRE. BSD grep (macOS default) treats \| as a literal two-character sequence, so the pattern never matches on macOS. The authentication-detection logging would silently be skipped on non-Linux hosts.
🔧 Portable fix using ERE
- if grep -q '"usage"\|"billing"\|"credits_remaining"' "$output_file" 2>/dev/null; then
+ if grep -qE '"usage"|"billing"|"credits_remaining"' "$output_file" 2>/dev/null; then📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Verify actual authentication method used by checking response metadata | |
| if [[ -f "$output_file" ]]; then | |
| # Check for usage/billing fields (indicates personal account) | |
| if grep -q '"usage"\|"billing"\|"credits_remaining"' "$output_file" 2>/dev/null; then | |
| log_status "INFO" "🔍 Detected: Personal Account (usage/billing info in response)" | |
| elif [[ "${CLAUDE_CODE_USE_FOUNDRY:-0}" == "1" ]]; then | |
| # Foundry responses typically don't have usage fields | |
| log_status "INFO" "🔍 Detected: Azure Foundry (no usage info in response)" | |
| fi | |
| fi | |
| # Verify actual authentication method used by checking response metadata | |
| if [[ -f "$output_file" ]]; then | |
| # Check for usage/billing fields (indicates personal account) | |
| if grep -qE '"usage"|"billing"|"credits_remaining"' "$output_file" 2>/dev/null; then | |
| log_status "INFO" "🔍 Detected: Personal Account (usage/billing info in response)" | |
| elif [[ "${CLAUDE_CODE_USE_FOUNDRY:-0}" == "1" ]]; then | |
| # Foundry responses typically don't have usage fields | |
| log_status "INFO" "🔍 Detected: Azure Foundry (no usage info in response)" | |
| fi | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ralph_loop.sh` around lines 1463 - 1472, The grep pattern uses GNU-only BRE
alternation (\|) and fails on BSD/macOS; update the authentication detection in
the block that checks "$output_file" by replacing the grep call in that
if-statement (the line that currently uses grep -q
'"usage"\|"billing"\|"credits_remaining"') with a portable ERE invocation (e.g.
grep -E or egrep) or equivalent multiple -e patterns, preserve the quiet flag
and redirection (2>/dev/null), so the condition correctly matches "usage",
"billing" or "credits_remaining" across platforms; keep the surrounding logic
referencing output_file, CLAUDE_CODE_USE_FOUNDRY and log_status unchanged.
| @test "live mode pipeline runs in subshell to isolate errexit (Issue #175)" { | ||
| # The live-mode pipeline runs in a backgrounded subshell so that | ||
| # timeout's exit code 124 never triggers set -e in the parent. | ||
| # This also enables Ctrl+C cleanup via CLAUDE_CHILD_PID. | ||
| local script="${BATS_TEST_DIRNAME}/../../ralph_loop.sh" | ||
|
|
||
| # Extract only the live-mode section (from "Live output mode enabled" to "End of Output") | ||
| local live_block | ||
| live_block=$(sed -n '/Live output mode enabled/,/End of Output/p' "$script") | ||
|
|
||
| # set +e must appear before set -o pipefail within the live-mode block | ||
| echo "$live_block" | grep -q 'set +e' | ||
| echo "$live_block" | grep -q 'set -o pipefail' | ||
| # Pipeline must be inside a subshell (parentheses) that is backgrounded | ||
| echo "$live_block" | grep -q '^\s*(' | ||
| echo "$live_block" | grep -q ') &' | ||
|
|
||
| # Verify ordering: set +e comes first | ||
| local plus_e_line=$(echo "$live_block" | grep -n 'set +e' | head -1 | cut -d: -f1) | ||
| local pipefail_line=$(echo "$live_block" | grep -n 'set -o pipefail' | head -1 | cut -d: -f1) | ||
|
|
||
| [[ -n "$plus_e_line" ]] | ||
| [[ -n "$pipefail_line" ]] | ||
| [[ $plus_e_line -lt $pipefail_line ]] | ||
| # CLAUDE_CHILD_PID must be captured for cleanup | ||
| echo "$live_block" | grep -q 'CLAUDE_CHILD_PID=\$!' | ||
| } |
There was a problem hiding this comment.
grep -q '^\s*(' uses \s — not portable to BSD grep (macOS).
\s in BRE is a GNU grep extension; BSD grep treats it as a literal two-character sequence. The test could silently pass for the wrong reason on macOS CI runners. Use POSIX [[:space:]] instead.
🔧 Portable fix
- echo "$live_block" | grep -q '^\s*('
+ echo "$live_block" | grep -q '^[[:space:]]*('📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @test "live mode pipeline runs in subshell to isolate errexit (Issue #175)" { | |
| # The live-mode pipeline runs in a backgrounded subshell so that | |
| # timeout's exit code 124 never triggers set -e in the parent. | |
| # This also enables Ctrl+C cleanup via CLAUDE_CHILD_PID. | |
| local script="${BATS_TEST_DIRNAME}/../../ralph_loop.sh" | |
| # Extract only the live-mode section (from "Live output mode enabled" to "End of Output") | |
| local live_block | |
| live_block=$(sed -n '/Live output mode enabled/,/End of Output/p' "$script") | |
| # set +e must appear before set -o pipefail within the live-mode block | |
| echo "$live_block" | grep -q 'set +e' | |
| echo "$live_block" | grep -q 'set -o pipefail' | |
| # Pipeline must be inside a subshell (parentheses) that is backgrounded | |
| echo "$live_block" | grep -q '^\s*(' | |
| echo "$live_block" | grep -q ') &' | |
| # Verify ordering: set +e comes first | |
| local plus_e_line=$(echo "$live_block" | grep -n 'set +e' | head -1 | cut -d: -f1) | |
| local pipefail_line=$(echo "$live_block" | grep -n 'set -o pipefail' | head -1 | cut -d: -f1) | |
| [[ -n "$plus_e_line" ]] | |
| [[ -n "$pipefail_line" ]] | |
| [[ $plus_e_line -lt $pipefail_line ]] | |
| # CLAUDE_CHILD_PID must be captured for cleanup | |
| echo "$live_block" | grep -q 'CLAUDE_CHILD_PID=\$!' | |
| } | |
| `@test` "live mode pipeline runs in subshell to isolate errexit (Issue `#175`)" { | |
| # The live-mode pipeline runs in a backgrounded subshell so that | |
| # timeout's exit code 124 never triggers set -e in the parent. | |
| # This also enables Ctrl+C cleanup via CLAUDE_CHILD_PID. | |
| local script="${BATS_TEST_DIRNAME}/../../ralph_loop.sh" | |
| # Extract only the live-mode section (from "Live output mode enabled" to "End of Output") | |
| local live_block | |
| live_block=$(sed -n '/Live output mode enabled/,/End of Output/p' "$script") | |
| # Pipeline must be inside a subshell (parentheses) that is backgrounded | |
| echo "$live_block" | grep -q '^[[:space:]]*(' | |
| echo "$live_block" | grep -q ') &' | |
| # CLAUDE_CHILD_PID must be captured for cleanup | |
| echo "$live_block" | grep -q 'CLAUDE_CHILD_PID=\$!' | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/test_cli_modern.bats` around lines 838 - 854, In the test function
test "live mode pipeline runs in subshell to isolate errexit (Issue `#175`)"
replace GNU-only '\s' usage in the grep that checks for an indented opening
parenthesis so it becomes POSIX-portable: update the grep against the live_block
(the line echo "$live_block" | grep -q '^\s*(') to use [[:space:]] (e.g.
'^[[:space:]]*(') so macOS/BSD grep behaves correctly; leave the other checks
(') &' and 'CLAUDE_CHILD_PID=\$!') unchanged.
| @test "live mode pipeline captures exit code via wait" { | ||
| # After the backgrounded subshell, exit code is captured via wait | ||
| local script="${BATS_TEST_DIRNAME}/../../ralph_loop.sh" | ||
|
|
||
| # Extract only the live-mode section | ||
| local live_block | ||
| live_block=$(sed -n '/Live output mode enabled/,/End of Output/p' "$script") | ||
|
|
||
| # set +o pipefail and set -e must both exist in the live block | ||
| echo "$live_block" | grep -q 'set +o pipefail' | ||
| echo "$live_block" | grep -q 'set -e' | ||
|
|
||
| # Verify ordering: set -e comes after set +o pipefail | ||
| local pipefail_off_line=$(echo "$live_block" | grep -n 'set +o pipefail' | head -1 | cut -d: -f1) | ||
| local re_enable_line=$(echo "$live_block" | grep -n '^\s*set -e' | awk -F: -v threshold="$pipefail_off_line" '$1 > threshold {print $1; exit}') | ||
| # wait and exit_code capture must exist | ||
| echo "$live_block" | grep -q 'wait \$CLAUDE_CHILD_PID' | ||
| echo "$live_block" | grep -q 'exit_code=\$?' | ||
|
|
||
| [[ -n "$pipefail_off_line" ]] | ||
| [[ -n "$re_enable_line" ]] | ||
| [[ $re_enable_line -gt $pipefail_off_line ]] | ||
| # CLAUDE_CHILD_PID must be cleared after wait | ||
| echo "$live_block" | grep -q 'CLAUDE_CHILD_PID=""' | ||
| } |
There was a problem hiding this comment.
Test validates text presence but not the || operator — would pass with the unfixed, buggy form.
The test greps for wait \$CLAUDE_CHILD_PID and exit_code=\$? as two independent patterns. Both patterns match the current (unfixed) code that has:
wait $CLAUDE_CHILD_PID
exit_code=$?
on separate lines — the very form that triggers set -e on timeout (exit 124). The test should instead assert that wait and the exit-code capture appear on the same compound command using ||, which is the only form that is immune to set -e. As written, the test provides a false sense of safety and won't catch a regression that reintroduces the bug.
🔧 Tighten the assertion to require the `||` form
- # wait and exit_code capture must exist
- echo "$live_block" | grep -q 'wait \$CLAUDE_CHILD_PID'
- echo "$live_block" | grep -q 'exit_code=\$?'
+ # wait and exit_code capture must be a single compound command (|| prevents set -e
+ # from killing the script when Claude times out with exit code 124 — Issue `#175`)
+ echo "$live_block" | grep -q 'wait \$CLAUDE_CHILD_PID || exit_code=\$?'📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| @test "live mode pipeline captures exit code via wait" { | |
| # After the backgrounded subshell, exit code is captured via wait | |
| local script="${BATS_TEST_DIRNAME}/../../ralph_loop.sh" | |
| # Extract only the live-mode section | |
| local live_block | |
| live_block=$(sed -n '/Live output mode enabled/,/End of Output/p' "$script") | |
| # set +o pipefail and set -e must both exist in the live block | |
| echo "$live_block" | grep -q 'set +o pipefail' | |
| echo "$live_block" | grep -q 'set -e' | |
| # Verify ordering: set -e comes after set +o pipefail | |
| local pipefail_off_line=$(echo "$live_block" | grep -n 'set +o pipefail' | head -1 | cut -d: -f1) | |
| local re_enable_line=$(echo "$live_block" | grep -n '^\s*set -e' | awk -F: -v threshold="$pipefail_off_line" '$1 > threshold {print $1; exit}') | |
| # wait and exit_code capture must exist | |
| echo "$live_block" | grep -q 'wait \$CLAUDE_CHILD_PID' | |
| echo "$live_block" | grep -q 'exit_code=\$?' | |
| [[ -n "$pipefail_off_line" ]] | |
| [[ -n "$re_enable_line" ]] | |
| [[ $re_enable_line -gt $pipefail_off_line ]] | |
| # CLAUDE_CHILD_PID must be cleared after wait | |
| echo "$live_block" | grep -q 'CLAUDE_CHILD_PID=""' | |
| } | |
| `@test` "live mode pipeline captures exit code via wait" { | |
| # After the backgrounded subshell, exit code is captured via wait | |
| local script="${BATS_TEST_DIRNAME}/../../ralph_loop.sh" | |
| local live_block | |
| live_block=$(sed -n '/Live output mode enabled/,/End of Output/p' "$script") | |
| # wait and exit_code capture must be a single compound command (|| prevents set -e | |
| # from killing the script when Claude times out with exit code 124 — Issue `#175`) | |
| echo "$live_block" | grep -q 'wait \$CLAUDE_CHILD_PID || exit_code=\$?' | |
| # CLAUDE_CHILD_PID must be cleared after wait | |
| echo "$live_block" | grep -q 'CLAUDE_CHILD_PID=""' | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/test_cli_modern.bats` around lines 856 - 869, The test "live mode
pipeline captures exit code via wait" currently checks for separate lines
matching wait \$CLAUDE_CHILD_PID and exit_code=\$? which allows the buggy
two-line form; change the assertions to require the compound form using the OR
operator so the wait and exit capture appear together (e.g. assert that the
live_block contains a pattern where wait \$CLAUDE_CHILD_PID and \|\| and
exit_code=\$? occur in the same command); keep the existing check that
CLAUDE_CHILD_PID is cleared (CLAUDE_CHILD_PID="") and update the grep/regex used
to match the live_block accordingly so only the safe "wait ... || exit_code=$?"
form passes.
…rl+C shutdown - Prevent Azure Foundry credentials from leaking across concurrent tmux sessions by unsetting ANTHROPIC_FOUNDRY_* vars at session level when CLAUDE_CODE_USE_FOUNDRY != 1, and unsetting them from process env in the personal account code path - Add CLAUDE_MODEL support in .ralphrc, passed as --model to Claude CLI - Fix Ctrl+C in --live mode: track child PIDs and kill the entire process tree in cleanup() to prevent orphaned Claude processes - Add forward_tmux_env() for configurable env var forwarding into tmux - Add wait_for_circuit_recovery() so OPEN circuit waits for cooldown instead of halting the loop permanently - Write call count at execution start for accurate monitor display Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
c60b5a9 to
a0cb953
Compare
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/unit/test_cli_modern.bats (1)
893-902:⚠️ Potential issue | 🟠 MajorAdd unit tests for new functions
forward_tmux_envandwait_for_circuit_recovery.These functions introduced in
ralph_loop.shlack unit tests across the entire test suite.forward_tmux_env(line 206) handles prefix-pattern matching, exact environment variable matching, and conditional Foundry variable unsetting.wait_for_circuit_recovery(line 515) implements cooldown monitoring with state-transition logic. Both are testable with bats. Per coding guidelines: "Add unit tests for all new bash script functions using bats test framework."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/test_cli_modern.bats` around lines 893 - 902, Add bats unit tests for the two new bash functions: for forward_tmux_env, write tests that cover prefix-pattern matching (variables like FOO_BAR matching FOO_), exact name matching, and the conditional unsetting behavior for Foundry-related vars (ensure variables are forwarded or removed as per the function logic in forward_tmux_env); for wait_for_circuit_recovery, add tests that simulate and assert cooldown monitoring and state-transition logic (e.g., transitions from open to closed with cooldowns, proper sleep/backoff behavior, and timeout handling) ensuring both normal and edge cases are covered; implement these as bats test cases in the existing test suite following its conventions so they run with the test runner.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@tests/unit/test_cli_modern.bats`:
- Around line 893-902: Add bats unit tests for the two new bash functions: for
forward_tmux_env, write tests that cover prefix-pattern matching (variables like
FOO_BAR matching FOO_), exact name matching, and the conditional unsetting
behavior for Foundry-related vars (ensure variables are forwarded or removed as
per the function logic in forward_tmux_env); for wait_for_circuit_recovery, add
tests that simulate and assert cooldown monitoring and state-transition logic
(e.g., transitions from open to closed with cooldowns, proper sleep/backoff
behavior, and timeout handling) ensuring both normal and edge cases are covered;
implement these as bats test cases in the existing test suite following its
conventions so they run with the test runner.
---
Duplicate comments:
In `@ralph_loop.sh`:
- Around line 514-548: The wait_for_circuit_recovery function can loop forever
if get_circuit_state keeps returning OPEN; add a safety upper bound by computing
a max_wait_seconds (e.g., (CB_COOLDOWN_MINUTES + 10) * 60) and track
total_sleeped or start_time to break the loop and return non-zero after
exceeding the max; keep calling init_circuit_breaker and get_circuit_state as
now, but when timing out log an ERROR and update_status accordingly before
returning 1; also add a full shell-style comment block above
wait_for_circuit_recovery documenting purpose, parameters (loop_count), and
return values (0 on recovered, non-zero on timeout/error); use existing symbols
like get_circuit_state, init_circuit_breaker, CB_COOLDOWN_MINUTES,
CB_STATE_FILE, parse_iso_to_epoch to locate and implement the changes.
- Around line 203-245: In forward_tmux_env, ensure shell globbing (set -f) is
always restored even if a command exits early: either wrap the for/while loop
that runs under set -f in a subshell (so set -f is local to that subshell) or
make the tmux set-environment invocations resilient by appending || true to each
tmux set-environment call (both in the prefix loop and the exact-match branch)
so an error won’t cause premature exit before the set +f; reference set -f, set
+f, tmux set-environment, and the forward_tmux_env function when making the
change.
- Around line 1464-1466: When waiting for the background Claude process, avoid
allowing set -e to abort the script on nonzero wait status: replace the
unguarded "wait $claude_pid" with a guarded form like "wait $claude_pid || true"
so the script can then reliably capture the exit code into "exit_code=$?" and
still clear "CLAUDE_CHILD_PID=\"\""; update the block that references claude_pid
and CLAUDE_CHILD_PID accordingly.
- Around line 1212-1249: The log currently prints the first 10 characters of
ANTHROPIC_FOUNDRY_API_KEY (via the log_status call), which risks leaking
secrets; change the log in the Foundry branch to avoid revealing key material —
instead log presence only (e.g., "API Key: set") or a masked form showing only
the last 4 characters (e.g., "API Key: ****1234"); update the log_status
invocation that references ANTHROPIC_FOUNDRY_API_KEY so it does not include the
first-10-chars expansion and ensure any masking logic uses only read-only
reporting (no key reconstruction) and is applied wherever
ANTHROPIC_FOUNDRY_API_KEY is logged.
- Around line 1337-1345: The wait on CLAUDE_CHILD_PID must be protected from set
-e so a non‑zero child exit (e.g. timeout 124) doesn't abort before capturing
exit_code; modify the block around wait/exit_code to temporarily disable errexit
(e.g. set +e; wait "$CLAUDE_CHILD_PID"; exit_code=$?; set -e) or use wait
"$CLAUDE_CHILD_PID" || true then capture exit_code=$? immediately after, and
ensure CLAUDE_CHILD_PID is referenced/quoted consistently; keep the subshell
pipeline unchanged so PIPESTATUS inside it still reports per-command failures.
In `@tests/unit/test_cli_modern.bats`:
- Around line 838-854: The test uses '\s' in the grep regex which is
non‑portable; update the grep that checks for the opening subshell to use a
POSIX portable character class, e.g. change the pattern "^\s*(" to
"^[[:space:]]*(" (keep the other greps unchanged), referencing the live_block
variable and the grep calls that currently assert the subshell start and
background pattern.
- Around line 856-869: The test currently checks for 'wait $CLAUDE_CHILD_PID'
and 'exit_code=$?' as separate patterns which lets the unsafe two-line form
pass; update the assertions to require the safe combined form by matching the
literal sequence "wait $CLAUDE_CHILD_PID || exit_code=$?" (or otherwise verify
those tokens appear on the same line) and keep the existing check for clearing
CLAUDE_CHILD_PID (CLAUDE_CHILD_PID="") so the test ensures the wait is safe
under set -e and that the pid is cleared afterward.
Summary
ANTHROPIC_FOUNDRY_*) from leaking across concurrent Ralph tmux sessions. WhenCLAUDE_CODE_USE_FOUNDRY != 1, actively unset Foundry vars at both the tmux session level(
forward_tmux_env) and the process environment level (personal account code path).CLAUDE_MODELsupport: AddCLAUDE_MODELconfig in.ralphrcthat gets passed as--modelto the Claude Code CLI, allowing users to override the default model (e.g.,CLAUDE_MODEL=opus).--livemode. Track child PIDs viaCLAUDE_CHILD_PIDand kill the entire process tree incleanup().( ... ) &+wait). This naturally isolates the pipeline's exit code fromset -e— a cleaner approach than theset +e/set -etoggle,which is brittle and the root cause of bugs like bug: set -e causes script termination on API limit prompt timeout (read -t 30) #185.
Context
When running two Ralph projects simultaneously in tmux (e.g., one company project with Azure Foundry, one personal with Claude Max), the Foundry credentials leak from the first session into the second because
tmuxcaptures the parent shell's fullenvironment as the global tmux environment.
The
set -eapproach for handling pipeline exit codes (Issue #175) is fragile — any new code path that returns non-zero (likeread -t 30in #185) silently kills the script. The subshell approach solves both the errexit problem and the Ctrl+C orphanproblem in one pattern.
Changes
ralph_loop.sh: Foundry env isolation,CLAUDE_MODELpassthrough,CLAUDE_CHILD_PIDcleanup, subshell pipelinetests/unit/test_cli_modern.bats: Updated 2 tests to match subshell approach (replacesset +e/set -eassertions)Test plan
test_integrity_check.batsunrelated to this PR —wait_for_reset'swhile trueappears before the integrity check)CLAUDE_MODEL=opusin.ralphrc— verify--model opusappears in CLI args--livemode — verify no orphaned Claude processesCloses #185 (related — subshell approach eliminates the class of
set -ebugs)Summary by CodeRabbit
New Features
Bug Fixes
Tests