Skip to content

fix(loop): Foundry credential isolation, model override, and clean Ctrl+C shutdown#186

Closed
tudoanh wants to merge 1 commit intofrankbria:mainfrom
tudoanh:fix/foundry-isolation-model-override-ctrlc
Closed

fix(loop): Foundry credential isolation, model override, and clean Ctrl+C shutdown#186
tudoanh wants to merge 1 commit intofrankbria:mainfrom
tudoanh:fix/foundry-isolation-model-override-ctrlc

Conversation

@tudoanh
Copy link

@tudoanh tudoanh commented Feb 18, 2026

Summary

  • Foundry credential isolation: Prevent Azure Foundry credentials (ANTHROPIC_FOUNDRY_*) from leaking across concurrent Ralph tmux sessions. When CLAUDE_CODE_USE_FOUNDRY != 1, actively unset Foundry vars at both the tmux session level
    (forward_tmux_env) and the process environment level (personal account code path).
  • CLAUDE_MODEL support: Add CLAUDE_MODEL config in .ralphrc that gets passed as --model to the Claude Code CLI, allowing users to override the default model (e.g., CLAUDE_MODEL=opus).
  • Clean Ctrl+C shutdown: Fix orphaned Claude processes when pressing Ctrl+C in --live mode. Track child PIDs via CLAUDE_CHILD_PID and kill the entire process tree in cleanup().
  • Subshell pipeline (improves on set -e + set -o pipefail causes silent script death on Claude timeout #175): Wrap the live-mode streaming pipeline in a backgrounded subshell (( ... ) & + wait). This naturally isolates the pipeline's exit code from set -e — a cleaner approach than the set +e/set -e toggle,
    which is brittle and the root cause of bugs like bug: set -e causes script termination on API limit prompt timeout (read -t 30) #185.

Context

When running two Ralph projects simultaneously in tmux (e.g., one company project with Azure Foundry, one personal with Claude Max), the Foundry credentials leak from the first session into the second because tmux captures the parent shell's full
environment as the global tmux environment.

The set -e approach for handling pipeline exit codes (Issue #175) is fragile — any new code path that returns non-zero (like read -t 30 in #185) silently kills the script. The subshell approach solves both the errexit problem and the Ctrl+C orphan
problem in one pattern.

Changes

  • ralph_loop.sh: Foundry env isolation, CLAUDE_MODEL passthrough, CLAUDE_CHILD_PID cleanup, subshell pipeline
  • tests/unit/test_cli_modern.bats: Updated 2 tests to match subshell approach (replaces set +e/set -e assertions)

Test plan

  • All 548 tests pass (547 pass, 1 pre-existing failure in test_integrity_check.bats unrelated to this PR — wait_for_reset's while true appears before the integrity check)
  • Start Ralph for Foundry project — verify it uses Azure Foundry credentials
  • Start Ralph for personal project concurrently — verify no Foundry credential leakage
  • Set CLAUDE_MODEL=opus in .ralphrc — verify --model opus appears in CLI args
  • Press Ctrl+C during --live mode — verify no orphaned Claude processes

Closes #185 (related — subshell approach eliminates the class of set -e bugs)

Summary by CodeRabbit

  • New Features

    • Specify Claude model per session and enable Chrome integration toggle.
    • Forward configurable tmux environment entries (including wildcards) into sessions.
    • Foundry vs personal-account credential selection with detection and export/unset behavior.
    • Session resume support and interactive handling for long (5‑hour) API limits.
  • Bug Fixes

    • Improved Claude process tracking and subtree cleanup.
    • Live-streaming runs in isolated background subshells for robust timeout and cleanup.
    • Circuit-breaker now polls for recovery instead of immediate aborts; status/log improvements.
  • Tests

    • Updated tests for subshell/background execution, wait-based exit handling, and timeout logging.

@macroscopeapp
Copy link
Contributor

macroscopeapp bot commented Feb 18, 2026

Add Foundry credential isolation, model override, Chrome flag, tmux env forwarding, circuit-breaker recovery wait, and reliable Ctrl+C shutdown to ralph_loop.sh

ralph_loop.sh adds Foundry/Azure Anthropic env handling with isolation, model and --chrome overrides in build_claude_command, tmux env forwarding via forward_tmux_env, loop auto-restart and flag propagation in setup_tmux_session, circuit-breaker recovery via wait_for_circuit_recovery, PID-tracked execution and cleanup in execute_claude_code with CLAUDE_CHILD_PID, and hard kill of child processes in cleanup. Tests update in tests/unit/test_cli_modern.bats.

📍Where to Start

Start with the CLI flow in main and execute_claude_code in ralph_loop.sh, then review forward_tmux_env, setup_tmux_session, and wait_for_circuit_recovery for control flow and env handling.


Macroscope summarized a0cb953.

ralph_loop.sh Outdated
if [[ -n "$CLAUDE_CHILD_PID" ]]; then
# Kill the entire process group rooted at CLAUDE_CHILD_PID
# This catches: timeout -> stdbuf -> claude (and pipeline peers in live mode)
kill -- -"$CLAUDE_CHILD_PID" 2>/dev/null || true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 High

ralph_loop.sh:1579 kill -- -"$CLAUDE_CHILD_PID" assumes the PID equals a PGID, but backgrounded subshells in non-interactive scripts inherit the parent's process group. This silently fails (ESRCH) or kills unrelated processes. Consider using pkill -P to kill by parent PID, or setsid when spawning to create a new session/process group.

🚀 Want me to fix this? Reply ex: "fix it for me".

🤖 Prompt for AI
In file ralph_loop.sh around line 1579:

`kill -- -"$CLAUDE_CHILD_PID"` assumes the PID equals a PGID, but backgrounded subshells in non-interactive scripts inherit the parent's process group. This silently fails (ESRCH) or kills unrelated processes. Consider using `pkill -P` to kill by parent PID, or `setsid` when spawning to create a new session/process group.


[[ -z "$patterns" ]] && return 0

for pattern in $patterns; do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium

ralph_loop.sh:203 Unquoted $patterns causes glob expansion—if files match ANTHROPIC_*, they replace the pattern. Consider wrapping the loop with set -f / set +f to disable globbing.

🚀 Want me to fix this? Reply ex: "fix it for me".

🤖 Prompt for AI
In file ralph_loop.sh around line 203:

Unquoted `$patterns` causes glob expansion—if files match `ANTHROPIC_*`, they replace the pattern. Consider wrapping the loop with `set -f` / `set +f` to disable globbing.

Comment on lines 177 to +179
[[ -n "$_env_CB_COOLDOWN_MINUTES" ]] && CB_COOLDOWN_MINUTES="$_env_CB_COOLDOWN_MINUTES"
[[ -n "$_env_CB_AUTO_RESET" ]] && CB_AUTO_RESET="$_env_CB_AUTO_RESET"
[[ -n "$_env_RALPH_TMUX_ENV" ]] && RALPH_TMUX_ENV="$_env_RALPH_TMUX_ENV"
[[ -n "$_env_CLAUDE_MODEL" ]] && CLAUDE_MODEL="$_env_CLAUDE_MODEL"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Medium

ralph_loop.sh:177 Missing restoration of _env_CLAUDE_CODE_USE_FOUNDRY and _env_ANTHROPIC_* variables. These are captured earlier but not restored here, so .ralphrc values can override user environment settings, violating the documented precedence. Consider adding restoration lines for these Foundry-related variables.

-    [[ -n "$_env_CLAUDE_MODEL" ]] && CLAUDE_MODEL="$_env_CLAUDE_MODEL"
+    [[ -n "$_env_CLAUDE_MODEL" ]] && CLAUDE_MODEL="$_env_CLAUDE_MODEL"
+    [[ -n "$_env_CLAUDE_CODE_USE_FOUNDRY" ]] && CLAUDE_CODE_USE_FOUNDRY="$_env_CLAUDE_CODE_USE_FOUNDRY"
+    [[ -n "$_env_ANTHROPIC_FOUNDRY_RESOURCE" ]] && ANTHROPIC_FOUNDRY_RESOURCE="$_env_ANTHROPIC_FOUNDRY_RESOURCE"
+    [[ -n "$_env_ANTHROPIC_FOUNDRY_API_KEY" ]] && ANTHROPIC_FOUNDRY_API_KEY="$_env_ANTHROPIC_FOUNDRY_API_KEY"

🚀 Want me to fix this? Reply ex: "fix it for me".

🤖 Prompt for AI
In file ralph_loop.sh around line 177:

Missing restoration of `_env_CLAUDE_CODE_USE_FOUNDRY` and `_env_ANTHROPIC_*` variables. These are captured earlier but not restored here, so `.ralphrc` values can override user environment settings, violating the documented precedence. Consider adding restoration lines for these Foundry-related variables.

@tudoanh tudoanh force-pushed the fix/foundry-isolation-model-override-ctrlc branch from e85290f to 3a20578 Compare February 18, 2026 11:06
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 18, 2026

Walkthrough

Adds tmux environment forwarding, a circuit-breaker cooldown polling flow, Foundry/Anthropic model/auth wiring, subshell-based live Claude execution with CLAUDE_CHILD_PID tracking and improved cleanup, plus corresponding unit-test updates to assert background/subshell and wait-based exit handling.

Changes

Cohort / File(s) Summary
Core CLI / Runtime
ralph_loop.sh
Adds forward_tmux_env() and integrates into session setup; adds wait_for_circuit_recovery() and replaces immediate OPEN aborts with polling; wires CLAUDE_MODEL, CLAUDE_CHROME, CLAUDE_CODE_USE_FOUNDRY, and ANTHROPIC_* envs; appends --model/--chrome to Claude CLI when set; implements session resume by session_id. Tracks Claude process with CLAUDE_CHILD_PID, runs live streaming inside a backgrounded subshell () &, captures exit via wait, and expands cleanup to kill Claude process subtrees and unset Foundry creds when appropriate.
Unit Tests (modern CLI)
tests/unit/test_cli_modern.bats
Updates live-mode tests to assert the live pipeline runs inside a backgrounded subshell and captures CLAUDE_CHILD_PID, verifies exit capture via wait and clearing of CLAUDE_CHILD_PID, and adds timeout/logging and stdin-redirection-related assertions replacing previous errexit/PIPESTATUS ordering checks.

Sequence Diagram(s)

sequenceDiagram
    participant Main as ralph_loop
    participant CB as CircuitBreaker
    participant Recovery as wait_for_circuit_recovery
    participant ClaudeCLI as Claude CLI

    Main->>CB: query circuit state
    CB-->>Main: OPEN
    Main->>Recovery: start polling loop
    Recovery->>CB: poll state (repeat)
    CB-->>Recovery: OPEN (repeat)
    CB-->>Recovery: HALF_OPEN/CLOSED
    Recovery-->>Main: recovered
    Main->>ClaudeCLI: resume/execute request (may include --model/--chrome)
    ClaudeCLI-->>Main: stream output / exit
Loading
sequenceDiagram
    participant Main as ralph_loop
    participant Builder as build_claude_command()
    participant Subshell as backgrounded subshell () &
    participant Claude as Claude CLI
    participant Reaper as cleanup / wait

    Main->>Builder: build command (includes CLAUDE_MODEL/CLAUDE_CHROME if set)
    Main->>Subshell: launch subshell with command
    Subshell->>Claude: exec Claude CLI (child processes)
    Subshell-->>Main: export CLAUDE_CHILD_PID (background)
    Main->>Reaper: wait $CLAUDE_CHILD_PID
    Reaper-->>Main: exit_code=$?
    Main->>Reaper: cleanup process tree using CLAUDE_CHILD_PID
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰
I hop through envs and gently bind,
I wait when circuits lose their mind,
I launch in shells that hum and sigh,
I track the pids as time goes by,
A rabbit cheers: the loop runs fine!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title clearly and concisely summarizes the three main changes: Foundry credential isolation, model override support, and clean Ctrl+C shutdown, directly reflecting the core objectives.
Linked Issues check ✅ Passed The PR fully addresses the linked issue #185 by replacing brittle set +e/set -e toggles with a subshell pipeline approach that robustly handles read timeouts, and the updated tests validate this behavioral change.
Out of Scope Changes check ✅ Passed All changes directly support the stated objectives: tmux environment forwarding isolates Foundry credentials, CLAUDE_MODEL enables model override, CLAUDE_CHILD_PID tracks processes for clean shutdown, and subshell pipeline fixes set -e fragility.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ralph_loop.sh (1)

1800-1804: ⚠️ Potential issue | 🟠 Major

Issue #185 (read -t 30 + set -e) appears unfixed here.

Line 1801 read -t 30 -n 1 user_choice returns non-zero on timeout, which under set -e exits the script — exactly the bug described in Issue #185. This path is reached when the API 5-hour limit is hit (exec_result -eq 2). While this line isn't changed in this PR, the PR references Issue #185 as the motivating problem.

Suggested fix
-            read -t 30 -n 1 user_choice
+            read -t 30 -n 1 user_choice || true
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ralph_loop.sh` around lines 1800 - 1804, The read with timeout (`read -t 30
-n 1 user_choice`) can return non-zero on timeout and cause the script to exit
under set -e; change the call so its non-zero exit doesn't abort the script (for
example, wrap it to ignore failures or capture its status), e.g. perform the
timed read in a conditional and set user_choice="" on timeout or use `|| true`
after the read so subsequent logic that checks user_choice (the existing if [[
"$user_choice" == "2" ]] || [[ -z "$user_choice" ]]) continues to run; update
the read invocation near where user_choice is used to ensure timeouts don't
trigger script termination.
🧹 Nitpick comments (4)
ralph_loop.sh (3)

1404-1406: LGTM — CLAUDE_CHILD_PID tracking in background mode enables Ctrl+C cleanup.

Note: the wait $claude_pid on line 1451 has the same set -e exposure as the live-mode path flagged above. Consider applying the same wait $claude_pid || exit_code=$? fix here as well for consistency, though this pattern may have been pre-existing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ralph_loop.sh` around lines 1404 - 1406, The background-mode path captures
the child PID into claude_pid and CLAUDE_CHILD_PID but still performs a bare
wait "wait $claude_pid" which will trigger set -e on failure; change the wait
logic used for the live-mode path to a safe form here as well (e.g., replace the
direct wait with "wait $claude_pid || exit_code=$?" style handling and
propagate/handle exit_code consistently), updating any downstream cleanup or
exit logic that inspects exit_code so background-mode exit handling mirrors
live-mode; reference variables/functions: claude_pid, CLAUDE_CHILD_PID, wait
$claude_pid, and exit_code.

1707-1709: Behavioral change: circuit breaker OPEN now blocks indefinitely instead of exiting.

Previously the loop would break on circuit breaker OPEN. Now it calls wait_for_circuit_recovery (unbounded) and continue. This is intentional per the PR but worth confirming this is the desired UX — operators may prefer a configurable max-wait or an exit after N failed recoveries.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ralph_loop.sh` around lines 1707 - 1709, The change makes the circuit-breaker
OPEN path call wait_for_circuit_recovery("$loop_count") and continue, which can
block indefinitely; change this to honor a configurable max wait or max recovery
attempts by adding a check around the log_status/wait_for_circuit_recovery block
(referencing log_status, wait_for_circuit_recovery, and loop_count) so that if a
new config variable (e.g., MAX_RECOVERY_ATTEMPTS or MAX_RECOVERY_WAIT) is
exceeded you log a final error and break/exit instead of continuing; implement
incrementing/tracking of recovery attempts (or elapsed wait) and use that to
decide between calling wait_for_circuit_recovery and breaking out of the loop.

1465-1474: Non-portable \| alternation in grep — use grep -E for reliability.

Line 1468 uses \| for alternation in basic regex, which is a GNU extension not supported by BSD grep (macOS). Since this is informational logging, the impact is low, but it's trivially fixable.

Proposed fix
-            if grep -q '"usage"\|"billing"\|"credits_remaining"' "$output_file" 2>/dev/null; then
+            if grep -qE '"usage"|"billing"|"credits_remaining"' "$output_file" 2>/dev/null; then
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ralph_loop.sh` around lines 1465 - 1474, The grep alternation using '\|' is
non-portable; update the check that reads "$output_file" (the grep command that
currently tests for '"usage"\|"billing"\|"credits_remaining"') to use extended
regex mode (e.g., grep -E or grep -P if available) with a single quoted
alternation like '("usage"|"billing"|"credits_remaining")', keep the -q and
2>/dev/null redirect, and leave the surrounding conditional logic unchanged (the
block referencing output_file, CLAUDE_CODE_USE_FOUNDRY, and log_status should
remain the same).
tests/unit/test_cli_modern.bats (1)

838-869: Add unit tests for forward_tmux_env() and wait_for_circuit_recovery() functions.

Both functions are in ralph_loop.sh but have no corresponding tests in this file. forward_tmux_env() handles glob pattern matching, prefix expansion, and conditional Foundry credential unsetting. wait_for_circuit_recovery() implements a polling loop with state checking and timestamp calculations. Both warrant unit test coverage per coding guidelines.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_cli_modern.bats` around lines 838 - 869, Add unit tests
covering forward_tmux_env() and wait_for_circuit_recovery(): for
forward_tmux_env() write cases that verify glob pattern matching and prefix
expansion produce expected environment variable exports and that when a variable
name matches the Foundry credential pattern it unsets
FOUNDRY_USERNAME/FOUNDRY_PASSWORD appropriately (including negative cases); for
wait_for_circuit_recovery() write tests that stub the dependencies/state checker
to simulate state transitions, verify the polling loop calls sleep as expected,
that timestamp/calculation logic yields correct backoff/timeout behavior, and
that success and timeout exit conditions set/return the correct codes—place
these tests alongside the existing live-mode tests and use the same sed/grep
pattern extraction approach to invoke and assert on the ralph_loop.sh functions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ralph_loop.sh`:
- Around line 1207-1209: The current code logs the first 10 characters of
ANTHROPIC_FOUNDRY_API_KEY via the log_status call, exposing sensitive secret
material; change the behavior in the block that checks ANTHROPIC_FOUNDRY_API_KEY
so it does not emit partial keys to $LOG_DIR/ralph.log — instead log only
presence/absence (e.g., "ANTHROPIC_FOUNDRY_API_KEY: set" or "not set") or, if
you must show a hint, mask to a minimal safe slice such as a single character or
last 4 chars; update the log_status invocation that references
ANTHROPIC_FOUNDRY_API_KEY accordingly to avoid writing key material to logs.
- Around line 50-61: The comment for the Foundry flag is misleading: it says to
set CLAUDE_CODE_USE_FOUNDRY=true but the script defaults
CLAUDE_CODE_USE_FOUNDRY="${CLAUDE_CODE_USE_FOUNDRY:-0}" and later checks for ==
"1"; update the comment to instruct users to set CLAUDE_CODE_USE_FOUNDRY=1 (or
alternatively modify the runtime check to accept "true" as well) so the behavior
of CLAUDE_CODE_USE_FOUNDRY, its default, and the check (referenced by
CLAUDE_CODE_USE_FOUNDRY and the runtime comparison around the Foundry check) are
consistent.
- Around line 509-543: The wait_for_circuit_recovery function lacks an upper
bound and documentation; update it to accept or use a MAX_WAIT_MINUTES (or
MAX_ITERATIONS) constant and enforce a maximum loop duration/iteration count
inside wait_for_circuit_recovery so that if get_circuit_state remains
"$CB_STATE_OPEN" beyond the limit it logs an error, updates status (e.g.,
"circuit_breaker_failed" or "timeout"), and returns a non-zero value; also add a
bash function comment block above wait_for_circuit_recovery describing purpose,
parameters (loop_count, optional max wait), and return values (0 on recovered,
non-zero on timeout/error), and reference functions/vars like
init_circuit_breaker, get_circuit_state, CB_COOLDOWN_MINUTES, CB_STATE_FILE,
parse_iso_to_epoch, and CALL_COUNT_FILE when implementing the timeout handling
and status updates.
- Around line 200-242: The function forward_tmux_env disables globbing with set
-f but can exit early (e.g., if tmux set-environment fails) leaving globbing
disabled; ensure set +f always runs by scoping noglob in a subshell or
installing a short-lived trap: either wrap the pattern-processing loop in a
subshell so set -f is local to that subshell, or add a trap (e.g., trap 'set +f'
RETURN or EXIT) immediately before set -f and clear it after set +f; apply this
to the forward_tmux_env function and ensure every tmux set-environment
invocation (the tmux set-environment calls) either is protected with || true or
is covered by the trap/subshell so set +f is restored on all exit paths.
- Around line 1319-1332: The wait on CLAUDE_CHILD_PID can trigger set -e when
the child exits non‑zero (e.g., timeout 124), so change the wait invocation to
capture the child's exit status without causing an immediate script exit: run
wait $CLAUDE_CHILD_PID with a fallthrough that assigns exit_code (e.g., use
"wait $CLAUDE_CHILD_PID || exit_code=$?") so the script records the actual exit
code for downstream timeout/session/circuit-breaker handling (references:
CLAUDE_CHILD_PID, wait, exit_code, portable_timeout).

---

Outside diff comments:
In `@ralph_loop.sh`:
- Around line 1800-1804: The read with timeout (`read -t 30 -n 1 user_choice`)
can return non-zero on timeout and cause the script to exit under set -e; change
the call so its non-zero exit doesn't abort the script (for example, wrap it to
ignore failures or capture its status), e.g. perform the timed read in a
conditional and set user_choice="" on timeout or use `|| true` after the read so
subsequent logic that checks user_choice (the existing if [[ "$user_choice" ==
"2" ]] || [[ -z "$user_choice" ]]) continues to run; update the read invocation
near where user_choice is used to ensure timeouts don't trigger script
termination.

---

Nitpick comments:
In `@ralph_loop.sh`:
- Around line 1404-1406: The background-mode path captures the child PID into
claude_pid and CLAUDE_CHILD_PID but still performs a bare wait "wait
$claude_pid" which will trigger set -e on failure; change the wait logic used
for the live-mode path to a safe form here as well (e.g., replace the direct
wait with "wait $claude_pid || exit_code=$?" style handling and propagate/handle
exit_code consistently), updating any downstream cleanup or exit logic that
inspects exit_code so background-mode exit handling mirrors live-mode; reference
variables/functions: claude_pid, CLAUDE_CHILD_PID, wait $claude_pid, and
exit_code.
- Around line 1707-1709: The change makes the circuit-breaker OPEN path call
wait_for_circuit_recovery("$loop_count") and continue, which can block
indefinitely; change this to honor a configurable max wait or max recovery
attempts by adding a check around the log_status/wait_for_circuit_recovery block
(referencing log_status, wait_for_circuit_recovery, and loop_count) so that if a
new config variable (e.g., MAX_RECOVERY_ATTEMPTS or MAX_RECOVERY_WAIT) is
exceeded you log a final error and break/exit instead of continuing; implement
incrementing/tracking of recovery attempts (or elapsed wait) and use that to
decide between calling wait_for_circuit_recovery and breaking out of the loop.
- Around line 1465-1474: The grep alternation using '\|' is non-portable; update
the check that reads "$output_file" (the grep command that currently tests for
'"usage"\|"billing"\|"credits_remaining"') to use extended regex mode (e.g.,
grep -E or grep -P if available) with a single quoted alternation like
'("usage"|"billing"|"credits_remaining")', keep the -q and 2>/dev/null redirect,
and leave the surrounding conditional logic unchanged (the block referencing
output_file, CLAUDE_CODE_USE_FOUNDRY, and log_status should remain the same).

In `@tests/unit/test_cli_modern.bats`:
- Around line 838-869: Add unit tests covering forward_tmux_env() and
wait_for_circuit_recovery(): for forward_tmux_env() write cases that verify glob
pattern matching and prefix expansion produce expected environment variable
exports and that when a variable name matches the Foundry credential pattern it
unsets FOUNDRY_USERNAME/FOUNDRY_PASSWORD appropriately (including negative
cases); for wait_for_circuit_recovery() write tests that stub the
dependencies/state checker to simulate state transitions, verify the polling
loop calls sleep as expected, that timestamp/calculation logic yields correct
backoff/timeout behavior, and that success and timeout exit conditions
set/return the correct codes—place these tests alongside the existing live-mode
tests and use the same sed/grep pattern extraction approach to invoke and assert
on the ralph_loop.sh functions.

Comment on lines +50 to +61
# Azure Foundry / Anthropic API configuration
# Set CLAUDE_CODE_USE_FOUNDRY=true in .ralphrc to use Azure Foundry for this project
_env_CLAUDE_CODE_USE_FOUNDRY="${CLAUDE_CODE_USE_FOUNDRY:-}"
_env_ANTHROPIC_FOUNDRY_RESOURCE="${ANTHROPIC_FOUNDRY_RESOURCE:-}"
_env_ANTHROPIC_FOUNDRY_API_KEY="${ANTHROPIC_FOUNDRY_API_KEY:-}"
_env_ANTHROPIC_DEFAULT_SONNET_MODEL="${ANTHROPIC_DEFAULT_SONNET_MODEL:-}"
_env_ANTHROPIC_DEFAULT_HAIKU_MODEL="${ANTHROPIC_DEFAULT_HAIKU_MODEL:-}"
_env_ANTHROPIC_DEFAULT_OPUS_MODEL="${ANTHROPIC_DEFAULT_OPUS_MODEL:-}"
_env_CLAUDE_MODEL="${CLAUDE_MODEL:-}"

# Default: don't use Foundry (use personal account)
CLAUDE_CODE_USE_FOUNDRY="${CLAUDE_CODE_USE_FOUNDRY:-0}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Inconsistent documentation: comment says true but code expects 1.

Line 51 says "Set CLAUDE_CODE_USE_FOUNDRY=true in .ralphrc" but line 61 defaults to "0" and the runtime check on line 1201 compares == "1". A user following the comment would set true, which would not match "1".

Proposed fix
 # Azure Foundry / Anthropic API configuration
-# Set CLAUDE_CODE_USE_FOUNDRY=true in .ralphrc to use Azure Foundry for this project
+# Set CLAUDE_CODE_USE_FOUNDRY=1 in .ralphrc to use Azure Foundry for this project
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ralph_loop.sh` around lines 50 - 61, The comment for the Foundry flag is
misleading: it says to set CLAUDE_CODE_USE_FOUNDRY=true but the script defaults
CLAUDE_CODE_USE_FOUNDRY="${CLAUDE_CODE_USE_FOUNDRY:-0}" and later checks for ==
"1"; update the comment to instruct users to set CLAUDE_CODE_USE_FOUNDRY=1 (or
alternatively modify the runtime check to accept "true" as well) so the behavior
of CLAUDE_CODE_USE_FOUNDRY, its default, and the check (referenced by
CLAUDE_CODE_USE_FOUNDRY and the runtime comparison around the Foundry check) are
consistent.

Comment on lines +200 to +242
# Forward environment variables into a tmux session
# Uses RALPH_TMUX_ENV to determine which vars to forward
# Supports exact names (e.g., "FOO") and prefix patterns (e.g., "ANTHROPIC_*")
forward_tmux_env() {
local session_name="$1"
local patterns="$RALPH_TMUX_ENV"

[[ -z "$patterns" ]] && return 0

# Disable globbing so patterns like "ANTHROPIC_*" are not expanded
# against filenames in the current directory
set -f
for pattern in $patterns; do
if [[ "$pattern" == *'*' ]]; then
# Prefix pattern: match all env vars starting with the prefix
local prefix="${pattern%\*}"
while IFS= read -r line; do
local name="${line%%=*}"
local value="${line#*=}"
if [[ -n "$name" && -n "$value" ]]; then
tmux set-environment -t "$session_name" "$name" "$value"
fi
done < <(env | grep "^${prefix}" || true)
else
# Exact match: forward if set in current environment
local value="${!pattern:-}"
if [[ -n "$value" ]]; then
tmux set-environment -t "$session_name" "$pattern" "$value"
fi
fi
done
set +f

# When Foundry is disabled, actively unset Foundry vars at session level
# to prevent leakage from tmux global environment (concurrent session isolation)
if [[ "${CLAUDE_CODE_USE_FOUNDRY:-0}" != "1" ]]; then
tmux set-environment -t "$session_name" -u ANTHROPIC_FOUNDRY_RESOURCE 2>/dev/null || true
tmux set-environment -t "$session_name" -u ANTHROPIC_FOUNDRY_API_KEY 2>/dev/null || true
tmux set-environment -t "$session_name" -u ANTHROPIC_DEFAULT_SONNET_MODEL 2>/dev/null || true
tmux set-environment -t "$session_name" -u ANTHROPIC_DEFAULT_HAIKU_MODEL 2>/dev/null || true
tmux set-environment -t "$session_name" -u ANTHROPIC_DEFAULT_OPUS_MODEL 2>/dev/null || true
fi
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

set -f is not restored if an early exit occurs between Lines 211 and 231.

Under set -e, if tmux set-environment (lines 220, 227) fails, the script exits with globbing still disabled. Consider using a trap or restructuring to ensure set +f runs on all exit paths.

Proposed fix — use a subshell to scope noglob
-    set -f
-    for pattern in $patterns; do
+    # Run in subshell so set -f (noglob) is automatically scoped
+    (
+    set -f
+    for pattern in $patterns; do
         if [[ "$pattern" == *'*' ]]; then
             ...
         fi
     done
-    set +f
+    )

Alternatively, add || true to the tmux set-environment calls on lines 220 and 227 to match the defensive style already used on lines 236–240.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Forward environment variables into a tmux session
# Uses RALPH_TMUX_ENV to determine which vars to forward
# Supports exact names (e.g., "FOO") and prefix patterns (e.g., "ANTHROPIC_*")
forward_tmux_env() {
local session_name="$1"
local patterns="$RALPH_TMUX_ENV"
[[ -z "$patterns" ]] && return 0
# Disable globbing so patterns like "ANTHROPIC_*" are not expanded
# against filenames in the current directory
set -f
for pattern in $patterns; do
if [[ "$pattern" == *'*' ]]; then
# Prefix pattern: match all env vars starting with the prefix
local prefix="${pattern%\*}"
while IFS= read -r line; do
local name="${line%%=*}"
local value="${line#*=}"
if [[ -n "$name" && -n "$value" ]]; then
tmux set-environment -t "$session_name" "$name" "$value"
fi
done < <(env | grep "^${prefix}" || true)
else
# Exact match: forward if set in current environment
local value="${!pattern:-}"
if [[ -n "$value" ]]; then
tmux set-environment -t "$session_name" "$pattern" "$value"
fi
fi
done
set +f
# When Foundry is disabled, actively unset Foundry vars at session level
# to prevent leakage from tmux global environment (concurrent session isolation)
if [[ "${CLAUDE_CODE_USE_FOUNDRY:-0}" != "1" ]]; then
tmux set-environment -t "$session_name" -u ANTHROPIC_FOUNDRY_RESOURCE 2>/dev/null || true
tmux set-environment -t "$session_name" -u ANTHROPIC_FOUNDRY_API_KEY 2>/dev/null || true
tmux set-environment -t "$session_name" -u ANTHROPIC_DEFAULT_SONNET_MODEL 2>/dev/null || true
tmux set-environment -t "$session_name" -u ANTHROPIC_DEFAULT_HAIKU_MODEL 2>/dev/null || true
tmux set-environment -t "$session_name" -u ANTHROPIC_DEFAULT_OPUS_MODEL 2>/dev/null || true
fi
}
# Forward environment variables into a tmux session
# Uses RALPH_TMUX_ENV to determine which vars to forward
# Supports exact names (e.g., "FOO") and prefix patterns (e.g., "ANTHROPIC_*")
forward_tmux_env() {
local session_name="$1"
local patterns="$RALPH_TMUX_ENV"
[[ -z "$patterns" ]] && return 0
# Run in subshell so set -f (noglob) is automatically scoped
(
set -f
for pattern in $patterns; do
if [[ "$pattern" == *'*' ]]; then
# Prefix pattern: match all env vars starting with the prefix
local prefix="${pattern%\*}"
while IFS= read -r line; do
local name="${line%%=*}"
local value="${line#*=}"
if [[ -n "$name" && -n "$value" ]]; then
tmux set-environment -t "$session_name" "$name" "$value"
fi
done < <(env | grep "^${prefix}" || true)
else
# Exact match: forward if set in current environment
local value="${!pattern:-}"
if [[ -n "$value" ]]; then
tmux set-environment -t "$session_name" "$pattern" "$value"
fi
fi
done
)
# When Foundry is disabled, actively unset Foundry vars at session level
# to prevent leakage from tmux global environment (concurrent session isolation)
if [[ "${CLAUDE_CODE_USE_FOUNDRY:-0}" != "1" ]]; then
tmux set-environment -t "$session_name" -u ANTHROPIC_FOUNDRY_RESOURCE 2>/dev/null || true
tmux set-environment -t "$session_name" -u ANTHROPIC_FOUNDRY_API_KEY 2>/dev/null || true
tmux set-environment -t "$session_name" -u ANTHROPIC_DEFAULT_SONNET_MODEL 2>/dev/null || true
tmux set-environment -t "$session_name" -u ANTHROPIC_DEFAULT_HAIKU_MODEL 2>/dev/null || true
tmux set-environment -t "$session_name" -u ANTHROPIC_DEFAULT_OPUS_MODEL 2>/dev/null || true
fi
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ralph_loop.sh` around lines 200 - 242, The function forward_tmux_env disables
globbing with set -f but can exit early (e.g., if tmux set-environment fails)
leaving globbing disabled; ensure set +f always runs by scoping noglob in a
subshell or installing a short-lived trap: either wrap the pattern-processing
loop in a subshell so set -f is local to that subshell, or add a trap (e.g.,
trap 'set +f' RETURN or EXIT) immediately before set -f and clear it after set
+f; apply this to the forward_tmux_env function and ensure every tmux
set-environment invocation (the tmux set-environment calls) either is protected
with || true or is covered by the trap/subshell so set +f is restored on all
exit paths.

Comment on lines +509 to +543
# Wait for circuit breaker cooldown recovery instead of exiting
wait_for_circuit_recovery() {
local loop_count=$1
local check_interval=60 # Check every 60 seconds

log_status "WARN" "Circuit breaker OPEN - waiting for cooldown recovery (${CB_COOLDOWN_MINUTES}m)..."
update_status "$loop_count" "$(cat "$CALL_COUNT_FILE" 2>/dev/null || echo "0")" "waiting_for_recovery" "paused" "circuit_breaker_cooldown"

while true; do
sleep "$check_interval"

# Re-init circuit breaker (handles OPEN→HALF_OPEN/CLOSED transitions)
init_circuit_breaker

local state
state=$(get_circuit_state)
if [[ "$state" != "$CB_STATE_OPEN" ]]; then
log_status "SUCCESS" "Circuit breaker recovered to $state - resuming loop"
return 0
fi

# Show remaining cooldown time
local opened_at
opened_at=$(jq -r '.opened_at // .last_change // ""' "$CB_STATE_FILE" 2>/dev/null)
if [[ -n "$opened_at" && "$opened_at" != "null" ]]; then
local opened_epoch elapsed_minutes remaining
opened_epoch=$(parse_iso_to_epoch "$opened_at")
elapsed_minutes=$(( ($(date +%s) - opened_epoch) / 60 ))
remaining=$((CB_COOLDOWN_MINUTES - elapsed_minutes))
if [[ $remaining -gt 0 ]]; then
log_status "INFO" "Cooldown: ${elapsed_minutes}m elapsed, ~${remaining}m remaining"
fi
fi
done
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

wait_for_circuit_recovery has no upper bound — a corrupted state file will loop forever.

If get_circuit_state always returns OPEN (e.g., corrupted state file, clock skew preventing cooldown expiry), this function never returns. Consider adding a maximum wait time or iteration count as a safety net.

Proposed safeguard
 wait_for_circuit_recovery() {
     local loop_count=$1
     local check_interval=60  # Check every 60 seconds
+    local max_wait_seconds=$(( (CB_COOLDOWN_MINUTES + 10) * 60 ))  # cooldown + 10min buffer
+    local waited=0
 
     log_status "WARN" "Circuit breaker OPEN - waiting for cooldown recovery (${CB_COOLDOWN_MINUTES}m)..."
     ...
 
     while true; do
         sleep "$check_interval"
+        waited=$((waited + check_interval))
+
+        if [[ $waited -ge $max_wait_seconds ]]; then
+            log_status "ERROR" "Recovery wait exceeded max time (${max_wait_seconds}s) - forcing circuit reset"
+            reset_circuit_breaker "recovery_timeout"
+            return 0
+        fi
 
         # Re-init circuit breaker ...

As per coding guidelines, **/*.sh: "Use bash function comments documenting purpose, parameters, and return values for all functions" — this function is also missing parameter/return documentation.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ralph_loop.sh` around lines 509 - 543, The wait_for_circuit_recovery function
lacks an upper bound and documentation; update it to accept or use a
MAX_WAIT_MINUTES (or MAX_ITERATIONS) constant and enforce a maximum loop
duration/iteration count inside wait_for_circuit_recovery so that if
get_circuit_state remains "$CB_STATE_OPEN" beyond the limit it logs an error,
updates status (e.g., "circuit_breaker_failed" or "timeout"), and returns a
non-zero value; also add a bash function comment block above
wait_for_circuit_recovery describing purpose, parameters (loop_count, optional
max wait), and return values (0 on recovered, non-zero on timeout/error), and
reference functions/vars like init_circuit_breaker, get_circuit_state,
CB_COOLDOWN_MINUTES, CB_STATE_FILE, parse_iso_to_epoch, and CALL_COUNT_FILE when
implementing the timeout handling and status updates.

Comment on lines +1207 to +1209
if [[ -n "${ANTHROPIC_FOUNDRY_API_KEY:-}" ]]; then
export ANTHROPIC_FOUNDRY_API_KEY
log_status "INFO" " API Key: ${ANTHROPIC_FOUNDRY_API_KEY:0:10}..."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Logging partial API key is a security risk.

Line 1209 logs the first 10 characters of ANTHROPIC_FOUNDRY_API_KEY. Even a partial key in log files (which persist on disk at $LOG_DIR/ralph.log) increases exposure surface. Consider masking to just a few characters or logging only presence/absence.

Proposed fix
         if [[ -n "${ANTHROPIC_FOUNDRY_API_KEY:-}" ]]; then
             export ANTHROPIC_FOUNDRY_API_KEY
-            log_status "INFO" "   API Key: ${ANTHROPIC_FOUNDRY_API_KEY:0:10}..."
+            log_status "INFO" "   API Key: ****${ANTHROPIC_FOUNDRY_API_KEY: -4}..."
         fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if [[ -n "${ANTHROPIC_FOUNDRY_API_KEY:-}" ]]; then
export ANTHROPIC_FOUNDRY_API_KEY
log_status "INFO" " API Key: ${ANTHROPIC_FOUNDRY_API_KEY:0:10}..."
if [[ -n "${ANTHROPIC_FOUNDRY_API_KEY:-}" ]]; then
export ANTHROPIC_FOUNDRY_API_KEY
log_status "INFO" " API Key: ****${ANTHROPIC_FOUNDRY_API_KEY: -4}..."
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ralph_loop.sh` around lines 1207 - 1209, The current code logs the first 10
characters of ANTHROPIC_FOUNDRY_API_KEY via the log_status call, exposing
sensitive secret material; change the behavior in the block that checks
ANTHROPIC_FOUNDRY_API_KEY so it does not emit partial keys to $LOG_DIR/ralph.log
— instead log only presence/absence (e.g., "ANTHROPIC_FOUNDRY_API_KEY: set" or
"not set") or, if you must show a hint, mask to a minimal safe slice such as a
single character or last 4 chars; update the log_status invocation that
references ANTHROPIC_FOUNDRY_API_KEY accordingly to avoid writing key material
to logs.

Comment on lines +1319 to +1332
#
# Run in a subshell so cleanup() can kill the entire pipeline
# (timeout + stdbuf + claude + tee + jq) with a single process group
# kill on Ctrl+C. The subshell also avoids the errexit issue (Issue #175)
# where timeout's exit code 124 would silently kill the script under set -e.
(
set -o pipefail
portable_timeout ${timeout_seconds}s stdbuf -oL "${LIVE_CMD_ARGS[@]}" \
< /dev/null 2>&1 | stdbuf -oL tee "$output_file" | stdbuf -oL jq --unbuffered -j "$jq_filter" 2>/dev/null | tee "$LIVE_LOG_FILE"
) &
CLAUDE_CHILD_PID=$!
wait $CLAUDE_CHILD_PID
exit_code=$?
CLAUDE_CHILD_PID=""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Test 1: Verify bash set -e behavior with wait returning non-zero
echo "=== Test 1: set -e + wait returning 124 ==="
bash -c '
set -e
( exit 124 ) &
pid=$!
echo "STARTING"
wait $pid
echo "REACHED exit_code=$?"
' 2>&1 || echo "(script exited with code $?)"

echo ""
echo "=== Test 2: set -e + wait || exit_code=$? (proposed fix) ==="
bash -c '
set -e
exit_code=0
( exit 124 ) &
pid=$!
echo "STARTING"
wait $pid || exit_code=$?
echo "REACHED exit_code=$exit_code"
' 2>&1

echo ""
echo "=== Test 3: Testing inside a function with set -e ==="
bash -c '
set -e
test_func() {
  ( exit 124 ) &
  pid=$!
  echo "IN FUNC STARTING"
  wait $pid
  echo "IN FUNC REACHED"
  return 0
}
test_func
echo "CALLER REACHED"
' 2>&1 || echo "(function/script exited with code $?)"

Repository: frankbria/ralph-claude-code

Length of output: 356


🏁 Script executed:

# Now read the actual code context
head -c 50000 ralph_loop.sh | tail -c 5000 | cat -n

Repository: frankbria/ralph-claude-code

Length of output: 5859


🏁 Script executed:

# Get broader context around lines 1319-1332
sed -n '1300,1360p' ralph_loop.sh | cat -n

Repository: frankbria/ralph-claude-code

Length of output: 3826


🏁 Script executed:

# Check how execute_claude_code is called and what the function returns
sed -n '1770,1800p' ralph_loop.sh | cat -n

Repository: frankbria/ralph-claude-code

Length of output: 1834


🏁 Script executed:

# Check the start of execute_claude_code to see set -e status
sed -n '1200,1250p' ralph_loop.sh | cat -n

Repository: frankbria/ralph-claude-code

Length of output: 2713


🏁 Script executed:

# Check line 6 to confirm set -e is enabled
sed -n '1,20p' ralph_loop.sh | cat -n

Repository: frankbria/ralph-claude-code

Length of output: 984


wait $CLAUDE_CHILD_PID is subject to set -e — timeout (exit 124) terminates the script before timeout handling.

Under set -e (line 6), when wait $CLAUDE_CHILD_PID returns non-zero (timeout exit code 124), the script exits immediately at line 1331, never reaching exit_code=$? on line 1331. This bypasses all downstream timeout handling (line 1336), session extraction (line 1337+), and circuit breaker logic, defeating the fix for Issue #175.

Proposed fix — capture non-zero wait status without triggering errexit
         ) &
         CLAUDE_CHILD_PID=$!
-        wait $CLAUDE_CHILD_PID
-        exit_code=$?
+        wait $CLAUDE_CHILD_PID || exit_code=$?
         CLAUDE_CHILD_PID=""

With || exit_code=$?, the || operator ensures the compound command succeeds (preventing set -e exit), while capturing the actual exit code for proper handling downstream.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ralph_loop.sh` around lines 1319 - 1332, The wait on CLAUDE_CHILD_PID can
trigger set -e when the child exits non‑zero (e.g., timeout 124), so change the
wait invocation to capture the child's exit status without causing an immediate
script exit: run wait $CLAUDE_CHILD_PID with a fallthrough that assigns
exit_code (e.g., use "wait $CLAUDE_CHILD_PID || exit_code=$?") so the script
records the actual exit code for downstream timeout/session/circuit-breaker
handling (references: CLAUDE_CHILD_PID, wait, exit_code, portable_timeout).

@tudoanh tudoanh force-pushed the fix/foundry-isolation-model-override-ctrlc branch from 3a20578 to c60b5a9 Compare February 18, 2026 12:31
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (2)
ralph_loop.sh (1)

200-203: forward_tmux_env is missing a bash function documentation block.

Per coding guidelines, all *.sh functions must have a comment block documenting purpose, parameters, and return values. As per coding guidelines, "Use bash function comments documenting purpose, parameters, and return values for all functions".

📄 Suggested doc comment
+# forward_tmux_env - Forward configured environment variables into a tmux session.
+#
+# Reads RALPH_TMUX_ENV for a space-separated list of env var names or prefix
+# patterns (trailing '*'). Forwards matching variables via `tmux set-environment`.
+# When Foundry is disabled, actively unsets Foundry vars to prevent leakage
+# across concurrent sessions.
+#
+# Parameters:
+#   $1 - session_name: tmux session target name
+#
+# Returns: 0 always (individual tmux failures are suppressed with || true)
 forward_tmux_env() {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ralph_loop.sh` around lines 200 - 203, Add a bash function documentation
block above the forward_tmux_env function that states the function's purpose
(forward specified environment variables into a tmux session), parameters (reads
RALPH_TMUX_ENV containing exact names and prefix patterns like "ANTHROPIC_*"),
and return behavior/exit codes (e.g., returns 0 on success, non‑zero on error).
Ensure the comment follows the project's bash doc style (purpose, parameters,
returns) and briefly notes behavior for exact names vs. wildcard prefixes and
any side effects (modifies tmux-environment using tmux set-environment).
tests/unit/test_cli_modern.bats (1)

838-869: Missing unit tests for forward_tmux_env, wait_for_circuit_recovery, and CLAUDE_MODEL passthrough.

This PR introduces two new public functions (forward_tmux_env, wait_for_circuit_recovery) and adds --model injection to build_claude_command. None of these have corresponding unit tests. Additionally, the inline build_claude_command copy in the setup() block (lines ~451–499) omits the new --model branch, so the existing build_claude_command tests do not exercise the model-override path at all.

As per coding guidelines, "Add unit tests for all new bash script functions using bats test framework."

For forward_tmux_env, tmux can be mocked; for wait_for_circuit_recovery, sleep and get_circuit_state can be stubbed. At minimum, add a test such as:

`@test` "build_claude_command includes --model when CLAUDE_MODEL is set" {
    export CLAUDE_CODE_CMD="claude"
    export CLAUDE_OUTPUT_FORMAT="json"
    export CLAUDE_ALLOWED_TOOLS=""
    export CLAUDE_USE_CONTINUE="false"
    export CLAUDE_MODEL="claude-opus-4-6"

    echo "Test prompt" > "$PROMPT_FILE"
    # Call the real implementation (not the stale inline copy)
    source "${BATS_TEST_DIRNAME}/../../ralph_loop.sh" --help 2>/dev/null || true
    build_claude_command "$PROMPT_FILE" "" ""

    local cmd_string="${CLAUDE_CMD_ARGS[*]}"
    [[ "$cmd_string" == *"--model"* ]]
    [[ "$cmd_string" == *"claude-opus-4-6"* ]]
}

`@test` "build_claude_command omits --model when CLAUDE_MODEL is empty" {
    export CLAUDE_MODEL=""
    echo "Test prompt" > "$PROMPT_FILE"
    build_claude_command "$PROMPT_FILE" "" ""

    local cmd_string="${CLAUDE_CMD_ARGS[*]}"
    [[ "$cmd_string" != *"--model"* ]]
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_cli_modern.bats` around lines 838 - 869, Add bats unit tests
covering the new public functions and the CLAUDE_MODEL passthrough: write tests
for forward_tmux_env (mock tmux environment variables and assert they are
forwarded), for wait_for_circuit_recovery (stub sleep and get_circuit_state to
simulate transient states and assert correct retry/exit behavior), and for
build_claude_command to assert it includes "--model <value>" when CLAUDE_MODEL
is set and omits it when empty; also update the stale inline copy of
build_claude_command in the setup() block so it either sources/uses the real
build_claude_command implementation (rather than the outdated copy) or add the
missing --model branch so tests exercise the model-override path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ralph_loop.sh`:
- Around line 1449-1451: In the background execution path where we call wait on
the Claude PID (wait $claude_pid), prevent set -e from aborting before we record
the child's exit status: temporarily disable error-exit or absorb wait's failure
so we can assign exit_code=$? after wait (e.g., run wait with an || true or wrap
in set +e / set -e), then use that exit_code to detect portable_timeout (124)
and run the same timeout handling, circuit breaker and session extraction logic;
update the block that currently contains wait $claude_pid and
CLAUDE_CHILD_PID="" to ensure wait's exit code is captured into exit_code
regardless of set -e.
- Around line 1463-1472: The grep pattern uses GNU-only BRE alternation (\|) and
fails on BSD/macOS; update the authentication detection in the block that checks
"$output_file" by replacing the grep call in that if-statement (the line that
currently uses grep -q '"usage"\|"billing"\|"credits_remaining"') with a
portable ERE invocation (e.g. grep -E or egrep) or equivalent multiple -e
patterns, preserve the quiet flag and redirection (2>/dev/null), so the
condition correctly matches "usage", "billing" or "credits_remaining" across
platforms; keep the surrounding logic referencing output_file,
CLAUDE_CODE_USE_FOUNDRY and log_status unchanged.
- Line 1459: The echo that writes to PROGRESS_FILE uses an unquoted command
substitution which can be split on the space in the timestamp; update the write
so the timestamp substitution is quoted (e.g. use "$(date '+%Y-%m-%d
%H:%M:%S')") or switch to printf to build the JSON safely before redirecting to
"$PROGRESS_FILE" — target the echo line that writes to PROGRESS_FILE in
ralph_loop.sh and ensure the substitution is quoted to prevent word splitting.

In `@tests/unit/test_cli_modern.bats`:
- Around line 838-854: In the test function test "live mode pipeline runs in
subshell to isolate errexit (Issue `#175`)" replace GNU-only '\s' usage in the
grep that checks for an indented opening parenthesis so it becomes
POSIX-portable: update the grep against the live_block (the line echo
"$live_block" | grep -q '^\s*(') to use [[:space:]] (e.g. '^[[:space:]]*(') so
macOS/BSD grep behaves correctly; leave the other checks (') &' and
'CLAUDE_CHILD_PID=\$!') unchanged.
- Around line 856-869: The test "live mode pipeline captures exit code via wait"
currently checks for separate lines matching wait \$CLAUDE_CHILD_PID and
exit_code=\$? which allows the buggy two-line form; change the assertions to
require the compound form using the OR operator so the wait and exit capture
appear together (e.g. assert that the live_block contains a pattern where wait
\$CLAUDE_CHILD_PID and \|\| and exit_code=\$? occur in the same command); keep
the existing check that CLAUDE_CHILD_PID is cleared (CLAUDE_CHILD_PID="") and
update the grep/regex used to match the live_block accordingly so only the safe
"wait ... || exit_code=$?" form passes.

---

Duplicate comments:
In `@ralph_loop.sh`:
- Around line 209-231: The loop that sets tmux environment vars disables
globbing with set -f but can exit early if tmux set-environment fails (under set
-e), leaving globbing disabled; update the block that calls tmux set-environment
(both the prefix branch and the exact-match branch referencing variables
patterns and session_name and using tmux set-environment) to either append ||
true to the tmux commands or ensure set +f always runs (e.g., via a
trap/finally), so failures in tmux set-environment do not prevent restoring
globbing.
- Around line 1205-1209: The script is leaking a partial
ANTHROPIC_FOUNDRY_API_KEY value to logs via the log_status call; update the
block that checks ANTHROPIC_FOUNDRY_API_KEY so it does not print the key
contents (in ralph_loop.sh where ANTHROPIC_FOUNDRY_API_KEY and log_status are
used). Replace the current log_status that prints
${ANTHROPIC_FOUNDRY_API_KEY:0:10} with a non-secret message (e.g., "API key set"
or a fully masked representation like "*******" or only show a fixed-length
masked suffix/prefix) so no actual key material is emitted to logs.
- Around line 507-541: Add a timeout guard and docstring to
wait_for_circuit_recovery: document the function and its parameters, then add an
optional max_wait_seconds parameter (or default like CB_COOLDOWN_MINUTES*60*2)
and track elapsed time inside the while loop; after each sleep/update/re-init
(calls to init_circuit_breaker and get_circuit_state) check elapsed against
max_wait_seconds and if exceeded log_status an ERROR, update_status to a timeout
state, and return non-zero; keep existing cooldown logging (opened_at parsing
via parse_iso_to_epoch and CB_STATE_FILE) but ensure the loop breaks on timeout
to avoid infinite hang.
- Around line 50-51: The comment and runtime check for the environment flag
CLAUDE_CODE_USE_FOUNDRY are inconsistent (comment says set to "true" but the
script checks for "1"); make them consistent by either updating the script's
runtime conditional to accept "true" (e.g., case-insensitive check for
"true"/"1"/"yes") or changing the comment to instruct setting
CLAUDE_CODE_USE_FOUNDRY="1"; update the CLAUDE_CODE_USE_FOUNDRY check and the
comment in ralph_loop.sh (and any related README/.ralphrc note) so both use the
same canonical value and document accepted truthy forms.
- Around line 1322-1330: The wait invocation is still vulnerable to set -e:
change the line using wait on CLAUDE_CHILD_PID so the script captures the
child's exit status even if wait returns non-zero (e.g., timeout). Replace the
bare "wait $CLAUDE_CHILD_PID" with a guarded form such as "wait
$CLAUDE_CHILD_PID || exit_code=$?" (or an equivalent if-block) to ensure
exit_code is assigned when wait fails, keeping the existing CLAUDE_CHILD_PID and
exit_code variables intact.

---

Nitpick comments:
In `@ralph_loop.sh`:
- Around line 200-203: Add a bash function documentation block above the
forward_tmux_env function that states the function's purpose (forward specified
environment variables into a tmux session), parameters (reads RALPH_TMUX_ENV
containing exact names and prefix patterns like "ANTHROPIC_*"), and return
behavior/exit codes (e.g., returns 0 on success, non‑zero on error). Ensure the
comment follows the project's bash doc style (purpose, parameters, returns) and
briefly notes behavior for exact names vs. wildcard prefixes and any side
effects (modifies tmux-environment using tmux set-environment).

In `@tests/unit/test_cli_modern.bats`:
- Around line 838-869: Add bats unit tests covering the new public functions and
the CLAUDE_MODEL passthrough: write tests for forward_tmux_env (mock tmux
environment variables and assert they are forwarded), for
wait_for_circuit_recovery (stub sleep and get_circuit_state to simulate
transient states and assert correct retry/exit behavior), and for
build_claude_command to assert it includes "--model <value>" when CLAUDE_MODEL
is set and omits it when empty; also update the stale inline copy of
build_claude_command in the setup() block so it either sources/uses the real
build_claude_command implementation (rather than the outdated copy) or add the
missing --model branch so tests exercise the model-override path.

echo "$calls_made" > "$CALL_COUNT_FILE"

# Clear progress file
echo '{"status": "completed", "timestamp": "'$(date '+%Y-%m-%d %H:%M:%S')'"}' > "$PROGRESS_FILE"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

SC2046: unquoted $(date) in echo — progress file write.

Shellcheck warns that the command substitution $(date '+%Y-%m-%d %H:%M:%S') is unquoted; word splitting on the space in the timestamp can produce two echo arguments. While echo concatenates them with a space and the JSON output is still valid today, it is brittle and flagged by the static analysis tool.

🔧 Proposed fix
-        echo '{"status": "completed", "timestamp": "'$(date '+%Y-%m-%d %H:%M:%S')'"}' > "$PROGRESS_FILE"
+        echo '{"status": "completed", "timestamp": "'"$(date '+%Y-%m-%d %H:%M:%S')"'"}' > "$PROGRESS_FILE"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
echo '{"status": "completed", "timestamp": "'$(date '+%Y-%m-%d %H:%M:%S')'"}' > "$PROGRESS_FILE"
echo '{"status": "completed", "timestamp": "'"$(date '+%Y-%m-%d %H:%M:%S')"'"}' > "$PROGRESS_FILE"
🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 1459-1459: Quote this to prevent word splitting.

(SC2046)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ralph_loop.sh` at line 1459, The echo that writes to PROGRESS_FILE uses an
unquoted command substitution which can be split on the space in the timestamp;
update the write so the timestamp substitution is quoted (e.g. use "$(date
'+%Y-%m-%d %H:%M:%S')") or switch to printf to build the JSON safely before
redirecting to "$PROGRESS_FILE" — target the echo line that writes to
PROGRESS_FILE in ralph_loop.sh and ensure the substitution is quoted to prevent
word splitting.

Comment on lines +1463 to +1472
# Verify actual authentication method used by checking response metadata
if [[ -f "$output_file" ]]; then
# Check for usage/billing fields (indicates personal account)
if grep -q '"usage"\|"billing"\|"credits_remaining"' "$output_file" 2>/dev/null; then
log_status "INFO" "🔍 Detected: Personal Account (usage/billing info in response)"
elif [[ "${CLAUDE_CODE_USE_FOUNDRY:-0}" == "1" ]]; then
# Foundry responses typically don't have usage fields
log_status "INFO" "🔍 Detected: Azure Foundry (no usage info in response)"
fi
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

grep -q '"usage"\|...' uses GNU BRE alternation — not portable to macOS/BSD grep.

\| as alternation is a GNU grep extension to BRE. BSD grep (macOS default) treats \| as a literal two-character sequence, so the pattern never matches on macOS. The authentication-detection logging would silently be skipped on non-Linux hosts.

🔧 Portable fix using ERE
-        if grep -q '"usage"\|"billing"\|"credits_remaining"' "$output_file" 2>/dev/null; then
+        if grep -qE '"usage"|"billing"|"credits_remaining"' "$output_file" 2>/dev/null; then
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Verify actual authentication method used by checking response metadata
if [[ -f "$output_file" ]]; then
# Check for usage/billing fields (indicates personal account)
if grep -q '"usage"\|"billing"\|"credits_remaining"' "$output_file" 2>/dev/null; then
log_status "INFO" "🔍 Detected: Personal Account (usage/billing info in response)"
elif [[ "${CLAUDE_CODE_USE_FOUNDRY:-0}" == "1" ]]; then
# Foundry responses typically don't have usage fields
log_status "INFO" "🔍 Detected: Azure Foundry (no usage info in response)"
fi
fi
# Verify actual authentication method used by checking response metadata
if [[ -f "$output_file" ]]; then
# Check for usage/billing fields (indicates personal account)
if grep -qE '"usage"|"billing"|"credits_remaining"' "$output_file" 2>/dev/null; then
log_status "INFO" "🔍 Detected: Personal Account (usage/billing info in response)"
elif [[ "${CLAUDE_CODE_USE_FOUNDRY:-0}" == "1" ]]; then
# Foundry responses typically don't have usage fields
log_status "INFO" "🔍 Detected: Azure Foundry (no usage info in response)"
fi
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ralph_loop.sh` around lines 1463 - 1472, The grep pattern uses GNU-only BRE
alternation (\|) and fails on BSD/macOS; update the authentication detection in
the block that checks "$output_file" by replacing the grep call in that
if-statement (the line that currently uses grep -q
'"usage"\|"billing"\|"credits_remaining"') with a portable ERE invocation (e.g.
grep -E or egrep) or equivalent multiple -e patterns, preserve the quiet flag
and redirection (2>/dev/null), so the condition correctly matches "usage",
"billing" or "credits_remaining" across platforms; keep the surrounding logic
referencing output_file, CLAUDE_CODE_USE_FOUNDRY and log_status unchanged.

Comment on lines +838 to 854
@test "live mode pipeline runs in subshell to isolate errexit (Issue #175)" {
# The live-mode pipeline runs in a backgrounded subshell so that
# timeout's exit code 124 never triggers set -e in the parent.
# This also enables Ctrl+C cleanup via CLAUDE_CHILD_PID.
local script="${BATS_TEST_DIRNAME}/../../ralph_loop.sh"

# Extract only the live-mode section (from "Live output mode enabled" to "End of Output")
local live_block
live_block=$(sed -n '/Live output mode enabled/,/End of Output/p' "$script")

# set +e must appear before set -o pipefail within the live-mode block
echo "$live_block" | grep -q 'set +e'
echo "$live_block" | grep -q 'set -o pipefail'
# Pipeline must be inside a subshell (parentheses) that is backgrounded
echo "$live_block" | grep -q '^\s*('
echo "$live_block" | grep -q ') &'

# Verify ordering: set +e comes first
local plus_e_line=$(echo "$live_block" | grep -n 'set +e' | head -1 | cut -d: -f1)
local pipefail_line=$(echo "$live_block" | grep -n 'set -o pipefail' | head -1 | cut -d: -f1)

[[ -n "$plus_e_line" ]]
[[ -n "$pipefail_line" ]]
[[ $plus_e_line -lt $pipefail_line ]]
# CLAUDE_CHILD_PID must be captured for cleanup
echo "$live_block" | grep -q 'CLAUDE_CHILD_PID=\$!'
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

grep -q '^\s*(' uses \s — not portable to BSD grep (macOS).

\s in BRE is a GNU grep extension; BSD grep treats it as a literal two-character sequence. The test could silently pass for the wrong reason on macOS CI runners. Use POSIX [[:space:]] instead.

🔧 Portable fix
-    echo "$live_block" | grep -q '^\s*('
+    echo "$live_block" | grep -q '^[[:space:]]*('
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@test "live mode pipeline runs in subshell to isolate errexit (Issue #175)" {
# The live-mode pipeline runs in a backgrounded subshell so that
# timeout's exit code 124 never triggers set -e in the parent.
# This also enables Ctrl+C cleanup via CLAUDE_CHILD_PID.
local script="${BATS_TEST_DIRNAME}/../../ralph_loop.sh"
# Extract only the live-mode section (from "Live output mode enabled" to "End of Output")
local live_block
live_block=$(sed -n '/Live output mode enabled/,/End of Output/p' "$script")
# set +e must appear before set -o pipefail within the live-mode block
echo "$live_block" | grep -q 'set +e'
echo "$live_block" | grep -q 'set -o pipefail'
# Pipeline must be inside a subshell (parentheses) that is backgrounded
echo "$live_block" | grep -q '^\s*('
echo "$live_block" | grep -q ') &'
# Verify ordering: set +e comes first
local plus_e_line=$(echo "$live_block" | grep -n 'set +e' | head -1 | cut -d: -f1)
local pipefail_line=$(echo "$live_block" | grep -n 'set -o pipefail' | head -1 | cut -d: -f1)
[[ -n "$plus_e_line" ]]
[[ -n "$pipefail_line" ]]
[[ $plus_e_line -lt $pipefail_line ]]
# CLAUDE_CHILD_PID must be captured for cleanup
echo "$live_block" | grep -q 'CLAUDE_CHILD_PID=\$!'
}
`@test` "live mode pipeline runs in subshell to isolate errexit (Issue `#175`)" {
# The live-mode pipeline runs in a backgrounded subshell so that
# timeout's exit code 124 never triggers set -e in the parent.
# This also enables Ctrl+C cleanup via CLAUDE_CHILD_PID.
local script="${BATS_TEST_DIRNAME}/../../ralph_loop.sh"
# Extract only the live-mode section (from "Live output mode enabled" to "End of Output")
local live_block
live_block=$(sed -n '/Live output mode enabled/,/End of Output/p' "$script")
# Pipeline must be inside a subshell (parentheses) that is backgrounded
echo "$live_block" | grep -q '^[[:space:]]*('
echo "$live_block" | grep -q ') &'
# CLAUDE_CHILD_PID must be captured for cleanup
echo "$live_block" | grep -q 'CLAUDE_CHILD_PID=\$!'
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_cli_modern.bats` around lines 838 - 854, In the test function
test "live mode pipeline runs in subshell to isolate errexit (Issue `#175`)"
replace GNU-only '\s' usage in the grep that checks for an indented opening
parenthesis so it becomes POSIX-portable: update the grep against the live_block
(the line echo "$live_block" | grep -q '^\s*(') to use [[:space:]] (e.g.
'^[[:space:]]*(') so macOS/BSD grep behaves correctly; leave the other checks
(') &' and 'CLAUDE_CHILD_PID=\$!') unchanged.

Comment on lines +856 to 869
@test "live mode pipeline captures exit code via wait" {
# After the backgrounded subshell, exit code is captured via wait
local script="${BATS_TEST_DIRNAME}/../../ralph_loop.sh"

# Extract only the live-mode section
local live_block
live_block=$(sed -n '/Live output mode enabled/,/End of Output/p' "$script")

# set +o pipefail and set -e must both exist in the live block
echo "$live_block" | grep -q 'set +o pipefail'
echo "$live_block" | grep -q 'set -e'

# Verify ordering: set -e comes after set +o pipefail
local pipefail_off_line=$(echo "$live_block" | grep -n 'set +o pipefail' | head -1 | cut -d: -f1)
local re_enable_line=$(echo "$live_block" | grep -n '^\s*set -e' | awk -F: -v threshold="$pipefail_off_line" '$1 > threshold {print $1; exit}')
# wait and exit_code capture must exist
echo "$live_block" | grep -q 'wait \$CLAUDE_CHILD_PID'
echo "$live_block" | grep -q 'exit_code=\$?'

[[ -n "$pipefail_off_line" ]]
[[ -n "$re_enable_line" ]]
[[ $re_enable_line -gt $pipefail_off_line ]]
# CLAUDE_CHILD_PID must be cleared after wait
echo "$live_block" | grep -q 'CLAUDE_CHILD_PID=""'
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Test validates text presence but not the || operator — would pass with the unfixed, buggy form.

The test greps for wait \$CLAUDE_CHILD_PID and exit_code=\$? as two independent patterns. Both patterns match the current (unfixed) code that has:

wait $CLAUDE_CHILD_PID
exit_code=$?

on separate lines — the very form that triggers set -e on timeout (exit 124). The test should instead assert that wait and the exit-code capture appear on the same compound command using ||, which is the only form that is immune to set -e. As written, the test provides a false sense of safety and won't catch a regression that reintroduces the bug.

🔧 Tighten the assertion to require the `||` form
-    # wait and exit_code capture must exist
-    echo "$live_block" | grep -q 'wait \$CLAUDE_CHILD_PID'
-    echo "$live_block" | grep -q 'exit_code=\$?'
+    # wait and exit_code capture must be a single compound command (|| prevents set -e
+    # from killing the script when Claude times out with exit code 124 — Issue `#175`)
+    echo "$live_block" | grep -q 'wait \$CLAUDE_CHILD_PID || exit_code=\$?'
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@test "live mode pipeline captures exit code via wait" {
# After the backgrounded subshell, exit code is captured via wait
local script="${BATS_TEST_DIRNAME}/../../ralph_loop.sh"
# Extract only the live-mode section
local live_block
live_block=$(sed -n '/Live output mode enabled/,/End of Output/p' "$script")
# set +o pipefail and set -e must both exist in the live block
echo "$live_block" | grep -q 'set +o pipefail'
echo "$live_block" | grep -q 'set -e'
# Verify ordering: set -e comes after set +o pipefail
local pipefail_off_line=$(echo "$live_block" | grep -n 'set +o pipefail' | head -1 | cut -d: -f1)
local re_enable_line=$(echo "$live_block" | grep -n '^\s*set -e' | awk -F: -v threshold="$pipefail_off_line" '$1 > threshold {print $1; exit}')
# wait and exit_code capture must exist
echo "$live_block" | grep -q 'wait \$CLAUDE_CHILD_PID'
echo "$live_block" | grep -q 'exit_code=\$?'
[[ -n "$pipefail_off_line" ]]
[[ -n "$re_enable_line" ]]
[[ $re_enable_line -gt $pipefail_off_line ]]
# CLAUDE_CHILD_PID must be cleared after wait
echo "$live_block" | grep -q 'CLAUDE_CHILD_PID=""'
}
`@test` "live mode pipeline captures exit code via wait" {
# After the backgrounded subshell, exit code is captured via wait
local script="${BATS_TEST_DIRNAME}/../../ralph_loop.sh"
local live_block
live_block=$(sed -n '/Live output mode enabled/,/End of Output/p' "$script")
# wait and exit_code capture must be a single compound command (|| prevents set -e
# from killing the script when Claude times out with exit code 124 — Issue `#175`)
echo "$live_block" | grep -q 'wait \$CLAUDE_CHILD_PID || exit_code=\$?'
# CLAUDE_CHILD_PID must be cleared after wait
echo "$live_block" | grep -q 'CLAUDE_CHILD_PID=""'
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_cli_modern.bats` around lines 856 - 869, The test "live mode
pipeline captures exit code via wait" currently checks for separate lines
matching wait \$CLAUDE_CHILD_PID and exit_code=\$? which allows the buggy
two-line form; change the assertions to require the compound form using the OR
operator so the wait and exit capture appear together (e.g. assert that the
live_block contains a pattern where wait \$CLAUDE_CHILD_PID and \|\| and
exit_code=\$? occur in the same command); keep the existing check that
CLAUDE_CHILD_PID is cleared (CLAUDE_CHILD_PID="") and update the grep/regex used
to match the live_block accordingly so only the safe "wait ... || exit_code=$?"
form passes.

…rl+C shutdown

- Prevent Azure Foundry credentials from leaking across concurrent tmux
  sessions by unsetting ANTHROPIC_FOUNDRY_* vars at session level when
  CLAUDE_CODE_USE_FOUNDRY != 1, and unsetting them from process env in
  the personal account code path
- Add CLAUDE_MODEL support in .ralphrc, passed as --model to Claude CLI
- Fix Ctrl+C in --live mode: track child PIDs and kill the entire process
  tree in cleanup() to prevent orphaned Claude processes
- Add forward_tmux_env() for configurable env var forwarding into tmux
- Add wait_for_circuit_recovery() so OPEN circuit waits for cooldown
  instead of halting the loop permanently
- Write call count at execution start for accurate monitor display

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@tudoanh tudoanh force-pushed the fix/foundry-isolation-model-override-ctrlc branch from c60b5a9 to a0cb953 Compare February 18, 2026 16:45
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/unit/test_cli_modern.bats (1)

893-902: ⚠️ Potential issue | 🟠 Major

Add unit tests for new functions forward_tmux_env and wait_for_circuit_recovery.

These functions introduced in ralph_loop.sh lack unit tests across the entire test suite. forward_tmux_env (line 206) handles prefix-pattern matching, exact environment variable matching, and conditional Foundry variable unsetting. wait_for_circuit_recovery (line 515) implements cooldown monitoring with state-transition logic. Both are testable with bats. Per coding guidelines: "Add unit tests for all new bash script functions using bats test framework."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_cli_modern.bats` around lines 893 - 902, Add bats unit tests
for the two new bash functions: for forward_tmux_env, write tests that cover
prefix-pattern matching (variables like FOO_BAR matching FOO_), exact name
matching, and the conditional unsetting behavior for Foundry-related vars
(ensure variables are forwarded or removed as per the function logic in
forward_tmux_env); for wait_for_circuit_recovery, add tests that simulate and
assert cooldown monitoring and state-transition logic (e.g., transitions from
open to closed with cooldowns, proper sleep/backoff behavior, and timeout
handling) ensuring both normal and edge cases are covered; implement these as
bats test cases in the existing test suite following its conventions so they run
with the test runner.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@tests/unit/test_cli_modern.bats`:
- Around line 893-902: Add bats unit tests for the two new bash functions: for
forward_tmux_env, write tests that cover prefix-pattern matching (variables like
FOO_BAR matching FOO_), exact name matching, and the conditional unsetting
behavior for Foundry-related vars (ensure variables are forwarded or removed as
per the function logic in forward_tmux_env); for wait_for_circuit_recovery, add
tests that simulate and assert cooldown monitoring and state-transition logic
(e.g., transitions from open to closed with cooldowns, proper sleep/backoff
behavior, and timeout handling) ensuring both normal and edge cases are covered;
implement these as bats test cases in the existing test suite following its
conventions so they run with the test runner.

---

Duplicate comments:
In `@ralph_loop.sh`:
- Around line 514-548: The wait_for_circuit_recovery function can loop forever
if get_circuit_state keeps returning OPEN; add a safety upper bound by computing
a max_wait_seconds (e.g., (CB_COOLDOWN_MINUTES + 10) * 60) and track
total_sleeped or start_time to break the loop and return non-zero after
exceeding the max; keep calling init_circuit_breaker and get_circuit_state as
now, but when timing out log an ERROR and update_status accordingly before
returning 1; also add a full shell-style comment block above
wait_for_circuit_recovery documenting purpose, parameters (loop_count), and
return values (0 on recovered, non-zero on timeout/error); use existing symbols
like get_circuit_state, init_circuit_breaker, CB_COOLDOWN_MINUTES,
CB_STATE_FILE, parse_iso_to_epoch to locate and implement the changes.
- Around line 203-245: In forward_tmux_env, ensure shell globbing (set -f) is
always restored even if a command exits early: either wrap the for/while loop
that runs under set -f in a subshell (so set -f is local to that subshell) or
make the tmux set-environment invocations resilient by appending || true to each
tmux set-environment call (both in the prefix loop and the exact-match branch)
so an error won’t cause premature exit before the set +f; reference set -f, set
+f, tmux set-environment, and the forward_tmux_env function when making the
change.
- Around line 1464-1466: When waiting for the background Claude process, avoid
allowing set -e to abort the script on nonzero wait status: replace the
unguarded "wait $claude_pid" with a guarded form like "wait $claude_pid || true"
so the script can then reliably capture the exit code into "exit_code=$?" and
still clear "CLAUDE_CHILD_PID=\"\""; update the block that references claude_pid
and CLAUDE_CHILD_PID accordingly.
- Around line 1212-1249: The log currently prints the first 10 characters of
ANTHROPIC_FOUNDRY_API_KEY (via the log_status call), which risks leaking
secrets; change the log in the Foundry branch to avoid revealing key material —
instead log presence only (e.g., "API Key: set") or a masked form showing only
the last 4 characters (e.g., "API Key: ****1234"); update the log_status
invocation that references ANTHROPIC_FOUNDRY_API_KEY so it does not include the
first-10-chars expansion and ensure any masking logic uses only read-only
reporting (no key reconstruction) and is applied wherever
ANTHROPIC_FOUNDRY_API_KEY is logged.
- Around line 1337-1345: The wait on CLAUDE_CHILD_PID must be protected from set
-e so a non‑zero child exit (e.g. timeout 124) doesn't abort before capturing
exit_code; modify the block around wait/exit_code to temporarily disable errexit
(e.g. set +e; wait "$CLAUDE_CHILD_PID"; exit_code=$?; set -e) or use wait
"$CLAUDE_CHILD_PID" || true then capture exit_code=$? immediately after, and
ensure CLAUDE_CHILD_PID is referenced/quoted consistently; keep the subshell
pipeline unchanged so PIPESTATUS inside it still reports per-command failures.

In `@tests/unit/test_cli_modern.bats`:
- Around line 838-854: The test uses '\s' in the grep regex which is
non‑portable; update the grep that checks for the opening subshell to use a
POSIX portable character class, e.g. change the pattern "^\s*(" to
"^[[:space:]]*(" (keep the other greps unchanged), referencing the live_block
variable and the grep calls that currently assert the subshell start and
background pattern.
- Around line 856-869: The test currently checks for 'wait $CLAUDE_CHILD_PID'
and 'exit_code=$?' as separate patterns which lets the unsafe two-line form
pass; update the assertions to require the safe combined form by matching the
literal sequence "wait $CLAUDE_CHILD_PID || exit_code=$?" (or otherwise verify
those tokens appear on the same line) and keep the existing check for clearing
CLAUDE_CHILD_PID (CLAUDE_CHILD_PID="") so the test ensures the wait is safe
under set -e and that the pid is cleared afterward.

@tudoanh tudoanh closed this Feb 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: set -e causes script termination on API limit prompt timeout (read -t 30)

1 participant