fix(loop): harden cleanup handling for signal, error, and crash exit paths #205
timothy-20 wants to merge 2 commits into frankbria:main from
…paths

- Kill orphaned Claude child process on Ctrl+C via global `_CLAUDE_PID` tracking
- Switch from `trap SIGINT/SIGTERM` to `trap EXIT` to cover `set -e` and normal exits
- Add reentrancy guard (`_CLEANUP_DONE`) to prevent double cleanup execution
- Reset stale `.exit_signals` and `.response_analysis` on startup (SIGKILL/OOM recovery)
- Clean up empty stderr files and temporary stream logs to prevent accumulation
- Fix `test_session_continuity` grep range to verify `reset_session` in expanded cleanup

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
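A minimal bash sketch of the pattern the commit message describes: a single `trap ... EXIT` handler, a `_CLEANUP_DONE` reentrancy guard, and an explicit kill of the tracked child. Only `_CLAUDE_PID` and `_CLEANUP_DONE` come from the commit; `LOG_DIR`, the `stderr_*.log` naming, and `demo` are hypothetical. The explicit kill matters because, per POSIX, a shell with job control off starts background jobs with SIGINT ignored, so Ctrl+C alone may leave the child running.

```bash
#!/usr/bin/env bash
# Sketch: EXIT-trap cleanup with a reentrancy guard and child-PID tracking.
# _CLAUDE_PID and _CLEANUP_DONE are named in the commit message; LOG_DIR,
# the stderr_*.log pattern, and demo() are hypothetical.

_CLAUDE_PID=""        # PID of the running child, empty when none is running
_CLEANUP_DONE=false   # reentrancy guard: the cleanup body runs at most once

cleanup() {
    local rc=$?                      # capture the exit status before anything else runs
    [[ "$_CLEANUP_DONE" == true ]] && return "$rc"
    _CLEANUP_DONE=true

    # Background jobs start with SIGINT ignored, so terminate the child explicitly.
    if [[ -n "$_CLAUDE_PID" ]] && kill -0 "$_CLAUDE_PID" 2>/dev/null; then
        kill "$_CLAUDE_PID" 2>/dev/null
        wait "$_CLAUDE_PID" 2>/dev/null
    fi
    _CLAUDE_PID=""

    # Remove empty stderr captures so they do not accumulate across runs.
    if [[ -n "${LOG_DIR:-}" ]]; then
        local f
        for f in "$LOG_DIR"/stderr_*.log; do
            if [[ -f "$f" && ! -s "$f" ]]; then rm -f "$f"; fi
        done
    fi
    return "$rc"
}

# Hypothetical demo: start a long-running child, then fail; the EXIT trap reaps it.
demo() (
    trap cleanup EXIT
    sleep 30 >/dev/null 2>&1 &
    _CLAUDE_PID=$!
    echo "$_CLAUDE_PID"
    exit 3
)
```

Because the handler is bound to EXIT rather than to individual signals, the same code path runs for a normal exit, a `set -e` abort, or an explicit `exit`, and the guard makes a second invocation a no-op.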
Walkthrough: Introduces Claude process lifecycle tracking and reentrancy-safe cleanup in …
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 1
Caution: Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ralph_loop.sh (1)
1317-1349: ⚠️ Potential issue | 🟠 Major

Clear `_CLAUDE_PID` before early return in the early failure detection path.

The early-exit handler starting at line 1330 returns without clearing `_CLAUDE_PID`, which was set at line 1319. When the script later exits, `cleanup()` will attempt to kill this stale PID. If the OS recycles that PID for another process in the meantime, the wrong process gets killed.

🔧 Proposed fix

```diff
 if ! kill -0 $claude_pid 2>/dev/null; then
     wait $claude_pid 2>/dev/null
     local early_exit=$?
+    _CLAUDE_PID=""
     local early_output=""
     if [[ -f "$output_file" && -s "$output_file" ]]; then
         early_output=$(tail -5 "$output_file" 2>/dev/null)
     fi
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ralph_loop.sh` around lines 1317 - 1349, The early-exit branch that detects a failing background Claude process sets _CLAUDE_PID at the top (local claude_pid=$!; _CLAUDE_PID=$claude_pid) but returns without clearing it, which can cause cleanup() to kill an unrelated PID later; to fix, unset or clear _CLAUDE_PID (e.g., _CLAUDE_PID="") immediately before the early return in the early failure detection block (the branch that logs "Claude Code process exited immediately" and returns 1) so cleanup() won't try to kill a stale PID.
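A sketch of the reviewer's fix, assuming the child is launched with `&` and checked once with `kill -0` after a short delay: `_CLAUDE_PID` is cleared on every path that has already reaped the child, not only on the normal one. `start_and_check` and its 0.5 s grace period are hypothetical, not taken from `ralph_loop.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: clear the global PID on every path that reaps the child,
# so a later cleanup() does not signal a PID the kernel may have recycled.

_CLAUDE_PID=""

start_and_check() {
    local output_file=$1; shift
    "$@" >"$output_file" 2>&1 &
    local child_pid=$!
    _CLAUDE_PID=$child_pid

    sleep 0.5   # hypothetical grace period before the early-failure check
    if ! kill -0 "$child_pid" 2>/dev/null; then
        local early_exit=0
        wait "$child_pid" 2>/dev/null || early_exit=$?
        _CLAUDE_PID=""          # early-failure path: forget the PID before returning
        echo "child exited immediately (status $early_exit)" >&2
        return 1
    fi

    wait "$child_pid"
    local rc=$?
    _CLAUDE_PID=""              # normal path: also clear after reaping
    return "$rc"
}
```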
🧹 Nitpick comments (1)
ralph_loop.sh (1)
1623-1625: Use `RESPONSE_ANALYSIS_FILE` instead of a hardcoded file path.

Line 1625 duplicates the response-analysis path literal. Reusing `RESPONSE_ANALYSIS_FILE` keeps path changes centralized.

♻️ Proposed refactor

```diff
- rm -f "$RALPH_DIR/.response_analysis"
+ rm -f "$RESPONSE_ANALYSIS_FILE"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ralph_loop.sh` around lines 1623 - 1625, Replace the hardcoded response-analysis path with the existing RESPONSE_ANALYSIS_FILE variable: instead of removing "$RALPH_DIR/.response_analysis" in the reset block that also writes to "$EXIT_SIGNALS_FILE", call rm -f "$RESPONSE_ANALYSIS_FILE"; ensure RESPONSE_ANALYSIS_FILE is defined earlier in the script so all uses share the same centralized path.
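A sketch of the nitpick's suggestion combined with the startup reset from the first commit, assuming the path constants are defined once near the top of the script. Only the three variable names come from the review; the default `.ralph` directory, `reset_stale_state`, and the `{}` written to the exit-signals file are placeholders.

```bash
#!/usr/bin/env bash
# Sketch: define each state-file path once and reuse the variable everywhere.
# The variable names come from the review; the default values, the function
# name, and the "{}" content written below are assumptions.

RALPH_DIR="${RALPH_DIR:-.ralph}"
EXIT_SIGNALS_FILE="$RALPH_DIR/.exit_signals"
RESPONSE_ANALYSIS_FILE="$RALPH_DIR/.response_analysis"

# Called once at startup: a previous run killed by SIGKILL or the OOM killer
# never reached its EXIT trap, so its state files may still be on disk.
reset_stale_state() {
    mkdir -p "$RALPH_DIR"
    printf '%s\n' '{}' > "$EXIT_SIGNALS_FILE"   # hypothetical "empty" value
    rm -f "$RESPONSE_ANALYSIS_FILE"             # reuse the constant, no path literal
}
```

SIGKILL cannot be trapped, so no exit handler can clean up after it; resetting on the next startup is the only place this state can be repaired.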
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ralph_loop.sh`:
- Around line 1536-1546: The cleanup function currently always treats any exit
with loop_count>0 as an interruption; modify cleanup (the function referenced by
trap cleanup EXIT) to capture the exit code at the start (e.g., rc=$?) and only
perform the "interrupted" path (calling reset_session "manual_interrupt" and
update_status ...) when loop_count>0 AND rc != 0; otherwise exit silently so
previously written graceful statuses (written elsewhere by the script) are not
overwritten. Ensure you reference the same CALL_COUNT_FILE when reading counts
and keep trap cleanup EXIT intact.
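A sketch of the inline suggestion, assuming `CALL_COUNT_FILE` holds a plain integer. `reset_session` and `update_status` exist in `ralph_loop.sh`, but here they are stubbed with hypothetical arguments and a hypothetical `STATUS_LOG` sink so the exit-code logic can run on its own.

```bash
#!/usr/bin/env bash
# Sketch: capture $? first, and only record an interruption when the exit
# was non-zero AND at least one loop ran. The stubs, their arguments,
# STATUS_LOG, and the CALL_COUNT_FILE default are hypothetical.

CALL_COUNT_FILE="${CALL_COUNT_FILE:-.call_count}"
STATUS_LOG="${STATUS_LOG:-/dev/null}"

reset_session() { echo "reset_session $1" >>"$STATUS_LOG"; }
update_status() { echo "update_status $*" >>"$STATUS_LOG"; }

cleanup() {
    local rc=$?   # must run before any other command overwrites $?
    local loop_count=0
    if [[ -f "$CALL_COUNT_FILE" ]]; then
        loop_count=$(<"$CALL_COUNT_FILE")
    fi
    if (( loop_count > 0 && rc != 0 )); then
        reset_session "manual_interrupt"
        update_status "$loop_count" "interrupted" "stopped"
    fi
    # rc == 0: leave whatever status the script already wrote untouched.
}
```

Installed with `trap cleanup EXIT`, a clean `exit 0` after the script's own final status write leaves that status alone, while a non-zero exit after completed loops records the interruption.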
cleanup() now receives the exit code via 'trap cleanup $? EXIT' and only writes 'interrupted/stopped' status for non-zero exits. This prevents normal exits from having their status overwritten.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
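Passing the exit code from the trap into `cleanup()` depends on when `$?` is expanded. This small demo (the function names are hypothetical) shows the difference: inside single quotes, `$?` is expanded when the trap fires; inside double quotes, it is expanded once, when `trap` itself runs.

```bash
#!/usr/bin/env bash
# Demonstrates when $? is expanded in an EXIT trap. Function names are illustrative.

report() { echo "cleanup saw status $1"; }

# Single quotes: $? is expanded when the trap fires, so report gets the real exit code.
deferred() ( trap 'report $?' EXIT; exit 7 )

# Double quotes: $? is expanded when the trap is installed (0, right after `true`).
eager() ( true; trap "report $?" EXIT; exit 7 )
```

Capturing `local rc=$?` as the first line of `cleanup`, as in the review suggestion, gets the same value without an argument. Left unquoted, `trap cleanup $? EXIT` would hand the expanded number to `trap` as an extra signal specification (0 is EXIT, 1 is SIGHUP) rather than as an argument to `cleanup`.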
🧹 Nitpick comments (1)
ralph_loop.sh (1)
1625-1627: Use `RESPONSE_ANALYSIS_FILE` constant for consistency.

Line 1627 hardcodes `"$RALPH_DIR/.response_analysis"` instead of using `"$RESPONSE_ANALYSIS_FILE"`. Reusing the constant avoids drift if path definitions change.

♻️ Suggested consistency patch

```diff
- rm -f "$RALPH_DIR/.response_analysis"
+ rm -f "$RESPONSE_ANALYSIS_FILE"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ralph_loop.sh` around lines 1625 - 1627, The script resets exit signals but hardcodes the response analysis path; replace the literal "$RALPH_DIR/.response_analysis" with the existing RESPONSE_ANALYSIS_FILE constant to keep path usage consistent: update the rm -f invocation that follows the initialization of EXIT_SIGNALS_FILE so it uses "$RESPONSE_ANALYSIS_FILE" (ensure RESPONSE_ANALYSIS_FILE is defined earlier and exported as needed).
Closed: Superseded by #208 (`refactor/remove-set-e`), which removes `set -e` entirely and includes all cleanup improvements from this PR (`trap_exit_code` capture, reentrancy guard, `_CLAUDE_PID` tracking, stale exit signals reset, `stream_output_file` cleanup).