Skip to content

fix(loop): harden cleanup handling for signal, error, and crash exit paths#205

Closed
timothy-20 wants to merge 2 commits intofrankbria:mainfrom
timothy-20:fix/loop-cleanup-handling
Closed

fix(loop): harden cleanup handling for signal, error, and crash exit paths#205
timothy-20 wants to merge 2 commits intofrankbria:mainfrom
timothy-20:fix/loop-cleanup-handling

Conversation

@timothy-20
Copy link

@timothy-20 timothy-20 commented Feb 27, 2026

Closed: Superseded by #208 (refactor/remove-set-e), which removes set -e entirely and includes all cleanup improvements from this PR (trap_exit_code capture, reentrancy guard, _CLAUDE_PID tracking, stale exit signals reset, stream_output_file cleanup).

…paths

- Kill orphaned Claude child process on Ctrl+C via global _CLAUDE_PID tracking
- Switch from trap SIGINT/SIGTERM to trap EXIT to cover set -e and normal exits
- Add reentrancy guard (_CLEANUP_DONE) to prevent double cleanup execution
- Reset stale .exit_signals and .response_analysis on startup (SIGKILL/OOM recovery)
- Clean up empty stderr files and temporary stream logs to prevent accumulation
- Fix test_session_continuity grep range to verify reset_session in expanded cleanup

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Feb 27, 2026

Walkthrough

Introduces Claude process lifecycle tracking and reentrancy-safe cleanup in ralph_loop.sh: adds _CLAUDE_PID and _CLEANUP_DONE, replaces SIGINT/SIGTERM traps with a single EXIT trap that passes the exit code, resets prior exit/response state at startup, and removes temporary stream logs; test grep context widened.

Changes

Cohort / File(s) Summary
Core script — process & cleanup
ralph_loop.sh
Added globals _CLAUDE_PID and _CLEANUP_DONE; switch from trap cleanup SIGINT SIGTERM to trap 'cleanup $?' EXIT; cleanup() accepts optional exit code, is reentrancy-guarded, kills/waits _CLAUDE_PID if set, conditionally records interrupted status, and no longer exits directly.
Startup/state reset
ralph_loop.sh
Reset/initialize EXIT_SIGNALS_FILE at startup and remove stale RESPONSE_ANALYSIS_FILE/state to ensure clean runs between invocations.
Claude invocation & output handling
ralph_loop.sh
In background mode set/clear _CLAUDE_PID around the Claude process; in live/background output paths remove temporary stream log files after extracting results.
Tests
tests/unit/test_session_continuity.bats
Widened grep context for the "cleanup function includes session reset" assertion from 5 to 20 lines to accommodate expanded cleanup logic.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Script as Ralph Loop (script)
participant Claude as Claude Process
participant OS as OS/Signals
participant FS as Temp Stream Log (file)
rect rgba(200,200,255,0.5)
Script->>FS: initialize/clear EXIT_SIGNALS_FILE & RESPONSE_ANALYSIS_FILE
end
rect rgba(200,255,200,0.5)
Script->>Claude: spawn (background) -> record _CLAUDE_PID
Claude-->>Script: runs (asynchronous)
Script->>FS: write stream output
end
rect rgba(255,200,200,0.5)
OS-->>Script: EXIT trap triggered (exit code passed)
Script->>Script: cleanup(trap_exit_code) checks _CLEANUP_DONE
alt _CLAUDE_PID set
Script->>Claude: kill/wait _CLAUDE_PID
Script->>Script: clear _CLAUDE_PID
end
Script->>FS: remove temp stream log
Script-->>OS: update status (interrupted if non-zero)
Note right of Script: cleanup returns (no direct exit)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰
I hop and log with tidy paws,
I track the Claude and mind the claws,
When EXIT rings I guard the gate,
No orphans, temp files abate,
A clean loop — hop! — all processes safe.

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly matches the main objective: hardening cleanup handling for signal, error, and crash exit paths, which is the primary purpose of this changeset.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
ralph_loop.sh (1)

1317-1349: ⚠️ Potential issue | 🟠 Major

Clear _CLAUDE_PID before early return in the early failure detection path.

The early-exit handler starting at line 1330 returns without clearing _CLAUDE_PID that was set at line 1319. When the script later exits, cleanup() will attempt to kill this stale PID. If the OS recycles that PID for another process in the meantime, the wrong process gets killed.

🔧 Proposed fix
        if ! kill -0 $claude_pid 2>/dev/null; then
            wait $claude_pid 2>/dev/null
            local early_exit=$?
+           _CLAUDE_PID=""
            local early_output=""
            if [[ -f "$output_file" && -s "$output_file" ]]; then
                early_output=$(tail -5 "$output_file" 2>/dev/null)
            fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ralph_loop.sh` around lines 1317 - 1349, The early-exit branch that detects a
failing background Claude process sets _CLAUDE_PID at the top (local
claude_pid=$!; _CLAUDE_PID=$claude_pid) but returns without clearing it, which
can cause cleanup() to kill an unrelated PID later; to fix, unset or clear
_CLAUDE_PID (e.g., _CLAUDE_PID="") immediately before the early return in the
early failure detection block (the branch that logs "Claude Code process exited
immediately" and returns 1) so cleanup() won't try to kill a stale PID.
🧹 Nitpick comments (1)
ralph_loop.sh (1)

1623-1625: Use RESPONSE_ANALYSIS_FILE instead of a hardcoded file path.

Line 1625 duplicates the response-analysis path literal. Reusing RESPONSE_ANALYSIS_FILE keeps path changes centralized.

♻️ Proposed refactor
-    rm -f "$RALPH_DIR/.response_analysis"
+    rm -f "$RESPONSE_ANALYSIS_FILE"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ralph_loop.sh` around lines 1623 - 1625, Replace the hardcoded
response-analysis path with the existing RESPONSE_ANALYSIS_FILE variable:
instead of removing "$RALPH_DIR/.response_analysis" in the reset block that also
writes to "$EXIT_SIGNALS_FILE", call rm -f "$RESPONSE_ANALYSIS_FILE"; ensure
RESPONSE_ANALYSIS_FILE is defined earlier in the script so all uses share the
same centralized path.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ralph_loop.sh`:
- Around line 1536-1546: The cleanup function currently always treats any exit
with loop_count>0 as an interruption; modify cleanup (the function referenced by
trap cleanup EXIT) to capture the exit code at the start (e.g., rc=$?) and only
perform the "interrupted" path (calling reset_session "manual_interrupt" and
update_status ...) when loop_count>0 AND rc != 0; otherwise exit silently so
previously written graceful statuses (written elsewhere by the script) are not
overwritten. Ensure you reference the same CALL_COUNT_FILE when reading counts
and keep trap cleanup EXIT intact.

---

Outside diff comments:
In `@ralph_loop.sh`:
- Around line 1317-1349: The early-exit branch that detects a failing background
Claude process sets _CLAUDE_PID at the top (local claude_pid=$!;
_CLAUDE_PID=$claude_pid) but returns without clearing it, which can cause
cleanup() to kill an unrelated PID later; to fix, unset or clear _CLAUDE_PID
(e.g., _CLAUDE_PID="") immediately before the early return in the early failure
detection block (the branch that logs "Claude Code process exited immediately"
and returns 1) so cleanup() won't try to kill a stale PID.

---

Nitpick comments:
In `@ralph_loop.sh`:
- Around line 1623-1625: Replace the hardcoded response-analysis path with the
existing RESPONSE_ANALYSIS_FILE variable: instead of removing
"$RALPH_DIR/.response_analysis" in the reset block that also writes to
"$EXIT_SIGNALS_FILE", call rm -f "$RESPONSE_ANALYSIS_FILE"; ensure
RESPONSE_ANALYSIS_FILE is defined earlier in the script so all uses share the
same centralized path.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ff27b4 and 5a73510.

📒 Files selected for processing (2)
  • ralph_loop.sh
  • tests/unit/test_session_continuity.bats

cleanup() now receives the exit code via 'trap cleanup $? EXIT' and
only writes 'interrupted/stopped' status for non-zero exits. This
prevents normal exits from having their status overwritten.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
ralph_loop.sh (1)

1625-1627: Use RESPONSE_ANALYSIS_FILE constant for consistency.

Line 1627 hardcodes "$RALPH_DIR/.response_analysis" instead of using "$RESPONSE_ANALYSIS_FILE". Reusing the constant avoids drift if path definitions change.

♻️ Suggested consistency patch
-    rm -f "$RALPH_DIR/.response_analysis"
+    rm -f "$RESPONSE_ANALYSIS_FILE"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ralph_loop.sh` around lines 1625 - 1627, The script resets exit signals but
hardcodes the response analysis path; replace the literal
"$RALPH_DIR/.response_analysis" with the existing RESPONSE_ANALYSIS_FILE
constant to keep path usage consistent: update the rm -f invocation that follows
the initialization of EXIT_SIGNALS_FILE so it uses "$RESPONSE_ANALYSIS_FILE"
(ensure RESPONSE_ANALYSIS_FILE is defined earlier and exported as needed).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@ralph_loop.sh`:
- Around line 1625-1627: The script resets exit signals but hardcodes the
response analysis path; replace the literal "$RALPH_DIR/.response_analysis" with
the existing RESPONSE_ANALYSIS_FILE constant to keep path usage consistent:
update the rm -f invocation that follows the initialization of EXIT_SIGNALS_FILE
so it uses "$RESPONSE_ANALYSIS_FILE" (ensure RESPONSE_ANALYSIS_FILE is defined
earlier and exported as needed).

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5a73510 and 0f3dbcb.

📒 Files selected for processing (1)
  • ralph_loop.sh

@timothy-20 timothy-20 closed this Feb 28, 2026
@timothy-20 timothy-20 deleted the fix/loop-cleanup-handling branch February 28, 2026 07:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant