fix(loop): harden cleanup handling for signal, error, and crash exit paths by timothy-20 · Pull Request #205 · frankbria/ralph-claude-code

timothy-20 · 2026-02-27T10:36:54Z

Closed: Superseded by #208 (refactor/remove-set-e), which removes set -e entirely and includes all cleanup improvements from this PR (trap_exit_code capture, reentrancy guard, _CLAUDE_PID tracking, stale exit signals reset, stream_output_file cleanup).

…paths - Kill orphaned Claude child process on Ctrl+C via global _CLAUDE_PID tracking - Switch from trap SIGINT/SIGTERM to trap EXIT to cover set -e and normal exits - Add reentrancy guard (_CLEANUP_DONE) to prevent double cleanup execution - Reset stale .exit_signals and .response_analysis on startup (SIGKILL/OOM recovery) - Clean up empty stderr files and temporary stream logs to prevent accumulation - Fix test_session_continuity grep range to verify reset_session in expanded cleanup Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai · 2026-02-27T10:37:11Z

Walkthrough

Introduces Claude process lifecycle tracking and reentrancy-safe cleanup in ralph_loop.sh: adds _CLAUDE_PID and _CLEANUP_DONE, replaces SIGINT/SIGTERM traps with a single EXIT trap that passes the exit code, resets prior exit/response state at startup, and removes temporary stream logs; test grep context widened.

Changes

Cohort / File(s)	Summary
Core script — process & cleanup `ralph_loop.sh`	Added globals `_CLAUDE_PID` and `_CLEANUP_DONE`; switch from `trap cleanup SIGINT SIGTERM` to `trap 'cleanup $?' EXIT`; `cleanup()` accepts optional exit code, is reentrancy-guarded, kills/waits `_CLAUDE_PID` if set, conditionally records interrupted status, and no longer exits directly.
Startup/state reset `ralph_loop.sh`	Reset/initialize `EXIT_SIGNALS_FILE` at startup and remove stale `RESPONSE_ANALYSIS_FILE`/state to ensure clean runs between invocations.
Claude invocation & output handling `ralph_loop.sh`	In background mode set/clear `_CLAUDE_PID` around the Claude process; in live/background output paths remove temporary stream log files after extracting results.
Tests `tests/unit/test_session_continuity.bats`	Widened grep context for the "cleanup function includes session reset" assertion from 5 to 20 lines to accommodate expanded cleanup logic.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant Script as Ralph Loop (script)
participant Claude as Claude Process
participant OS as OS/Signals
participant FS as Temp Stream Log (file)
rect rgba(200,200,255,0.5)
Script->>FS: initialize/clear EXIT_SIGNALS_FILE & RESPONSE_ANALYSIS_FILE
end
rect rgba(200,255,200,0.5)
Script->>Claude: spawn (background) -> record _CLAUDE_PID
Claude-->>Script: runs (asynchronous)
Script->>FS: write stream output
end
rect rgba(255,200,200,0.5)
OS-->>Script: EXIT trap triggered (exit code passed)
Script->>Script: cleanup(trap_exit_code) checks _CLEANUP_DONE
alt _CLAUDE_PID set
Script->>Claude: kill/wait _CLAUDE_PID
Script->>Script: clear _CLAUDE_PID
end
Script->>FS: remove temp stream log
Script-->>OS: update status (interrupted if non-zero)
Note right of Script: cleanup returns (no direct exit)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

fix: Recognize STATUS: COMPLETE as progress in circuit breaker #140 — touches RESPONSE_ANALYSIS_FILE creation/clearing; directly related to this PR's startup state reset.
feat(session): add session lifecycle management with auto-reset triggers #66 — modifies ralph_loop.sh trap/cleanup behavior and session reset; strongly related to cleanup/trap changes here.

Poem

🐰
I hop and log with tidy paws,
I track the Claude and mind the claws,
When EXIT rings I guard the gate,
No orphans, temp files abate,
A clean loop — hop! — all processes safe.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly matches the main objective: hardening cleanup handling for signal, error, and crash exit paths, which is the primary purpose of this changeset.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

ralph_loop.sh (1)
1317-1349: ⚠️ Potential issue | 🟠 Major

Clear _CLAUDE_PID before early return in the early failure detection path.

The early-exit handler starting at line 1330 returns without clearing _CLAUDE_PID that was set at line 1319. When the script later exits, cleanup() will attempt to kill this stale PID. If the OS recycles that PID for another process in the meantime, the wrong process gets killed.
🔧 Proposed fix
        if ! kill -0 $claude_pid 2>/dev/null; then
            wait $claude_pid 2>/dev/null
            local early_exit=$?
+           _CLAUDE_PID=""
            local early_output=""
            if [[ -f "$output_file" && -s "$output_file" ]]; then
                early_output=$(tail -5 "$output_file" 2>/dev/null)
            fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ralph_loop.sh` around lines 1317 - 1349, The early-exit branch that detects a
failing background Claude process sets _CLAUDE_PID at the top (local
claude_pid=$!; _CLAUDE_PID=$claude_pid) but returns without clearing it, which
can cause cleanup() to kill an unrelated PID later; to fix, unset or clear
_CLAUDE_PID (e.g., _CLAUDE_PID="") immediately before the early return in the
early failure detection block (the branch that logs "Claude Code process exited
immediately" and returns 1) so cleanup() won't try to kill a stale PID.

🧹 Nitpick comments (1)

ralph_loop.sh (1)
1623-1625: Use RESPONSE_ANALYSIS_FILE instead of a hardcoded file path.

Line 1625 duplicates the response-analysis path literal. Reusing RESPONSE_ANALYSIS_FILE keeps path changes centralized.
♻️ Proposed refactor
-    rm -f "$RALPH_DIR/.response_analysis"
+    rm -f "$RESPONSE_ANALYSIS_FILE"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ralph_loop.sh` around lines 1623 - 1625, Replace the hardcoded
response-analysis path with the existing RESPONSE_ANALYSIS_FILE variable:
instead of removing "$RALPH_DIR/.response_analysis" in the reset block that also
writes to "$EXIT_SIGNALS_FILE", call rm -f "$RESPONSE_ANALYSIS_FILE"; ensure
RESPONSE_ANALYSIS_FILE is defined earlier in the script so all uses share the
same centralized path.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ralph_loop.sh`:
- Around line 1536-1546: The cleanup function currently always treats any exit
with loop_count>0 as an interruption; modify cleanup (the function referenced by
trap cleanup EXIT) to capture the exit code at the start (e.g., rc=$?) and only
perform the "interrupted" path (calling reset_session "manual_interrupt" and
update_status ...) when loop_count>0 AND rc != 0; otherwise exit silently so
previously written graceful statuses (written elsewhere by the script) are not
overwritten. Ensure you reference the same CALL_COUNT_FILE when reading counts
and keep trap cleanup EXIT intact.

---

Outside diff comments:
In `@ralph_loop.sh`:
- Around line 1317-1349: The early-exit branch that detects a failing background
Claude process sets _CLAUDE_PID at the top (local claude_pid=$!;
_CLAUDE_PID=$claude_pid) but returns without clearing it, which can cause
cleanup() to kill an unrelated PID later; to fix, unset or clear _CLAUDE_PID
(e.g., _CLAUDE_PID="") immediately before the early return in the early failure
detection block (the branch that logs "Claude Code process exited immediately"
and returns 1) so cleanup() won't try to kill a stale PID.

---

Nitpick comments:
In `@ralph_loop.sh`:
- Around line 1623-1625: Replace the hardcoded response-analysis path with the
existing RESPONSE_ANALYSIS_FILE variable: instead of removing
"$RALPH_DIR/.response_analysis" in the reset block that also writes to
"$EXIT_SIGNALS_FILE", call rm -f "$RESPONSE_ANALYSIS_FILE"; ensure
RESPONSE_ANALYSIS_FILE is defined earlier in the script so all uses share the
same centralized path.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ff27b4 and 5a73510.

📒 Files selected for processing (2)

ralph_loop.sh
tests/unit/test_session_continuity.bats

ralph_loop.sh

cleanup() now receives the exit code via 'trap cleanup $? EXIT' and only writes 'interrupted/stopped' status for non-zero exits. This prevents normal exits from having their status overwritten. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

🧹 Nitpick comments (1)

ralph_loop.sh (1)
1625-1627: Use RESPONSE_ANALYSIS_FILE constant for consistency.

Line 1627 hardcodes "$RALPH_DIR/.response_analysis" instead of using "$RESPONSE_ANALYSIS_FILE". Reusing the constant avoids drift if path definitions change.
♻️ Suggested consistency patch
-    rm -f "$RALPH_DIR/.response_analysis"
+    rm -f "$RESPONSE_ANALYSIS_FILE"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ralph_loop.sh` around lines 1625 - 1627, The script resets exit signals but
hardcodes the response analysis path; replace the literal
"$RALPH_DIR/.response_analysis" with the existing RESPONSE_ANALYSIS_FILE
constant to keep path usage consistent: update the rm -f invocation that follows
the initialization of EXIT_SIGNALS_FILE so it uses "$RESPONSE_ANALYSIS_FILE"
(ensure RESPONSE_ANALYSIS_FILE is defined earlier and exported as needed).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@ralph_loop.sh`:
- Around line 1625-1627: The script resets exit signals but hardcodes the
response analysis path; replace the literal "$RALPH_DIR/.response_analysis" with
the existing RESPONSE_ANALYSIS_FILE constant to keep path usage consistent:
update the rm -f invocation that follows the initialization of EXIT_SIGNALS_FILE
so it uses "$RESPONSE_ANALYSIS_FILE" (ensure RESPONSE_ANALYSIS_FILE is defined
earlier and exported as needed).

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5a73510 and 0f3dbcb.

📒 Files selected for processing (1)

ralph_loop.sh

coderabbitai bot reviewed Feb 27, 2026

View reviewed changes

ralph_loop.sh Outdated Show resolved Hide resolved

coderabbitai bot reviewed Feb 27, 2026

View reviewed changes

timothy-20 closed this Feb 28, 2026

timothy-20 deleted the fix/loop-cleanup-handling branch February 28, 2026 07:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(loop): harden cleanup handling for signal, error, and crash exit paths#205

fix(loop): harden cleanup handling for signal, error, and crash exit paths#205
timothy-20 wants to merge 2 commits intofrankbria:mainfrom
timothy-20:fix/loop-cleanup-handling

timothy-20 commented Feb 27, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Feb 27, 2026 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

timothy-20 commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

coderabbitai bot commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

timothy-20 commented Feb 27, 2026 •

edited

Loading

coderabbitai bot commented Feb 27, 2026 •

edited

Loading