refactor(loop): remove set -e in favor of explicit error handling by timothy-20 · Pull Request #208 · frankbria/ralph-claude-code

timothy-20 · 2026-02-28T06:01:43Z

Summary

Remove set -e from ralph_loop.sh and replace with explicit error handling
Per BashFAQ/105: "don't use set -e. Add your own error checking instead." — set -e is designed for linear configure && make && install scripts, not complex long-running loops with functions, conditionals, and pipelines

Changes

File	What changed
`ralph_loop.sh`	Remove `set -e` (L6); add `\|\| { echo "FATAL: ..."; exit 1; }` guards to 5 `source` statements; remove `set +e`/`set -e`/`set -o pipefail`/`set +o pipefail` toggle block in live mode pipeline; separate stderr to dedicated file to prevent jq pipeline corruption (Issue #190); improve `cleanup()` with `trap_exit_code` capture and reentrancy guard; add conditional analysis failure handling with stale file cleanup; change 3 `grep -c \|\| true` → `grep -c \|\| echo "0"`; remove `\|\| true` from `reset_session` and integrity check
`tests/unit/test_cli_modern.bats`	Replace 3 errexit structural tests with 1 no-toggle verification; add 7 new tests (no set-e, source guards, cleanup trap, analysis failure, stderr separation)
`CLAUDE.md`	Update test counts (568→573), update test_cli_modern.bats description

Why

set -e in ralph_loop.sh has accumulated significant defensive overhead:

|| true guards: Added to prevent non-zero returns from killing the script (e.g., log_session_transition || true)
set +e/set -e toggle: Required around the live-mode pipeline because portable_timeout returns exit code 124 on timeout, which set -e interprets as a fatal error (Issue set -e + set -o pipefail causes silent script death on Claude timeout #175)
Structural tests: Tests existed solely to verify set -e workarounds were in place
((expr)) risk: Arithmetic expressions return exit code 1 when the value is 0, a subtle set -e trap

Removing set -e eliminates all of this while maintaining safety through:

Source guards: The only place where "fail fast" truly matters — if a library can't be loaded, the script cannot function
Existing error handling: exit_code capture, circuit breaker thresholds, and explicit conditionals already handle all runtime errors
New explicit handling: analyze_response failure now skips signal updates and removes stale files; cleanup() distinguishes normal vs abnormal exits

Intentionally kept `|| true` (3 instances)

These serve functional purposes unrelated to set -e:

kill "$_CLAUDE_PID" 2>/dev/null || true — process may already be terminated
wait "$_CLAUDE_PID" 2>/dev/null || true — same
read -t 30 -n 1 user_choice || true — timeout is expected behavior

Test plan

npm test — 573 tests, all pass except 5 pre-existing upstream failures (circuit breaker cooldown timezone issue — addressed separately in PR fix(tests): use UTC in get_past_timestamp to fix cooldown tests on non-UTC systems #204)
grep -n 'set -e' ralph_loop.sh — no residual set -e
Verified source guards pattern: || { echo "FATAL: ..."; exit 1; }
Branch is based on upstream/main — clean diff with no unrelated changes

🤖 Generated with Claude Code

coderabbitai · 2026-02-28T06:01:58Z

Walkthrough

This PR enhances error handling and logging robustness in ralph_loop.sh by adding explicit error guards for source statements, dedicated stderr capture for Claude CLI invocations, conditional exit signal updates after successful analysis, and fallback defaults for brittle operations. Documentation and test coverage are updated accordingly to reflect the new behavior.

Changes

Cohort / File(s)	Summary
Documentation Updates `CLAUDE.md`	Updates test count from 568 to 573 tests to reflect new test additions in `test_cli_modern.bats`.
Error Handling & Stderr Capture `ralph_loop.sh`	Adds explicit error handling for sourcing library scripts, introduces dedicated stderr capture for Claude CLI runs, stores stderr separately to prevent pipeline contamination, and implements cleanup logic in trap flow with reentrancy guards.
Exit Signal & Analysis Handling `ralph_loop.sh`	Conditionally updates exit signals only after successful response analysis; logs warnings and skips updates on analysis failure. Replaces brittle fallbacks with explicit zero defaults for fix_plan.md item counts, ensuring deterministic totals computation.
Test Enhancements for New Behaviors `tests/unit/test_cli_modern.bats`	Updates live-mode tests to verify stderr redirection pattern (`2>"$stderr_file"`), removes errexit toggle assertions, and adds new tests for: explicit error guards in sourced libs, set -e removal semantics, cleanup behavior on exit, analysis failure handling, and separate stderr file logging without stdout merging.
Command Building & CLI Guards `tests/unit/test_cli_modern.bats`	Adds comprehensive tests for modern CLI behavior including: stdin redirection from /dev/null, prompt-file to -p flag transition, multiline prompt handling, shell-injection safety, CLI command guards, and format override precedence. Validates `build_claude_command` uses -p flag and `validate_claude_command` is invoked before main loop.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

fix(loop): prevent silent script death on Claude timeout #179: Both PRs modify ralph_loop.sh's live-stream pipeline error handling and stderr capture behavior, including errexit guards and logging/timeout management.
fix: redirect stdin from /dev/null for all claude execution paths #170: Both PRs modify Claude CLI invocation to redirect stdin from /dev/null for live/streaming and modern-background execution paths, with corresponding test coverage updates.
feat(exit): detect permission denials and halt loop (Issue #101) #142: Both PRs modify ralph_loop.sh's exit and analysis handling, including conditional exit-signal updates and failure-case logic in the main loop.

Poem

🐰 With whiskers twitching, I trace each care,
Error guards placed with rabbit's flair,
Stderr captured, cleanly fed,
Exit signals properly spread,
Robust loops, now standing fair! 🌟

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'refactor(loop): remove set -e in favor of explicit error handling' directly and clearly summarizes the main objective of the PR—removing set -e and replacing it with explicit error handling.
Docstring Coverage	✅ Passed	Docstring coverage is 84.21% which is sufficient. The required threshold is 80.00%.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ralph_loop.sh`:
- Around line 1603-1613: The cleanup function and EXIT trap currently
unconditionally call update_status("interrupted","stopped") and overwrite
legitimate completion statuses; introduce a boolean flag (e.g., EXITED_BY_SIGNAL
or interrupted_by_signal) initialized before the main loop (next to loop_count
initialization) and change signal handlers so SIGINT/SIGTERM set that flag then
call cleanup, keep the EXIT trap for normal exits, and modify cleanup (the
cleanup function that calls reset_session and update_status) to only call
update_status("interrupted","stopped") when the flag is true (and still perform
reset_session when appropriate); ensure normal loop breaks continue to set
completion status and exit without setting the flag.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ff27b4 and 1350058.

📒 Files selected for processing (15)

CLAUDE.md
lib/circuit_breaker.sh
lib/enable_core.sh
lib/response_analyzer.sh
ralph_loop.sh
setup.sh
templates/ralphrc.template
tests/unit/test_circuit_breaker_recovery.bats
tests/unit/test_cli_modern.bats
tests/unit/test_exit_detection.bats
tests/unit/test_file_protection.bats
tests/unit/test_integrity_check.bats
tests/unit/test_json_parsing.bats
tests/unit/test_rate_limiting.bats
tests/unit/test_session_continuity.bats

💤 Files with no reviewable changes (1)

tests/unit/test_integrity_check.bats

ralph_loop.sh

coderabbitai

🧹 Nitpick comments (3)

ralph_loop.sh (2)

745-745: SC2155: Separate declaration from assignment.

Same pattern as lines 576-577.

♻️ Proposed fix for SC2155

-        local incomplete_tasks=$(grep -cE "^[[:space:]]*- \[ \]" "$RALPH_DIR/fix_plan.md" 2>/dev/null || echo "0")
+        local incomplete_tasks
+        incomplete_tasks=$(grep -cE "^[[:space:]]*- \[ \]" "$RALPH_DIR/fix_plan.md" 2>/dev/null || echo "0")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ralph_loop.sh` at line 745, The SC2155 warning indicates you should not
combine declaration and command-substitution assignment; modify the block that
sets the local variable incomplete_tasks (which counts " - [ ]" in
"$RALPH_DIR/fix_plan.md") by declaring the variable with local incomplete_tasks
first, then on the next line assign it using the grep/... || echo "0" command
substitution; ensure you still redirect errors to /dev/null and preserve the
same pattern used earlier (see the similar handling around lines with the same
pattern).

576-577: SC2155: Separate declaration from assignment to avoid masking return values.

While the || echo "0" fallback handles grep failures gracefully, combining local declaration with command substitution masks the return value. This is flagged by shellcheck.

♻️ Proposed fix for SC2155

-        local uncompleted_items=$(grep -cE "^[[:space:]]*- \[ \]" "$RALPH_DIR/fix_plan.md" 2>/dev/null || echo "0")
-        local completed_items=$(grep -cE "^[[:space:]]*- \[[xX]\]" "$RALPH_DIR/fix_plan.md" 2>/dev/null || echo "0")
+        local uncompleted_items
+        uncompleted_items=$(grep -cE "^[[:space:]]*- \[ \]" "$RALPH_DIR/fix_plan.md" 2>/dev/null || echo "0")
+        local completed_items
+        completed_items=$(grep -cE "^[[:space:]]*- \[[xX]\]" "$RALPH_DIR/fix_plan.md" 2>/dev/null || echo "0")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ralph_loop.sh` around lines 576 - 577, The two variables uncompleted_items
and completed_items are declared and assigned in one statement which masks the
command substitution return value (SC2155); fix by declaring each local first
(local uncompleted_items; local completed_items) and then assign them in
separate statements using the existing command substitutions with the || echo
"0" fallback (e.g., uncompleted_items=$(grep -cE "^[[:space:]]*- \[ \]"
"$RALPH_DIR/fix_plan.md" 2>/dev/null || echo "0") and similarly for
completed_items) so the return status of the commands is preserved and
shellcheck SC2155 is resolved.

tests/unit/test_cli_modern.bats (1)

111-111: Test grep pattern differs from implementation - missing indented checkbox support.

The test uses grep -c "^- \[ \]" but ralph_loop.sh at line 745 uses grep -cE "^[[:space:]]*- \[ \]" to support indented checkboxes. This won't cause test failures with current fixtures but represents drift between test and implementation.

♻️ Proposed fix to align with implementation

         if [[ -f "$RALPH_DIR/fix_plan.md" ]]; then
-            local incomplete_tasks=$(grep -c "^- \[ \]" "$RALPH_DIR/fix_plan.md" 2>/dev/null || echo "0")
+            local incomplete_tasks=$(grep -cE "^[[:space:]]*- \[ \]" "$RALPH_DIR/fix_plan.md" 2>/dev/null || echo "0")
             context+="Remaining tasks: ${incomplete_tasks}. "
         fi

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_cli_modern.bats` at line 111, Update the test's grep pattern
to match the implementation's indented-checkbox handling: replace the current
grep -c "^- \[ \]" in the local incomplete_tasks assignment with a grep -cE
using the same regex as ralph_loop.sh ("^[[:space:]]*- \[ \]") so the test
counts both top-level and indented unchecked boxes exactly like the code under
test.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@ralph_loop.sh`:
- Line 745: The SC2155 warning indicates you should not combine declaration and
command-substitution assignment; modify the block that sets the local variable
incomplete_tasks (which counts " - [ ]" in "$RALPH_DIR/fix_plan.md") by
declaring the variable with local incomplete_tasks first, then on the next line
assign it using the grep/... || echo "0" command substitution; ensure you still
redirect errors to /dev/null and preserve the same pattern used earlier (see the
similar handling around lines with the same pattern).
- Around line 576-577: The two variables uncompleted_items and completed_items
are declared and assigned in one statement which masks the command substitution
return value (SC2155); fix by declaring each local first (local
uncompleted_items; local completed_items) and then assign them in separate
statements using the existing command substitutions with the || echo "0"
fallback (e.g., uncompleted_items=$(grep -cE "^[[:space:]]*- \[ \]"
"$RALPH_DIR/fix_plan.md" 2>/dev/null || echo "0") and similarly for
completed_items) so the return status of the commands is preserved and
shellcheck SC2155 is resolved.

In `@tests/unit/test_cli_modern.bats`:
- Line 111: Update the test's grep pattern to match the implementation's
indented-checkbox handling: replace the current grep -c "^- \[ \]" in the local
incomplete_tasks assignment with a grep -cE using the same regex as
ralph_loop.sh ("^[[:space:]]*- \[ \]") so the test counts both top-level and
indented unchecked boxes exactly like the code under test.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1350058 and 8b5a662.

📒 Files selected for processing (3)

CLAUDE.md
ralph_loop.sh
tests/unit/test_cli_modern.bats

Replace global set -e with targeted error handling patterns: - Add explicit error guards on all library source statements - Replace || true with || echo "0" for consistent grep fallbacks - Remove set +e/set -e toggles in live mode pipeline (no longer needed) - Separate stderr to dedicated file to prevent jq pipeline corruption (frankbria#190) - Improve cleanup() with trap_exit_code capture and reentrancy guard - Add conditional analysis failure handling with stale file cleanup - Remove unnecessary || true from reset_session and integrity check Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai

🧹 Nitpick comments (2)

ralph_loop.sh (1)

1215-1237: LGTM! Stderr separation prevents JSON pipeline corruption.

Redirecting stderr to a separate file (2>"$stderr_file") is the correct solution for Issue #190. Node.js warnings would otherwise corrupt the jq JSON stream. The cleanup logic appropriately removes empty files and logs warnings for non-empty ones.

Optional: To address SC2155, you could separate declaration and assignment:

♻️ Optional refactor for strict shellcheck compliance

-        local stderr_file="${LOG_DIR}/claude_stderr_$(date '+%Y%m%d_%H%M%S').log"
+        local stderr_file
+        stderr_file="${LOG_DIR}/claude_stderr_$(date '+%Y%m%d_%H%M%S').log"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@ralph_loop.sh` around lines 1215 - 1237, The shellcheck SC2155 warning can be
resolved by separating variable declarations from command substitutions: instead
of using "local stderr_file=\"${LOG_DIR}/claude_stderr_$(date
'+%Y%m%d_%H%M%S').log\"" declare the variable first (e.g., local stderr_file)
and then assign stderr_file="$(...)" so the command substitution isn't done in
the same line as the local. Do the same for other local-with-assignment patterns
in this block (e.g., "local -a pipe_status" then pipe_status=(
"${PIPESTATUS[@]}" ), and declare exit_code with local before assigning) while
keeping the same variable names (stderr_file, portable_timeout invocation,
PIPESTATUS capture into pipe_status, exit_code, and
log_status/CLAUDE_TIMEOUT_MINUTES handling) and preserving the existing behavior
and cleanup logic.

tests/unit/test_cli_modern.bats (1)

1180-1186: Consider strengthening the set -e detection pattern.

The pattern '^set -e' only matches set -e at line start. An indented set -e (e.g., inside a function) would not be caught.

♻️ Optional: more robust pattern

-    run bash -c "grep -n '^set -e' '$script'"
+    run bash -c "grep -nE '^[[:space:]]*set -e' '$script'"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/unit/test_cli_modern.bats` around lines 1180 - 1186, The test
"ralph_loop.sh does not use set -e" needs a stronger detection pattern: change
the grep invocation that currently searches for '^set -e' to use an
extended-regex that allows leading whitespace and flexible spacing between 'set'
and '-e' (e.g. match lines with optional indentation then 'set' then spaces then
'-e'), and ensure commented lines are excluded (filter out lines that begin with
optional whitespace followed by '#'); update the test around the script variable
'script' and the run/grep pipeline to use the extended regex and the
comment-filter so indented or spaced occurrences are caught but comments remain
ignored.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@ralph_loop.sh`:
- Around line 1215-1237: The shellcheck SC2155 warning can be resolved by
separating variable declarations from command substitutions: instead of using
"local stderr_file=\"${LOG_DIR}/claude_stderr_$(date '+%Y%m%d_%H%M%S').log\""
declare the variable first (e.g., local stderr_file) and then assign
stderr_file="$(...)" so the command substitution isn't done in the same line as
the local. Do the same for other local-with-assignment patterns in this block
(e.g., "local -a pipe_status" then pipe_status=( "${PIPESTATUS[@]}" ), and
declare exit_code with local before assigning) while keeping the same variable
names (stderr_file, portable_timeout invocation, PIPESTATUS capture into
pipe_status, exit_code, and log_status/CLAUDE_TIMEOUT_MINUTES handling) and
preserving the existing behavior and cleanup logic.

In `@tests/unit/test_cli_modern.bats`:
- Around line 1180-1186: The test "ralph_loop.sh does not use set -e" needs a
stronger detection pattern: change the grep invocation that currently searches
for '^set -e' to use an extended-regex that allows leading whitespace and
flexible spacing between 'set' and '-e' (e.g. match lines with optional
indentation then 'set' then spaces then '-e'), and ensure commented lines are
excluded (filter out lines that begin with optional whitespace followed by '#');
update the test around the script variable 'script' and the run/grep pipeline to
use the extended regex and the comment-filter so indented or spaced occurrences
are caught but comments remain ignored.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8b5a662 and b0fc108.

📒 Files selected for processing (3)

CLAUDE.md
ralph_loop.sh
tests/unit/test_cli_modern.bats

…nkbria#208) Bug 3 (stderr separation: 2>&1 → 2>"$stderr_file") is already implemented in PR frankbria#208 (refactor/remove-set-e). Removing it from this PR eliminates the merge conflict between frankbria#202 and frankbria#208 in the ralph_loop.sh pipeline area and prevents duplicate changes. Changes: - ralph_loop.sh: revert pipeline to 2>&1, remove stderr_file var and stderr logging block - test_cli_modern.bats: revert grep patterns to 2>&1, remove 3 stderr separation tests - CLAUDE.md: remove stderr separation paragraph and test description Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai bot reviewed Feb 28, 2026

View reviewed changes

ralph_loop.sh Outdated Show resolved Hide resolved

coderabbitai bot reviewed Feb 28, 2026

View reviewed changes

This was referenced Feb 28, 2026

fix(loop): harden cleanup handling for signal, error, and crash exit paths #205

Closed

fix(loop): guard bare function calls against set -e script termination #207

Closed

timothy-20 force-pushed the refactor/remove-set-e branch from 8b5a662 to b0fc108 Compare February 28, 2026 08:33

coderabbitai bot reviewed Feb 28, 2026

View reviewed changes

timothy-20 mentioned this pull request Feb 28, 2026

fix(#190): SESSION_CONTINUITY=false bugs + startup version check #202

Open

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(loop): remove set -e in favor of explicit error handling#208

refactor(loop): remove set -e in favor of explicit error handling#208
timothy-20 wants to merge 1 commit intofrankbria:mainfrom
timothy-20:refactor/remove-set-e

timothy-20 commented Feb 28, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Feb 28, 2026 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

timothy-20 commented Feb 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Why

Intentionally kept || true (3 instances)

Test plan

Uh oh!

coderabbitai bot commented Feb 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

timothy-20 commented Feb 28, 2026 •

edited

Loading

Intentionally kept `|| true` (3 instances)

coderabbitai bot commented Feb 28, 2026 •

edited

Loading