fix(omni): use --resume for respawn with JSONL-missing fallback#2490
fix(omni): use --resume for respawn with JSONL-missing fallback#2490rodriguess-caio wants to merge 6 commits into
Conversation
Claude Code fails when asked to --resume a session whose JSONL file is missing (e.g. after a cleanup or on a fresh machine). By rewriting the flag to --session-id we keep the same identifier but force a fresh session, which always succeeds.
…cripts" This reverts commit 771e1c3.
buildOmniSpawnParams now always emits sessionId (never resume), so buildLaunchCommand produces --session-id <id>. Unlike --resume, this flag attaches to an existing JSONL transcript when present but gracefully starts a fresh session with the same id when the transcript is missing (e.g. after cleanup or on a fresh machine) — preventing hard failures on respawn. Fix is applied at the source (where the command is built), not in the transport layer (tmux-launch-script), which is now provider-agnostic. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When respawning a per-chat agent with a prior Claude session id, emit --resume (not --session-id) so Claude reattaches to the existing JSONL. Add a 3s liveness check after launch: if --resume silently fails (JSONL missing), fall back to a fresh --session-id so the inbound message is not lost. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (8)
📝 WalkthroughWalkthroughThis PR refactors tmux command spawning to generate and source temporary shell scripts instead of injecting commands directly via ChangesScript-based tmux spawning and executor integration
🎯 3 (Moderate) | ⏱️ ~20 minutes ✨ Finishing Touches🧪 Generate unit tests (beta)
⚔️ Resolve merge conflicts
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Closing — superseded by #2489. |
There was a problem hiding this comment.
Code Review
This pull request refactors the tmux launch script logic by extracting writeTmuxLaunchScript into a shared module (src/lib/tmux-launch-script.ts) to prevent command corruption during complex tmux spawns. It also introduces unit and smoke tests, updates ClaudeCodeOmniExecutor to use this script-based path with a fallback mechanism for failed session resumes, and refines process tracking in isPaneProcessRunning. The review feedback points out two important issues: first, isPaneProcessRunning should also check the panePid itself to support directly-executed processes; second, the exec prefix should be restored in the shared launch script to prevent wrapper shell processes from lingering in the background.
| // Check direct children and grandchildren for the target process name | ||
| const output = exec( | ||
| `pgrep -la -P ${panePid} 2>/dev/null; for cpid in $(pgrep -P ${panePid} 2>/dev/null); do pgrep -la -P "$cpid" 2>/dev/null; done; true`, | ||
| `pgrep -la -P ${panePid} 2>/dev/null; for cpid in $(pgrep -P ${panePid} 2>/dev/null); do pgrep -la -P "$cpid" 2>/dev/null; ps -p "$cpid" -o comm= 2>/dev/null; done; true`, |
There was a problem hiding this comment.
The isPaneProcessRunning function currently only checks the direct children and grandchildren of panePid to see if the target process is running. It does not check the panePid process itself.
If the target process is executed directly (e.g., using exec in a launch script or if the pane was created directly with the command), panePid itself will be the target process. In this case, isPaneProcessRunning will return false because it only queries descendants.
To make this check robust for both shell-wrapped and directly-executed processes, we should also check the command name of panePid itself using ps -p ${panePid} -o comm=.
| `pgrep -la -P ${panePid} 2>/dev/null; for cpid in $(pgrep -P ${panePid} 2>/dev/null); do pgrep -la -P "$cpid" 2>/dev/null; ps -p "$cpid" -o comm= 2>/dev/null; done; true`, | |
| `ps -p ${panePid} -o comm= 2>/dev/null; pgrep -la -P ${panePid} 2>/dev/null; for cpid in $(pgrep -P ${panePid} 2>/dev/null); do pgrep -la -P "$cpid" 2>/dev/null; ps -p "$cpid" -o comm= 2>/dev/null; done; true`, |
| mkdirSync(dir, { recursive: true }); | ||
| const safeId = workerId.replace(/[^a-zA-Z0-9._-]/g, '-'); | ||
| const scriptPath = join(dir, `${safeId}-${Date.now().toString(36)}.sh`); | ||
| writeFileSync(scriptPath, `#!/bin/sh\n${fullCommand}\n`, { mode: 0o700 }); |
There was a problem hiding this comment.
The extracted writeTmuxLaunchScript function is missing the exec prefix before ${fullCommand} that was present in the original implementation in src/term-commands/agents.ts.
Without exec, the shell wrapper process (/bin/sh) remains alive in the background waiting for the child process to exit. This wastes system resources (an extra process per spawned agent) and can interfere with signal propagation and PID tracking (since #{pane_pid} might refer to the wrapper shell instead of the actual agent process).
Please restore the exec prefix.
| writeFileSync(scriptPath, `#!/bin/sh\n${fullCommand}\n`, { mode: 0o700 }); | |
| writeFileSync(scriptPath, "#!/bin/sh\nexec ${fullCommand}\n", { mode: 0o700 }); |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: dc6177bfde
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| .join(' '); | ||
| const cmd = envPrefix ? `${envPrefix} ${launch.command}` : launch.command; | ||
| const scriptPath = writeTmuxLaunchScript(`omni-${chatId}`, cmd); | ||
| await executeTmux(`send-keys -t '${paneId}' "source ${scriptPath}" Enter`); |
There was a problem hiding this comment.
Avoid sourcing the launch script with a bash-only builtin
In tmux panes whose default shell is POSIX sh/dash, this sends source ... to the pane, but source is not available there, so omni spawns never start; non-resume spawns return a new session id without any liveness check, and resume fallback repeats the same failing command. ensureTeamWindow creates a plain tmux window without forcing bash, so this depends on the user's tmux/default shell. Execute the script path directly or use a POSIX-compatible invocation with proper quoting.
Useful? React with 👍 / 👎.
Summary
buildOmniSpawnParamsto emitresume(notsessionId) when a prior Claude session id exists, sobuildLaunchCommandgenerates--resume <id>and Claude reattaches to the existing JSONL transcriptsendToPanehelper insidelaunchOmniProcessInPaneto avoid duplicating the env-prefix + script-write + send-keys flow--resumelaunch: if Claude exits immediately (JSONL missing — e.g. after cleanup or on a fresh machine), detects the silent failure viaisPaneProcessRunningand falls back to a fresh--session-idso the inbound message is not lostTest plan
bun test src/services/executors/claude-code.test.ts— 45 tests pass🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores