fix: prevent gateway double-spawn when process undetected by listProcesses() #293

Closed
amit183239 wants to merge 1 commit into cloudflare:main from amit183239:fix/gateway-double-spawn-289

Conversation

@amit183239 commented Feb 25, 2026

Fixes #289

Summary

  • `findExistingMoltbotProcess()` matches command strings like `start-openclaw.sh` and `openclaw gateway`, but the sandbox process list may report the process as `bash /usr/local/bin/start-openclaw.sh` (full path with a shell prefix), which the existing checks miss.
  • When the process goes undetected, `ensureMoltbotGateway()` spawns a second instance, which immediately fails with `gateway already running (pid N); lock timeout after 5000ms` and `Port 18789 is already in use`.

Changes

  1. Broader command matching: also match `/usr/local/bin/start-openclaw.sh` (full path) so that scripts invoked via bash are detected correctly.

  2. TCP port pre-check before spawning: before starting a new process, probe port 18789 over TCP. If it is already listening, the gateway is running regardless of what `listProcesses()` returned, so skip the spawn entirely. This acts as a safety net against any future process-detection gaps.
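
For illustration, the broader matching in change 1 could be sketched as below. This is not the code from this PR: `isGatewayCommand` and `SHELLS` are hypothetical names, and the sketch swaps plain string matching for a token-based check that drops a leading shell interpreter and compares the script's basename, so that `bash /usr/local/bin/start-openclaw.sh` and `start-openclaw.sh` are treated alike.

```typescript
// Hypothetical helper, not the actual findExistingMoltbotProcess() logic.
// Assumption: the process list exposes each command as a single string.
const SHELLS = new Set<string>(["bash", "sh", "/bin/bash", "/bin/sh"]);

function isGatewayCommand(command: string): boolean {
  // Split on whitespace and discard empty tokens (e.g. for an empty string).
  const tokens = command.trim().split(/\s+/).filter((t) => t.length > 0);

  // Drop a leading shell interpreter such as "bash".
  const first = tokens[0];
  const args = first !== undefined && SHELLS.has(first) ? tokens.slice(1) : tokens;

  const script = args[0];
  if (script === undefined) return false;

  // Compare the basename so short and full-path invocations both match.
  const basename = script.split("/").pop() ?? "";
  if (basename === "start-openclaw.sh") return true;

  // Also accept the direct "openclaw gateway" form mentioned in the summary.
  return basename === "openclaw" && args[1] === "gateway";
}

console.log(isGatewayCommand("bash /usr/local/bin/start-openclaw.sh")); // true
console.log(isGatewayCommand("openclaw gateway"));                      // true
console.log(isGatewayCommand("cat /usr/local/bin/start-openclaw.sh"));  // false
```

Comparing the first real argument rather than searching the whole string avoids matching unrelated commands that merely mention the script path, at the cost of assuming a simple `[shell] script args...` layout.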

…esses()

Fixes cloudflare#289

Two changes to avoid the "gateway already running / port in use" error
that occurs when the container already has a running gateway that
findExistingMoltbotProcess() fails to detect:

1. Broaden command matching to also check for
   `/usr/local/bin/start-openclaw.sh` (full path), so invocations via
   `bash /usr/local/bin/start-openclaw.sh` are recognised.

2. Add a TCP port pre-check in ensureMoltbotGateway() before spawning.
   If port 18789 is already listening, the gateway is up regardless of
   what listProcesses() returned, so we skip the spawn entirely.
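
A minimal sketch of such a port probe, under stated assumptions: a later commit referenced on this page mentions `nc -z`, so the sketch builds that command and treats exit code 0 as "listening". The `RunCommand` type, the `localhost` host, and both function names are hypothetical and stand in for whatever command runner the sandbox actually provides.

```typescript
// Hypothetical command runner: resolves with the exit code of a shell command
// executed inside the container. The real sandbox API is not shown on this page.
type RunCommand = (command: string) => Promise<{ exitCode: number }>;

const GATEWAY_PORT = 18789;

// Build the probe command. `nc -z` only checks whether the port accepts a
// connection and sends no data. The host is an assumption.
function buildPortProbeCommand(port: number, host: string = "localhost"): string {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid TCP port: ${port}`);
  }
  return `nc -z ${host} ${port}`;
}

// Returns true only when the probe ran and exited with code 0.
async function isPortListening(run: RunCommand, port: number): Promise<boolean> {
  try {
    const { exitCode } = await run(buildPortProbeCommand(port));
    return exitCode === 0;
  } catch {
    // The probe itself could not run; report "not listening" so the caller
    // falls back to its normal spawn path.
    return false;
  }
}
```

Treating a failed probe as "not listening" is a design choice in this sketch: it keeps the pre-check a pure safety net that can only prevent a spawn, never block one.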
andreasjansson pushed a commit to andreasjansson/moltworker that referenced this pull request Mar 25, 2026
Fixes cloudflare#289

When the gateway is already running but listProcesses() fails to detect
it (e.g. the command string appears in an unexpected form), a second
spawn attempt causes 'port already in use' errors.

Add a TCP port probe (nc -z) as a safety net before spawning. If port
18789 is already listening, the gateway is definitively running — return
null to signal 'gateway is up' without a process handle. All callers
only need the gateway to be reachable; none use the returned Process
object.

This avoids the bug in cloudflare#293 where the port-check fallback tried to
return an arbitrary running process from listProcesses(), which could
be a completely unrelated process (rclone sync, shell session, etc.),
and fell through to spawn on no match — defeating the safety net.
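
The control flow this commit message describes can be sketched as follows. It is an illustration only: `GatewayProcess`, `GatewayDeps`, and `ensureGateway` are hypothetical names with injected dependencies, not the actual `ensureMoltbotGateway()` implementation.

```typescript
// Hypothetical shapes; the real Process type from the sandbox is not shown here.
interface GatewayProcess {
  id: string;
  command: string;
}

interface GatewayDeps {
  findExistingGateway(): Promise<GatewayProcess | null>;
  isPortListening(port: number): Promise<boolean>;
  spawnGateway(): Promise<GatewayProcess>;
}

const GATEWAY_PORT = 18789;

// Returns a process handle when one is known, or null when the gateway is
// reachable on its port but no handle could be identified.
async function ensureGateway(deps: GatewayDeps): Promise<GatewayProcess | null> {
  // 1. Normal path: the process list shows a gateway.
  const existing = await deps.findExistingGateway();
  if (existing !== null) return existing;

  // 2. Safety net: the port is already bound, so do not spawn. Returning null
  //    avoids handing back some unrelated process from the list.
  if (await deps.isPortListening(GATEWAY_PORT)) return null;

  // 3. Nothing detected and the port is free: start a new gateway.
  return deps.spawnGateway();
}
```

The key difference from the fallback criticized above is step 2: once the port check succeeds, the function returns immediately and never reaches the spawn in step 3, whatever the process list contains.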
andreasjansson pushed a commit that referenced this pull request Mar 25, 2026
Fixes #289

When the gateway is already running but listProcesses() fails to detect
it (e.g. the command string appears in an unexpected form), a second
spawn attempt causes 'port already in use' errors.

Add a TCP port probe (nc -z) as a safety net before spawning. If port
18789 is already listening, the gateway is definitively running — return
null to signal 'gateway is up' without a process handle. All callers
only need the gateway to be reachable; none use the returned Process
object.

This avoids the bug in #293 where the port-check fallback tried to
return an arbitrary running process from listProcesses(), which could
be a completely unrelated process (rclone sync, shell session, etc.),
and fell through to spawn on no match — defeating the safety net.
andreasjansson pushed four more commits with the same commit message that referenced this pull request Mar 27, 2026
@andreasjansson
Member

Superseded by PRs #336 and #337 (both merged Mar 27-28), which implemented the same port probe safety net plus additional fixes: reliable process kill via pgrep/pkill/ss, crash retry logic, and backup/restore reliability improvements. Thank you for the contribution — the port pre-check approach you proposed was exactly right and was incorporated.

Development

Successfully merging this pull request may close these issues.

Gateway double-spawn when process undetected by listProcesses()