fix: stagger vLLM engine startup to avoid EADDRINUSE #1356

Open
penfever wants to merge 1 commit into NovaSky-AI:main from penfever:penfever/vllm-engine-startup-stagger

penfever commented Mar 20, 2026

Summary

  • When multiple inference engines start on the same node simultaneously, vLLM's get_open_port() can return the same port to different engines (TOCTOU race), causing EADDRINUSE failures during engine init
  • Adds a random 1.5-3.0s delay before AsyncLLMEngine.from_engine_args() to desynchronise port allocation across engines on the same node
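
To make the race concrete, here is a minimal sketch of the check-then-use pattern described above. It is not vLLM's actual `get_open_port()` implementation; `find_free_port`, `try_bind`, and `demo_collision` are hypothetical names used only for illustration. The helper asks the OS for a free port and then releases the socket, so nothing reserves that port number until someone binds it again.

```python
import errno
import socket


def find_free_port() -> int:
    """Check step: ask the OS for a free port, then release the socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 lets the OS pick any free port
        # The socket closes when the with-block exits, so the returned
        # port number is only a hint and is no longer reserved.
        return s.getsockname()[1]


def try_bind(port: int) -> socket.socket:
    """Use step: bind for real. Raises OSError if the port is taken."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.bind(("127.0.0.1", port))
    except OSError:
        s.close()
        raise
    return s


def demo_collision() -> bool:
    """Return True if a second bind to a claimed port fails with EADDRINUSE."""
    port = find_free_port()   # two engines could both be handed this number
    first = try_bind(port)    # engine A binds first
    try:
        second = try_bind(port)  # engine B tries the same port
    except OSError as exc:
        return exc.errno == errno.EADDRINUSE
    else:
        second.close()
        return False
    finally:
        first.close()
```

The gap between `find_free_port()` returning and the later bind is the window in which another engine can be given, or can bind, the same port.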

Test plan

  • Verify multi-engine startup on a single node no longer hits EADDRINUSE
  • Verify single-engine startup is unaffected (just a brief delay)

🤖 Generated with Claude Code



When multiple inference engines start on the same node simultaneously,
vLLM's get_open_port() can return the same port to different engines
(TOCTOU race). This causes EADDRINUSE failures during engine init.

Add a random 1.5-3.0s delay before engine creation to desynchronise
the port allocation calls across engines on the same node.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

devin-ai-integration bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 2 additional findings.

Contributor

gemini-code-assist bot left a comment

Code Review

This pull request addresses a race condition during vLLM engine startup by introducing a random delay. While this is a valid approach to reduce the likelihood of port collisions, it introduces a performance overhead for all startups and doesn't completely eliminate the race condition. I've suggested a more robust solution using a file lock to properly serialize the engine initialization, which would solve the problem deterministically without unnecessary delays.

Comment on lines +364 to 374
# Stagger engine startup to avoid TOCTOU port collisions (EADDRINUSE).
# vLLM's get_open_port() queries a free port then releases the socket;
# if multiple engines on the same node call it simultaneously, they can
# get the same port. A random delay desynchronises the calls.
import random

_stagger = random.uniform(1.5, 3.0)
logger.info(f"Engine startup stagger: sleeping {_stagger:.2f}s to avoid port collisions")
time.sleep(_stagger)

engine = vllm.AsyncLLMEngine.from_engine_args(engine_args, stat_loggers=stat_loggers)

high

Using time.sleep with a random delay is a good first step to mitigate the race condition, but it has a few drawbacks:

  1. It adds a significant startup delay (1.5-3.0s) even for single-engine scenarios, which is a performance regression.
  2. It reduces the probability of a port collision but doesn't eliminate it. The race condition can still occur, albeit less frequently.
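
As a rough illustration of point 2, the sketch below estimates how often two engines would still overlap under the random stagger. It rests on assumptions that are not in the PR: all engines would otherwise start at the same instant, each engine's check-then-bind step occupies a hypothetical window of `window_s` seconds, and an overlap of two windows is treated as a necessary (not sufficient) condition for a collision. The function name and the window value are illustrative; nothing here was measured.

```python
def overlap_probability(n_engines: int, window_s: float,
                        low_s: float = 1.5, high_s: float = 3.0) -> float:
    """Probability that at least two of n_engines independent
    uniform(low_s, high_s) delays land within window_s of each other.

    Uses the spacing result for i.i.d. uniforms on an interval of length L:
    P(all pairwise gaps > w) = (1 - (n - 1) * w / L) ** n  when (n - 1) * w < L.
    """
    if n_engines < 2:
        return 0.0  # a single engine has nothing to collide with
    span = high_s - low_s
    blocked = (n_engines - 1) * window_s
    if blocked >= span:
        return 1.0  # the interval cannot hold n delays spaced window_s apart
    return 1.0 - (1.0 - blocked / span) ** n_engines
```

Under these assumptions the overlap probability grows with the number of engines per node and is never zero for two or more engines, which is consistent with the point that the delay reduces collisions without eliminating them.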

A more robust and efficient solution would be to use a file-based lock to serialize the engine initialization across different processes on the same node. This ensures that only one engine attempts to allocate a port at a time, completely avoiding the race condition without introducing an unnecessary delay when only one engine is starting.

You can use the filelock library for this, which would need to be added as a new dependency. For better code style, the line from filelock import FileLock should also be moved to the top of the file.

Suggested change
- # Stagger engine startup to avoid TOCTOU port collisions (EADDRINUSE).
- # vLLM's get_open_port() queries a free port then releases the socket;
- # if multiple engines on the same node call it simultaneously, they can
- # get the same port. A random delay desynchronises the calls.
- import random
- _stagger = random.uniform(1.5, 3.0)
- logger.info(f"Engine startup stagger: sleeping {_stagger:.2f}s to avoid port collisions")
- time.sleep(_stagger)
- engine = vllm.AsyncLLMEngine.from_engine_args(engine_args, stat_loggers=stat_loggers)
+ # Stagger engine startup to avoid TOCTOU port collisions (EADDRINUSE).
+ # vLLM's get_open_port() queries a free port then releases the socket;
+ # if multiple engines on the same node call it simultaneously, they can
+ # get the same port. A file lock serialises port allocation.
+ from filelock import FileLock
+ # A timeout is added to prevent indefinite waiting.
+ lock_path = "/tmp/vllm_engine_init.lock"
+ logger.info(f"Attempting to acquire lock {lock_path} for vLLM engine startup to avoid port collisions.")
+ with FileLock(lock_path, timeout=60):
+     logger.info(f"Acquired lock {lock_path}, proceeding with engine startup.")
+     engine = vllm.AsyncLLMEngine.from_engine_args(engine_args, stat_loggers=stat_loggers)
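
For comparison, the same serialisation idea can be sketched without a new dependency by using the standard library's `fcntl.flock`. This is a stand-in for the suggestion above, not a drop-in replacement: it is Unix-only, it blocks indefinitely instead of using a timeout, and `node_startup_lock`, `start_serialized`, and the `create_engine` callable are hypothetical names. In the PR's context, `create_engine` would wrap the `from_engine_args` call.

```python
import fcntl
import os
from contextlib import contextmanager
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


@contextmanager
def node_startup_lock(lock_path: str) -> Iterator[None]:
    """Hold an exclusive advisory lock on lock_path while the block runs."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        # Blocks until no other open file description holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        yield
    finally:
        # Closing the descriptor releases the lock.
        os.close(fd)


def start_serialized(create_engine: Callable[[], T], lock_path: str) -> T:
    """Run create_engine() while holding the node-wide startup lock."""
    with node_startup_lock(lock_path):
        # The lock has to stay held until the engine has bound its port,
        # so the whole engine construction runs inside the block.
        return create_engine()
```

With either variant, each engine waits for every engine ahead of it to finish initialising. When a timeout is used, as in the suggestion above, it likely needs to exceed the cumulative init time of the engines queued on the node, because `filelock` raises its `Timeout` exception once the wait expires.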
