fix: stagger vLLM engine startup to avoid EADDRINUSE #1356
penfever wants to merge 1 commit into NovaSky-AI:main
When multiple inference engines start on the same node simultaneously, vLLM's get_open_port() can return the same port to different engines (TOCTOU race). This causes EADDRINUSE failures during engine init. Add a random 1.5-3.0s delay before engine creation to desynchronise the port allocation calls across engines on the same node.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
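The race described above can be reproduced in isolation. The sketch below is a simplified stand-in, not vLLM's actual code: `pick_free_port` is a hypothetical helper that mimics the probe-then-release pattern the description attributes to get_open_port(), and the loopback address is an assumption made for the demo.

```python
import errno
import socket


def pick_free_port() -> int:
    """Ask the OS for a free port, then release it (hypothetical stand-in)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.bind(("127.0.0.1", 0))  # port 0: let the OS choose
        return probe.getsockname()[1]
    # The probe socket is closed here, so nothing reserves the port any more.


def demonstrate_collision() -> int:
    """Return the errno raised when two sockets bind the same probed port."""
    port = pick_free_port()
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Stands in for engine A, which was handed `port` and binds it.
        first.bind(("127.0.0.1", port))
        try:
            # Stands in for engine B, which was handed the same `port`.
            second.bind(("127.0.0.1", port))
        except OSError as exc:
            return exc.errno
        return 0
    finally:
        first.close()
        second.close()


if __name__ == "__main__":
    code = demonstrate_collision()
    print("second bind failed with EADDRINUSE:", code == errno.EADDRINUSE)
```

The second bind stands in for the engine that loses the race: nothing reserves the port between the probe and the real bind, so any caller that was handed the same number fails once the other has bound it.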
Code Review
This pull request addresses a race condition during vLLM engine startup by introducing a random delay. While this is a valid approach to reduce the likelihood of port collisions, it introduces a performance overhead for all startups and doesn't completely eliminate the race condition. I've suggested a more robust solution using a file lock to properly serialize the engine initialization, which would solve the problem deterministically without unnecessary delays.
    # Stagger engine startup to avoid TOCTOU port collisions (EADDRINUSE).
    # vLLM's get_open_port() queries a free port then releases the socket;
    # if multiple engines on the same node call it simultaneously, they can
    # get the same port. A random delay desynchronises the calls.
    import random

    _stagger = random.uniform(1.5, 3.0)
    logger.info(f"Engine startup stagger: sleeping {_stagger:.2f}s to avoid port collisions")
    time.sleep(_stagger)

    engine = vllm.AsyncLLMEngine.from_engine_args(engine_args, stat_loggers=stat_loggers)
Using time.sleep with a random delay is a good first step to mitigate the race condition, but it has a few drawbacks:
- It adds a significant startup delay (1.5-3.0s) even for single-engine scenarios, which is a performance regression.
- It reduces the probability of a port collision but doesn't eliminate it. The race condition can still occur, albeit less frequently.
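To put a rough number on the second drawback, the sketch below uses a toy model that is an assumption, not a measurement: each engine's only timing difference is its random delay, and a collision happens whenever two engines call the port lookup within `window_s` seconds of each other. The `window_s` value is hypothetical; the 1.5s spread is the width of the 1.5-3.0s range in the diff. Under that model, the chance that n independent uniform delays are all at least `window_s` apart is (1 - (n - 1) * window_s / spread_s) ** n.

```python
def collision_probability(n_engines: int, window_s: float, spread_s: float = 1.5) -> float:
    """Chance that at least two of n uniform delays fall within window_s.

    Toy model only: delays are independent and uniform over spread_s seconds,
    and window_s (an assumed value) is how close two calls must be to collide.
    """
    if n_engines < 2:
        return 0.0
    # Fraction of the interval left once (n - 1) exclusion windows are removed.
    free_fraction = 1.0 - (n_engines - 1) * window_s / spread_s
    if free_fraction <= 0.0:
        return 1.0  # the delays cannot all be window_s apart
    return 1.0 - free_fraction ** n_engines


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, round(collision_probability(n, window_s=0.05), 3))
```

Whatever the real window is, the result stays above zero for any n of two or more and grows with the number of engines per node, which is the sense in which the delay reduces the race rather than removing it.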
A more robust and efficient solution would be to use a file-based lock to serialize the engine initialization across different processes on the same node. This ensures that only one engine attempts to allocate a port at a time, completely avoiding the race condition without introducing an unnecessary delay when only one engine is starting.
You can use the filelock library for this, which would mean adding filelock as a new dependency. The import statement (from filelock import FileLock) should also be moved to the top of the file for better code style.
    # Stagger engine startup to avoid TOCTOU port collisions (EADDRINUSE).
    # vLLM's get_open_port() queries a free port then releases the socket;
    # if multiple engines on the same node call it simultaneously, they can
    # get the same port. A file lock serialises port allocation.
    from filelock import FileLock
    # A timeout is added to prevent indefinite waiting.
    lock_path = "/tmp/vllm_engine_init.lock"
    logger.info(f"Attempting to acquire lock {lock_path} for vLLM engine startup to avoid port collisions.")
    with FileLock(lock_path, timeout=60):
        logger.info(f"Acquired lock {lock_path}, proceeding with engine startup.")
        engine = vllm.AsyncLLMEngine.from_engine_args(engine_args, stat_loggers=stat_loggers)
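If the extra dependency is a concern, the same serialisation can be sketched with the standard library alone. This swaps in a different technique from the suggestion above: it uses `fcntl.flock`, so it assumes a POSIX host, and unlike the filelock version it has no timeout. The helper name `node_lock` is hypothetical.

```python
import fcntl
import os
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def node_lock(path: str) -> Iterator[None]:
    """Hold an exclusive advisory lock on `path` for the duration of the block.

    POSIX-only sketch. It blocks until the lock is free and has no timeout,
    so a stuck holder would stall every other engine on the node.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o666)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # waits for any other holder to release
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

It would wrap the engine construction in the same way as the FileLock block, for example `with node_lock("/tmp/vllm_engine_init.lock"):` around the from_engine_args call. Either lock only protects against processes that take it; anything else on the node that grabs ports without the lock can still collide.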
Summary
- vLLM's get_open_port() can return the same port to different engines (TOCTOU race), causing EADDRINUSE failures during engine init
- Add a random 1.5-3.0s delay before AsyncLLMEngine.from_engine_args() to desynchronise port allocation across engines on the same node
Test plan
🤖 Generated with Claude Code