[rollout,vllm] Fix DP args and local_rank for Ray NOSET_VISIBLE_DEVICES by JohnConnor123 · Pull Request #5233 · verl-project/verl

JohnConnor123 · 2026-02-07T18:11:12Z

Summary

Fix A: ensure vLLM receives DP CLI args when data_parallel_size > 1 (not only when EP is enabled).
Fix B: when running under Ray with RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES=1 and data_parallel_size > 1,
compute the correct vLLM-local LOCAL_RANK (local rank within TP×PP) to avoid DP adjusted local rank ... out of bounds
and incorrect GPU binding.
This is a refinement of the comments from PR [rollout,vllm] Fix DP args and local_rank for Ray NOSET_VISIBLE_DEVICES #5152

Context / Motivation

In DP configs (typically dp=2,tp=1,pp=1,nnodes=1):

vLLM could start without DP settings if the condition was tied only to EP.
Under Ray NOSET, using ray.get_runtime_context().get_accelerator_ids() to derive local rank can produce a rank that becomes
out-of-bounds after vLLM DP adjustment.

Checklist

Bug fixed
Repro / test plan included
Regression test (real Ray+vLLM e2e)

gemini-code-assist

Code Review

This pull request introduces two important fixes for running vLLM with data parallelism under Ray. The first fix ensures that data parallelism arguments are correctly passed to vLLM, which was previously only done for expert parallelism. The second fix aims to correct the local_rank for workers when using Ray's NOSET_VISIBLE_DEVICES mode to prevent out-of-bounds errors.

My review focuses on the implementation of the local_rank adjustment. While the change to include DP arguments is correct, the logic for local_rank correction in verl/workers/rollout/vllm_rollout/utils.py contains critical bugs related to argument parsing that would prevent it from functioning. I've provided a detailed comment with a corrected implementation.

verl/workers/rollout/vllm_rollout/utils.py

JohnConnor123 added 3 commits January 31, 2026 11:19

fix: pass DP args to vLLM when dp>1

81a8bcc

fix: correct vLLM local_rank under ray NOSET_CUDA_VISIBLE_DEVICES

58e42b4

fix: adjust vLLM local_rank in MP worker init

8a4c4dd

JohnConnor123 requested review from PeterSH6, chenhaiq and wuxibin89 as code owners February 7, 2026 18:11

gemini-code-assist bot reviewed Feb 7, 2026

View reviewed changes

verl/workers/rollout/vllm_rollout/utils.py Show resolved Hide resolved

fix: robustly extract vLLM worker rank/parallel_config

685d20c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[rollout,vllm] Fix DP args and local_rank for Ray NOSET_VISIBLE_DEVICES#5233

[rollout,vllm] Fix DP args and local_rank for Ray NOSET_VISIBLE_DEVICES#5233
JohnConnor123 wants to merge 4 commits intoverl-project:mainfrom
JohnConnor123:bugfix/vllm_dp_rank_and_args

JohnConnor123 commented Feb 7, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

JohnConnor123 commented Feb 7, 2026

Summary

Context / Motivation

Checklist

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant