[rollout] feat: Fix partial load problem, Add vlm support for trtllm rollout #5149
SchumiDing wants to merge 12 commits into verl-project:main
Conversation
#5042 roadmap of trtllm rollout
from tensorrt_llm.logger import logger
...
class WorkerExtension:
Here is the latest WorkerExtension in the TensorRT-LLM repo. Is there any motivation for implementing a new one in the verl repo? I am thinking about how to unify both. Ideally, we would update the one in the TensorRT-LLM codebase, but if we need a minor change on it before the next trtllm version bump, @hchings do you have a suggestion?
Yeah. Ideally, we should still use the worker extension from the tensorrt-llm repo. But to support models that do not allow partial loading, I suppose self.engine.model_engine.model_loader.reload should be able to be used with the param allow_partial_loading=False.
I added this new worker extension to support allow_partial_loading=False, because tensorrt-llm always sets this param to True, but some models do not support partial loading.
I'd prefer that we keep this in the TensorRT-LLM repo instead and make it generic for other RL frameworks to reuse in the future.
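For reference, a minimal sketch of what such an extension could look like. Only the reload entry point (self.engine.model_engine.model_loader.reload) and the allow_partial_loading parameter come from the discussion above; the method name, the weight container, and the surrounding attributes are illustrative, not the exact code in this PR.

```python
# Hedged sketch only: reload() and allow_partial_loading are taken from the
# discussion above; everything else here is illustrative.
class WorkerExtension:
    """verl-side extension that forces a full (non-partial) weight reload."""

    def update_weights(self, named_tensors, allow_partial_loading: bool = False):
        # Collect the incoming trainer weights into a single state dict.
        weights = dict(named_tensors)
        # With allow_partial_loading=False the engine expects the full
        # parameter set rather than an incremental patch.
        self.engine.model_engine.model_loader.reload(
            weights, allow_partial_loading=allow_partial_loading
        )
```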
if self.is_vlm_model:
    from tensorrt_llm.inputs.multimodal import MultimodalServerConfig
    ...
    multimodal_config = MultimodalServerConfig(
Can you add a unittest for this new feature? There is a test_trtllm_async_server.py.
sure, I'm adding one
The test script and the related test workflow have been added.
I didn't find test_trtllm_async_server.py in the verl repo, so I wrote a test script that covers both the LLM rollout and the VLM rollout of the tensorrt-llm rollout worker.
I didn't find the test_trtllm_async_server.py in verl repo
We have a unit-test MR that should be merged shortly; it contains test_trtllm_async_server.py.
Sorry, there may be some errors when using it with the latest tensorrt-llm; I'm fixing it.
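For context, a minimal sketch of the VLM branch shown in the diff excerpt above. Only the is_vlm_model check and the MultimodalServerConfig import come from the PR; the constructor arguments and how the resulting config is consumed are left unspecified because they depend on the tensorrt-llm version.

```python
def build_multimodal_config(is_vlm_model: bool):
    """Return a MultimodalServerConfig for VLM models, else None.

    Hedged sketch: only the import and the is-VLM check come from the diff
    above; the constructor arguments of MultimodalServerConfig depend on the
    installed tensorrt-llm version and are intentionally omitted here.
    """
    if not is_vlm_model:
        return None
    from tensorrt_llm.inputs.multimodal import MultimodalServerConfig

    # Arguments omitted on purpose; fill in per the tensorrt-llm version in use.
    return MultimodalServerConfig()
```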
I'd prefer that we keep this in the TensorRT-LLM repo instead and make it generic for other RL frameworks to reuse in the future.
Yeah, it's sensible to keep the logic in verl/workers/rollout/trtllm_rollout/trtllm_worker_extension.py in the TensorRT-LLM repo instead.
What does this PR do?
Some models do not support partial loading; when this happens, the rollout manager falls back to the original full parameter update.
Add VLM support for the trtllm rollout.
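As a rough illustration of the fallback path described above (the function, parameter, and attribute names below are assumptions for illustration, not the exact verl implementation):

```python
# Hedged sketch of the fallback described above; names are illustrative and
# do not mirror the exact verl code.
def sync_rollout_weights(rollout_worker, actor_module, updated_named_tensors,
                         model_supports_partial_loading: bool):
    if model_supports_partial_loading:
        # Fast path: patch only the parameters that changed this step.
        rollout_worker.update_weights(updated_named_tensors,
                                      allow_partial_loading=True)
    else:
        # Fallback: the model cannot be patched incrementally, so push the
        # full parameter set, as was done before this PR.
        rollout_worker.update_weights(actor_module.state_dict().items(),
                                      allow_partial_loading=False)
```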
Checklist Before Starting
- Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
  - {modules} include fsdp, megatron, veomni, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data, cfg, reward, like [megatron, fsdp, doc]
  - {type} is in feat, fix, refactor, chore, test
  - If the PR introduces a breaking change, add [BREAKING] to the beginning of the title, e.g. [BREAKING][fsdp, megatron] feat: dynamic batching

Test
Succeeded with the llama-3-11b-vision model using the trtllm rollout.
API and Usage Example
# Add code snippet or script demonstrating how to use this

Design & Code Changes
Checklist Before Submitting
Important
Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.
- Run pre-commit install && pre-commit run --all-files --show-diff-on-failure --color=always
- Request a CI run in the ci-request channel in the verl Slack workspace. (If not accessible, please try the Feishu group (飞书群).)
- If this PR changes the recipe submodule, please also update the reference to the submodule commit via git submodule update --remote or cd recipe && git pull origin main.