
[worker] feat: Avoid redundant base weight sync when engine doesn't sleep #5147

Merged
HollowMan6 merged 2 commits into verl-project:main from JohnConnor123:bugfix/fsdp_sleep_level_reload on Feb 6, 2026

Conversation

@JohnConnor123 (Contributor) commented on Jan 31, 2026

What does this PR do?

Fixes unnecessary base weight syncing for LoRA with sleep_level=2 when rollout.free_cache_engine=False.

Impact: up to 26% faster end-to-end training in some cases

When free_cache_engine=False, the rollout engine is not actually sleeping in a way that destroys base weights, so syncing base weights is redundant work.

Changes

  • Only collect and send base_model_params when sleep_level==2 and rollout.free_cache_engine=True (see the sketch below).
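
To make the gate concrete, here is a hypothetical helper (not a function added by this PR) that mirrors the condition, using the same names that appear in the reviewed fsdp_workers.py snippet quoted further down:

    from typing import Any, Optional

    def should_sync_base_weights(
        peft_config: Optional[Any],
        sleep_level: Optional[int],
        free_cache_engine: bool,
    ) -> bool:
        """Hypothetical predicate mirroring the gate added in this PR:
        LoRA base weights are gathered and re-sent to the rollout engine only
        when sleep_level == 2 AND the engine actually frees them."""
        return peft_config is not None and sleep_level == 2 and free_cache_engine

    # With free_cache_engine=False the gate is False, so the expensive gather/send
    # of full base tensors is skipped entirely.
    assert should_sync_base_weights(object(), sleep_level=2, free_cache_engine=True)
    assert not should_sync_base_weights(object(), sleep_level=2, free_cache_engine=False)
    assert not should_sync_base_weights(None, sleep_level=2, free_cache_engine=True)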

Perf (Qwen3-1.7B, 20 steps, mean over steps 1-20, tested on 2x RTX 3090)

| metric | baseline (ms/token) | patched (ms/token) | delta (ms/token) | delta (%) |
| --- | --- | --- | --- | --- |
| timing_per_token_ms/gen | 0.845 | 0.559 | -0.286 | -33.9% |
| timing_per_token_ms/update_actor | 0.251 | 0.252 | +0.001 | +0.48% |
| timing_per_token_ms/all | 1.097 | 0.811 | -0.286 | -26.02% |

(attached: per-step timing plot)

Checklist

  • Fix applied
  • Local perf run plot attached
  • Regression test: not added in this PR

wuxibin89 requested a review from HollowMan6 on February 2, 2026 06:06
HollowMan6 changed the title from "[fsdp_workers] Avoid redundant base weight sync when engine doesn't sleep" to "[fsdp_workers] feat: Avoid redundant base weight sync when engine doesn't sleep" on Feb 6, 2026
HollowMan6 requested a review from Copilot and removed the review requests for ISEEKYAN and vermouth1992 on February 6, 2026 23:06
HollowMan6 changed the title from "[fsdp_workers] feat: Avoid redundant base weight sync when engine doesn't sleep" to "[fsdp] feat: Avoid redundant base weight sync when engine doesn't sleep" on Feb 6, 2026
HollowMan6 changed the title from "[fsdp] feat: Avoid redundant base weight sync when engine doesn't sleep" to "[worker] feat: Avoid redundant base weight sync when engine doesn't sleep" on Feb 6, 2026
Copilot AI (Contributor) left a comment

Pull request overview

This PR reduces unnecessary base-weight synchronization work for LoRA setups when the rollout engine is configured not to free/sleep weights (rollout.free_cache_engine=False), improving end-to-end training performance.

Changes:

  • Gate LoRA base-weight sync in Megatron and engine worker paths on rollout.free_cache_engine.
  • In FSDP worker rollout mode, only collect/send base_model_params for LoRA sleep_level==2 when rollout.free_cache_engine=True.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| verl/workers/megatron_workers.py | Avoids redundant LoRA base sync when the rollout engine doesn't free weights. |
| verl/workers/fsdp_workers.py | Skips base model param collection/update for LoRA sleep_level==2 unless the engine actually frees weights. |
| verl/workers/engine_workers.py | Aligns LoRA base sync behavior with free_cache_engine to avoid redundant work. |
Comments suppressed due to low confidence (1)

verl/workers/fsdp_workers.py:752

  • device is referenced inside the per_tensor_base_params generator but is only defined in the else branch above (when peft_config is None or self.base_sync_done is false). If peft_config is not None and self.base_sync_done is true, this path will raise a NameError when sleep_level==2 && free_cache_engine (e.g., on subsequent rollout_mode calls). Define device = get_device_id() unconditionally before these generators (or inline inside the generator) so it is always available when converting base_model_params.
        if (
            peft_config is not None
            and getattr(self.rollout, "sleep_level", None) == 2
            and self.config.rollout.free_cache_engine
        ):
            per_tensor_base_params = (
                (name, param.to(device, non_blocking=True).full_tensor() if isinstance(param, DTensor) else param)
                for name, param in base_model_params.items()
            )
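
A minimal sketch of the fix suggested above (based only on the quoted snippet, not the code as merged; get_device_id and the surrounding names are assumed from the reviewer's text):

        # Bind `device` unconditionally so the lazily-evaluated generator below can
        # always resolve it, even when the else branch that used to define it is skipped.
        device = get_device_id()

        if (
            peft_config is not None
            and getattr(self.rollout, "sleep_level", None) == 2
            and self.config.rollout.free_cache_engine
        ):
            per_tensor_base_params = (
                (name, param.to(device, non_blocking=True).full_tensor() if isinstance(param, DTensor) else param)
                for name, param in base_model_params.items()
            )

Because a generator expression evaluates its body only when consumed, the missing name would surface as a NameError at send time rather than at the point where the generator is built, which matches the "subsequent rollout_mode calls" failure described above.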


@HollowMan6 (Collaborator) left a comment

Thanks for your contribution!

HollowMan6 merged commit 06449b8 into verl-project:main on Feb 6, 2026
90 of 96 checks passed