
[worker] feat: Avoid redundant base weight sync when engine doesn't sleep #5147

Merged
HollowMan6 merged 2 commits into verl-project:main from JohnConnor123:bugfix/fsdp_sleep_level_reload on Feb 6, 2026

Conversation

@JohnConnor123 (Contributor) commented on Jan 31, 2026

What does this PR do?

Fixes unnecessary base weight syncing for LoRA with sleep_level=2 when rollout.free_cache_engine=False.

Impact: up to 26% faster end-to-end training in some cases

When free_cache_engine=False, the rollout engine is not actually sleeping in a way that destroys base weights, so syncing base weights is redundant work.

Changes

  • Only collect and send base_model_params when sleep_level==2 and rollout.free_cache_engine=True (see the sketch below).
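
To make the gate concrete, here is a hypothetical helper (not a function added by this PR) that mirrors the condition, using the same names that appear in the reviewed fsdp_workers.py snippet quoted further down:

    from typing import Any, Optional

    def should_sync_base_weights(
        peft_config: Optional[Any],
        sleep_level: Optional[int],
        free_cache_engine: bool,
    ) -> bool:
        """Hypothetical predicate mirroring the gate added in this PR:
        LoRA base weights are gathered and re-sent to the rollout engine only
        when sleep_level == 2 AND the engine actually frees them."""
        return peft_config is not None and sleep_level == 2 and free_cache_engine

    # With free_cache_engine=False the gate is False, so the expensive gather/send
    # of full base tensors is skipped entirely.
    assert should_sync_base_weights(object(), sleep_level=2, free_cache_engine=True)
    assert not should_sync_base_weights(object(), sleep_level=2, free_cache_engine=False)
    assert not should_sync_base_weights(None, sleep_level=2, free_cache_engine=True)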

Perf (Qwen3-1.7B, 20 steps, mean over steps 1-20, tested on 2x RTX 3090)

| metric | baseline (ms/token) | patched (ms/token) | delta (ms/token) | delta (%) |
| --- | --- | --- | --- | --- |
| timing_per_token_ms/gen | 0.845 | 0.559 | -0.286 | -33.9% |
| timing_per_token_ms/update_actor | 0.251 | 0.252 | +0.001 | +0.48% |
| timing_per_token_ms/all | 1.097 | 0.811 | -0.286 | -26.02% |

(attached: per-step timing plot)

Checklist

  • Fix applied
  • Local perf run plot attached
  • Regression test: not added in this PR

wuxibin89 requested a review from HollowMan6 on February 2, 2026 06:06
HollowMan6 changed the title from "[fsdp_workers] Avoid redundant base weight sync when engine doesn't sleep" to "[fsdp_workers] feat: Avoid redundant base weight sync when engine doesn't sleep" on Feb 6, 2026
HollowMan6 requested a review from Copilot and removed the review requests for ISEEKYAN and vermouth1992 on February 6, 2026 23:06
HollowMan6 changed the title from "[fsdp_workers] feat: Avoid redundant base weight sync when engine doesn't sleep" to "[fsdp] feat: Avoid redundant base weight sync when engine doesn't sleep" on Feb 6, 2026
HollowMan6 changed the title from "[fsdp] feat: Avoid redundant base weight sync when engine doesn't sleep" to "[worker] feat: Avoid redundant base weight sync when engine doesn't sleep" on Feb 6, 2026
Copilot AI (Contributor) left a comment

Pull request overview

This PR reduces unnecessary base-weight synchronization work for LoRA setups when the rollout engine is configured not to free/sleep weights (rollout.free_cache_engine=False), improving end-to-end training performance.

Changes:

  • Gate LoRA base-weight sync in Megatron and engine worker paths on rollout.free_cache_engine.
  • In FSDP worker rollout mode, only collect/send base_model_params for LoRA sleep_level==2 when rollout.free_cache_engine=True.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| verl/workers/megatron_workers.py | Avoids redundant LoRA base sync when the rollout engine doesn't free weights. |
| verl/workers/fsdp_workers.py | Skips base model param collection/update for LoRA sleep_level==2 unless the engine actually frees weights. |
| verl/workers/engine_workers.py | Aligns LoRA base sync behavior with free_cache_engine to avoid redundant work. |
Comments suppressed due to low confidence (1)

verl/workers/fsdp_workers.py:752

  • device is referenced inside the per_tensor_base_params generator but is only defined in the else branch above (when peft_config is None or self.base_sync_done is false). If peft_config is not None and self.base_sync_done is true, this path will raise a NameError when sleep_level==2 && free_cache_engine (e.g., on subsequent rollout_mode calls). Define device = get_device_id() unconditionally before these generators (or inline inside the generator) so it is always available when converting base_model_params.
        if (
            peft_config is not None
            and getattr(self.rollout, "sleep_level", None) == 2
            and self.config.rollout.free_cache_engine
        ):
            per_tensor_base_params = (
                (name, param.to(device, non_blocking=True).full_tensor() if isinstance(param, DTensor) else param)
                for name, param in base_model_params.items()
            )
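
A minimal sketch of the fix suggested above (based only on the quoted snippet, not the code as merged; get_device_id and the surrounding names are assumed from the reviewer's text):

        # Bind `device` unconditionally so the lazily-evaluated generator below can
        # always resolve it, even when the else branch that used to define it is skipped.
        device = get_device_id()

        if (
            peft_config is not None
            and getattr(self.rollout, "sleep_level", None) == 2
            and self.config.rollout.free_cache_engine
        ):
            per_tensor_base_params = (
                (name, param.to(device, non_blocking=True).full_tensor() if isinstance(param, DTensor) else param)
                for name, param in base_model_params.items()
            )

Because a generator expression evaluates its body only when consumed, the missing name would surface as a NameError at send time rather than at the point where the generator is built, which matches the "subsequent rollout_mode calls" failure described above.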


@HollowMan6 (Collaborator) left a comment

Thanks for your contribution!

HollowMan6 merged commit 06449b8 into verl-project:main on Feb 6, 2026
90 of 96 checks passed