
Why are there multiple settings for actor_rollout_ref.model.enable_gradient_checkpointing? Is this a deliberate design choice? #4263

Open

khazic wants to merge 64 commits into verl-project:main from khazic:main

Conversation

@khazic (Contributor) commented Nov 24, 2025

What does this PR do?

Add a concise overview of what this PR aims to achieve or accomplish, and reference related GitHub issues and PRs that help with the review.

Checklist Before Starting

  • [✅] Search for similar PRs. Paste at least one query link here: ...
  • [✅] Format the PR title as [{modules}] {type}: {description} (This will be checked by the CI)
    • {modules} include fsdp, megatron, sglang, vllm, rollout, trainer, ci, training_utils, recipe, hardware, deployment, ray, worker, single_controller, misc, perf, model, algo, env, tool, ckpt, doc, data
    • If this PR involves multiple modules, separate them with commas, like [megatron, fsdp, doc]
    • {type} is in feat, fix, refactor, chore, test
    • If this PR breaks any API (CLI arguments, config, function signature, etc.), add [BREAKING] to the beginning of the title.
    • Example: [BREAKING][fsdp, megatron] feat: dynamic batching

Test

For changes that cannot be tested by CI (e.g., algorithm implementation, new model support), validate by experiment(s) and show results like training curve plots, evaluation results, etc.

API and Usage Example

Demonstrate how the API changes if any, and provide usage example(s) if possible.

# Add code snippet or script demonstrating how to use this
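The line above is an unfilled template placeholder. As an illustration only (the entrypoint is verl's standard one, but the model path and values below are assumptions, not taken from this PR), the setting in question is passed as a command-line override:

```bash
#!/usr/bin/env bash
# Sketch: toggling gradient checkpointing for the actor model via an
# override on verl's PPO entrypoint. A real run also needs data, reward,
# and rollout settings; only the last flag is relevant to this PR.
python3 -m verl.trainer.main_ppo \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-32B-Instruct \
    actor_rollout_ref.model.enable_gradient_checkpointing=True
```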

Design & Code Changes

Demonstrate the high-level design if this PR is complex, and list the specific changes.

Checklist Before Submitting

Important

Please check all the following items before requesting a review, otherwise the reviewer might deprioritize this PR for review.

@gemini-code-assist bot left a comment

Code Review

This pull request removes a conflicting, duplicated configuration for actor_rollout_ref.model.enable_gradient_checkpointing in the run_qwen2.5-32b.sh script, which improves the clarity of the configuration. However, as a side effect, this change disables gradient checkpointing for the actor model. I've added a comment highlighting the potential for this to cause out-of-memory errors, as this is a significant change for a 32B-parameter model.
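For context, a minimal sketch of the kind of conflict being removed; the script contents are not quoted in this thread, so the duplicated values below are illustrative:

```bash
#!/usr/bin/env bash
# Hypothetical excerpt from run_qwen2.5-32b.sh before this PR: the same
# key appears twice among the overrides. Depending on how the config
# system resolves duplicates, one value silently shadows the other or the
# launch errors out; either way the effective setting is ambiguous.
python3 -m verl.trainer.main_ppo \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.model.enable_gradient_checkpointing=False
```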

@CLAassistant commented Feb 2, 2026

CLA assistant check
All committers have signed the CLA.

khazic added 21 commits on February 3, 2026 at 14:24
- add FSDP GRPO launcher with vLLM rollout settings
- update Megatron launcher to keep workers running and log to W&B
- increase Megatron NCCL timeout to 1200s
- log validation generations by default in PPO trainer
- remove legacy GRPO DLC script
- add single-node 8xGPU Megatron GRPO script with TP/PP=1
- tune batch sizes and validation defaults for single-node runs
- update existing GRPO launch scripts to match latest paths/settings
- set WANDB_MODE=offline in single-node Megatron script
- avoid proxy failures during W&B logging
- reduce batch sizes and sequence lengths for Megatron single-node
- align FSDP single-node script with safer rollout settings
- keep vLLM utilization low for constrained free memory
- raise vLLM gpu_memory_utilization to 0.30 for KV cache
- lower rollout.n and cap max batched tokens for stability
- apply settings to both Megatron and FSDP single-node scripts
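Taken together, these commits describe a single-node tuning pass. A hedged consolidation of the settings they mention (the config keys are verl's standard ones; the concrete values for rollout.n and max batched tokens are assumptions, since the messages do not state them):

```bash
#!/usr/bin/env bash
# Hypothetical single-node 8xGPU launch reflecting the commit messages:
# offline W&B to avoid proxy failures, vLLM memory fraction raised to
# 0.30 for KV cache, and capped rollout fan-out and batched tokens.
export WANDB_MODE=offline
python3 -m verl.trainer.main_ppo \
    trainer.nnodes=1 \
    trainer.n_gpus_per_node=8 \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.30 \
    actor_rollout_ref.rollout.n=4 \
    actor_rollout_ref.rollout.max_num_batched_tokens=8192
```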