Merged
29 changes: 29 additions & 0 deletions .claude/skills/verl-upgrade/SKILL.md
@@ -0,0 +1,29 @@
---
name: verl-upgrade
description: "Use when handling veRL version upgrades in Trinity, including three-way merge strategy, boundary checks, and retained customization decisions"
---

# veRL Upgrade Skill

## Primary Sources

1. `docs/agents/verl_upgrade/verl_upgrade_checklist.md`
2. A version-specific migration plan in `docs/agents/verl_upgrade/` matching `verl_*_migration_plan.md`

## Workflow

1. Read `docs/agents/verl_upgrade/verl_upgrade_checklist.md` first.
2. Confirm current version, target version, upgrade scope, and target files.
3. Generate or select the corresponding version-specific migration plan (`verl_*_migration_plan.md`).
4. During execution, review only the detailed content for the target upgrade version.
5. Run three-way comparison against `trinity/trainer/verl/build/<version>/` snapshots.
6. Preserve Trinity responsibility boundaries.
7. Keep required Trinity customizations and remove redundant upstream copies.
8. Validate config-to-implementation wiring and output-field contracts.
9. Export remote GPU regression checklist after local static checks.

## Hard Constraints

1. Do not overwrite whole files from upstream.
2. Do not reintroduce reward/rollout/validation trainer logic unless Trinity's training responsibilities have changed.
3. Keep checkpoint monitor/synchronizer collaboration where required.
17 changes: 17 additions & 0 deletions .codex/AGENTS.md
@@ -0,0 +1,17 @@
# Codex Repository Guide

## Canonical Documentation Roots

- Agent documentation root: `docs/agents/`
- veRL upgrade knowledge root: `docs/agents/verl_upgrade/`

## Agent Entry Files

- Workspace-level guide: `AGENTS.md`
- Codex-specific guide: `.codex/AGENTS.md`
- Copilot instructions: `.github/instructions/`
- Claude skills: `.claude/skills/`

## Repository Convention

Treat `docs/agents/` as the single source of truth for agent-facing process and navigation documents.
20 changes: 20 additions & 0 deletions .github/instructions/verl-upgrade.instructions.md
@@ -0,0 +1,20 @@
---
applyTo: "trinity/trainer/verl/**/*.py,docs/agents/**/*.md"
description: "Use veRL migration guardrails and docs navigation when editing Trinity veRL upgrade related files"
---

# veRL Upgrade Instructions

When the task is related to veRL upgrade/migration in Trinity:

1. Read `docs/agents/verl_upgrade/verl_upgrade_checklist.md` first.
2. Use current version and target version to generate or select a version-specific plan in `docs/agents/verl_upgrade/` following `verl_*_migration_plan.md` naming.
3. During implementation and review, focus only on the detailed content for the target upgrade version.
4. Preserve Trinity boundaries:
   - Do not restore reward/rollout/validation main loops into the Trinity trainer path by default.
   - Avoid whole-file overwrites from upstream snapshots.
5. Prefer three-way merge reasoning:
   - Trinity current vs old upstream baseline
   - old upstream vs new upstream
   - then current Trinity vs new upstream
6. If a subclass override is identical to upstream parent behavior, prefer removing the override.
10 changes: 5 additions & 5 deletions .github/workflows/docker/docker-compose.yaml
@@ -1,6 +1,6 @@
 services:
   trinity-node-1:
-    image: trinity-rft-unittest:20260310
+    image: trinity-rft-unittest:20260407
     cap_add:
       - SYS_PTRACE
     pull_policy: never
@@ -15,8 +15,8 @@ services:
       - TRINITY_SFT_DATASET_PATH=/mnt/data
       - TRINITY_MODEL_PATH=/mnt/models/Qwen3-0.6B
       - TRINITY_API_MODEL_PATH=/mnt/models/Qwen3-1.7B
-      - TRINITY_VLM_MODEL_PATH=/mnt/models/Qwen2.5-VL-3B
-      - TRINITY_ALTERNATIVE_VLM_MODEL_PATH=/mnt/models/Qwen3-VL-2B-Instruct
+      - TRINITY_VLM_MODEL_PATH=/mnt/models/Qwen3.5-0.8B
+      - TRINITY_ALTERNATIVE_VLM_MODEL_PATH=/mnt/models/Qwen3.5-0.8B
       - VIRTUAL_ENV=/opt/venv
     working_dir: /workspace
     networks:
@@ -34,7 +34,7 @@ services:
               capabilities: [gpu]

   trinity-node-2:
-    image: trinity-rft-unittest:20260310
+    image: trinity-rft-unittest:20260407
     cap_add:
       - SYS_PTRACE
     pull_policy: never
@@ -44,7 +44,7 @@ services:
       - HF_HUB_DISABLE_PROGRESS_BARS=1
       - TRINITY_CHECKPOINT_ROOT_DIR=/mnt/checkpoints
       - TRINITY_TASKSET_PATH=/mnt/data
-      - TRINITY_MODEL_PATH=/mnt/models/Qwen3-1.7B
+      - TRINITY_MODEL_PATH=/mnt/models/Qwen3-0.6B
       - VIRTUAL_ENV=/opt/venv
     working_dir: /workspace
     volumes:
9 changes: 0 additions & 9 deletions .github/workflows/unittest.yaml
@@ -113,15 +113,6 @@ jobs:
             fi
           fi

-      - name: Convert report.json time to ms
-        working-directory: trinity-${{ github.run_id }}
-        if: env.tests_run == 'true' || failure()
-        run: |
-          REPORT=report.json
-          if [ -f "$REPORT" ]; then
-            jq '(.results.summary.start, .results.summary.stop) |= (. * 1000)' "$REPORT" > "$REPORT.tmp" && mv "$REPORT.tmp" "$REPORT"
-          fi
-
       - name: Clean checkpoint dir
         working-directory: trinity-${{ github.run_id }}/.github/workflows/docker
         if: always()
18 changes: 18 additions & 0 deletions AGENTS.md
@@ -0,0 +1,18 @@
# Multi-Agent Entry Guide

This repository supports multiple coding agents.

## Canonical Knowledge Location

- Agent documentation root: `docs/agents/`
- veRL upgrade docs: `docs/agents/verl_upgrade/`

## Agent-Specific Templates

- Copilot instructions: `.github/instructions/verl-upgrade.instructions.md`
- Claude skill: `.claude/skills/verl-upgrade/SKILL.md`
- Codex template: `.codex/AGENTS.md`

## Shared Rule

All agents should follow this order for veRL upgrades: read the checklist first; generate or select a version-specific migration plan for the current-to-target version pair; then, during execution, review only the detailed content for the target version.
9 changes: 9 additions & 0 deletions benchmark/bench.py
@@ -210,6 +210,8 @@ def prepare_configs(args, rank, current_time):
        config["synchronizer"]["sync_offset"] = args.sync_offset
    if args.sync_style:
        config["synchronizer"]["sync_style"] = args.sync_style
    if args.trainer_strategy:
        config["trainer"]["trainer_strategy"] = args.trainer_strategy

    with open(config_path, "w") as f:
        yaml.dump(config, f, allow_unicode=True, sort_keys=False)
@@ -320,5 +322,12 @@ def main(args):
        default=None,
        choices=[sync_style.value for sync_style in SyncStyle],
    )
    parser.add_argument(
        "--trainer_strategy",
        type=str,
        default=None,
        choices=["fsdp", "fsdp2", "megatron"],
        help="Specify the trainer strategy.",
    )
    args = parser.parse_args()
    main(args)
8 changes: 8 additions & 0 deletions docs/README.md
@@ -1,5 +1,11 @@
# Trinity-RFT Documentation

## Documentation Layout

- `docs/sphinx_doc/`: Sphinx source and build scripts for API/user docs.
- `docs/agents/`: Agent-oriented operational docs and migration knowledge.
- `docs/agents/verl_upgrade/`: Canonical veRL upgrade checklist and migration plans.

Please use the following commands to build sphinx doc of Trinity-RFT.

```shell
@@ -20,3 +26,5 @@ cd docs/sphinx_doc
```

The sphinx doc is built in `docs/sphinx_doc/build/html`.

For code-agent workflows (Copilot/Codex/Claude), start from `docs/agents/README.md`.
20 changes: 20 additions & 0 deletions docs/agents/README.md
@@ -0,0 +1,20 @@
# Agent Knowledge Hub

This directory stores agent-oriented documentation for upgrade workflows, runbooks, and operating constraints.

## Structure

- `verl_upgrade/`: veRL upgrade knowledge, including planning, checklist, and future postmortems.

## How To Use

1. Start from `verl_upgrade/verl_upgrade_checklist.md` before a version upgrade.
2. Use current version and target version to generate or select the corresponding plan in `verl_upgrade/` following `verl_*_migration_plan.md` naming.
3. During execution, review only the detailed content for the target upgrade version.
4. Add new version migration records in `verl_upgrade/` using versioned file names.

## Naming Convention

- Checklist: `verl_upgrade_checklist_<version_or_scope>.md`
- Plan: `verl_<from_version>_to_<to_version>_migration_plan.md`
- Postmortem: `verl_upgrade_postmortem_<date>.md`
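
The plan naming pattern above can be checked mechanically. The following is a minimal sketch, assuming version strings never contain the substring `_to_`; `PLAN_RE` and `parse_plan_name` are hypothetical names used for illustration and are not part of the repository.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern mirroring `verl_<from_version>_to_<to_version>_migration_plan.md`.
PLAN_RE = re.compile(
    r"^verl_(?P<from_version>.+?)_to_(?P<to_version>.+)_migration_plan\.md$"
)


def parse_plan_name(filename: str) -> Optional[Tuple[str, str]]:
    """Return (from_version, to_version) if the name follows the plan convention."""
    match = PLAN_RE.match(filename)
    if match is None:
        return None
    return match["from_version"], match["to_version"]
```

A helper like this could be used to select the plan that matches the current and target versions before execution starts.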
128 changes: 128 additions & 0 deletions docs/agents/verl_upgrade/verl_upgrade_checklist.md
@@ -0,0 +1,128 @@
# Pre-Upgrade Checklist for veRL

This checklist is for quick verification before the next veRL upgrade in Trinity.

## 1. Confirm Upgrade Scope

1. Confirm the target veRL version.
2. Confirm the current Trinity baseline version.
3. Confirm the upstream snapshots for comparison have been generated under `trinity/trainer/verl/build/<version>/`.
4. Confirm this upgrade still focuses on the same 7 core migration files:
   - `fsdp_workers.py`
   - `dp_actor.py`
   - `fsdp_checkpoint_manager.py`
   - `megatron_workers.py`
   - `megatron_actor.py`
   - `megatron_checkpoint_manager.py`
   - `verl_trainer.py` (corresponds to upstream `ray_trainer.py`)

## 2. Prepare Three-Way Comparison

1. For each file, compare all three sources together:
   - Current Trinity file
   - `build/<old_version>/...`
   - `build/<new_version>/...`
2. Do not overwrite whole files.
3. Prioritize recording two categories of diffs:
   - What Trinity added on top of the old-version baseline
   - What upstream changed from the old version to the new version
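
The two diff categories above can be produced with the standard library. This is a minimal sketch that works on file contents as strings; `diff_lines` and `three_way_report` are hypothetical helpers, and the labels passed to `difflib.unified_diff` are placeholders, not real paths.

```python
import difflib
from typing import Dict, List


def diff_lines(old: str, new: str, old_name: str, new_name: str) -> List[str]:
    """Unified diff between two text blobs, one list entry per diff line."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile=old_name,
            tofile=new_name,
            lineterm="",
        )
    )


def three_way_report(trinity: str, old_upstream: str, new_upstream: str) -> Dict[str, List[str]]:
    """Collect the two diff categories named in the checklist."""
    return {
        # What Trinity added on top of the old-version baseline.
        "trinity_vs_old": diff_lines(old_upstream, trinity, "build/<old_version>", "trinity"),
        # What upstream changed from the old version to the new version.
        "old_vs_new": diff_lines(old_upstream, new_upstream, "build/<old_version>", "build/<new_version>"),
    }
```

Reading both diffs side by side keeps the Trinity-specific additions visible while upstream changes are merged in, which is the reason whole-file overwrites are ruled out.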

## 3. Verify Repository Responsibility Boundaries

Before the next upgrade, verify these boundaries are still valid:

1. Reward computation is not executed in Trinity `verl_trainer.py`.
2. Rollout is not executed in Trinity veRL trainer main loop.
3. Trainer-side validation is currently not implemented.
4. Trinity does not run upstream `RayPPOTrainer.fit()` directly. It follows the path defined in `trinity/trainer/trainer.py`: `prepare()`, `train_step()`, `save_checkpoint()`, `save_state_dict()`, and `upload_state_dict()`.

If any boundary above changes, re-evaluate all following steps in this checklist.

## 4. Upstream Logic That Should Not Be Accidentally Reintroduced

Unless Trinity training responsibilities change, do not migrate these back by default:

1. Full reward pipeline inside `fit()`.
2. Validation main flow.
3. Reward loop / async rollout manager.
4. `CheckpointEngineManager` orchestration logic that is only used by the upstream trainer main loop.

## 5. Must-Check Configuration Wiring

Before upgrade, verify whether these config items still need end-to-end wiring into implementation:

1. `trust_remote_code`
2. `use_prefix_grouper`
3. `calculate_sum_pi_squared`
4. `sum_pi_squared_checkpointing`
5. Compatibility reads for `lora.rank` and `lora_rank`
6. `rollout_correction`
7. Compatibility structure for `reward.reward_model` and `reward_model`
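
As an illustration of item 5, a compatibility read could look like the sketch below. It assumes the config is available as a plain mapping and that the nested `lora.rank` form takes precedence over the flat `lora_rank` form; both the precedence and the `read_lora_rank` helper are assumptions, not the actual Trinity or veRL implementation.

```python
from collections.abc import Mapping
from typing import Any


def read_lora_rank(model_cfg: Mapping, default: int = 0) -> int:
    """Read the LoRA rank from either `lora.rank` or `lora_rank` (hypothetical helper)."""
    lora: Any = model_cfg.get("lora")
    # Assumed precedence: the nested `lora.rank` form wins when present.
    if isinstance(lora, Mapping) and lora.get("rank") is not None:
        return int(lora["rank"])
    # Fall back to the flat `lora_rank` key.
    flat = model_cfg.get("lora_rank")
    return int(flat) if flat is not None else default
```

Centralizing the read in one place makes it easier to verify that every call site sees the same value after an upgrade.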

## 6. File-Level Priority Order

Recommended processing order:

1. `dp_actor.py`
2. `fsdp_workers.py`
3. `megatron_actor.py`
4. `megatron_workers.py`
5. `fsdp_checkpoint_manager.py`
6. `megatron_checkpoint_manager.py`
7. `verl_trainer.py`
8. `verl_config.py`

Reason: the first four files define data fields and config wiring; the remaining files depend on these contracts being stable.

## 7. Convergence Checks Required for Every File

For every migration file, ask:

1. Is this subclass implementation only a copy of parent-class code?
2. If it is fully identical to upstream parent implementation, can we delete the override directly?
3. If this is only a historical workaround, has upstream already absorbed it?
4. If this is a true Trinity-specific responsibility, has the reason to keep it been documented?
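
Questions 1 and 2 can be partly automated. The sketch below is a heuristic that flags methods whose compiled bytecode, constants, and names match the nearest parent definition. `redundant_overrides` is a hypothetical helper; it is deliberately conservative, so overrides that differ only in a docstring or contain nested functions may not be flagged and still need manual review.

```python
import inspect
from typing import List


def _same_body(a, b) -> bool:
    """Compare two functions by bytecode, constants, and referenced names."""
    ca, cb = a.__code__, b.__code__
    return (ca.co_code, ca.co_consts, ca.co_names, ca.co_varnames) == (
        cb.co_code,
        cb.co_consts,
        cb.co_names,
        cb.co_varnames,
    )


def redundant_overrides(cls: type) -> List[str]:
    """Names of methods defined on `cls` that match the nearest parent definition."""
    names = []
    for name, attr in vars(cls).items():
        if not inspect.isfunction(attr):
            continue
        for base in cls.__mro__[1:]:
            parent = vars(base).get(name)
            if parent is None:
                continue
            if inspect.isfunction(parent) and _same_body(attr, parent):
                names.append(name)
            break  # only compare against the nearest definition in the MRO
    return names
```

Any name this returns is a candidate for deleting the override; names it does not return still need the manual review described above.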

## 8. Trinity Customizations Confirmed as Non-Removable

1. Algorithm integration and loss composition logic in `dp_actor.py` and `megatron_actor.py`.
2. `CheckpointMonitor` / `Synchronizer` collaboration logic in `fsdp_checkpoint_manager.py` and `megatron_checkpoint_manager.py`.
3. `CheckpointMonitor`, Trinity custom `train_step()`, and state sync path in `verl_trainer.py`.
4. Trinity's independent experience pipeline and trainer scheduling relationship.

## 9. Known Migration Sensitive Points

1. `use_prefix_grouper` is an end-to-end chain from config to monkey patch to actor/ref worker.
2. `sum_pi_squared` must be passed from actor output all the way to the advantage consumer.
3. Megatron LoRA reference logprob follows the actor/no-adapter path, not the regular ref worker path.
4. To collect MFU in multimodal training, `images_seqlens` must be added to `batch.meta_info` in trainer.
5. The checkpoint managers must not be replaced by a whole-file upstream overwrite; otherwise Trinity's async threads and monitoring logic are lost.
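
Point 2 is an instance of a producer/consumer field contract. A minimal sketch of a static check follows, assuming the produced and consumed field names have been collected by hand into dictionaries; `check_contracts` and the component names in the usage note are hypothetical.

```python
from typing import Dict, Set


def check_contracts(
    produced: Dict[str, Set[str]], consumed: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """For each consumer, return the required fields that no producer emits."""
    available: Set[str] = set().union(*produced.values())
    missing = {}
    for consumer, required in consumed.items():
        gap = required - available
        if gap:
            missing[consumer] = gap
    return missing
```

For example, `check_contracts({"actor": {"sum_pi_squared"}}, {"advantage": {"sum_pi_squared"}})` returns an empty dictionary, while dropping the field from the actor side reports it as missing for the `advantage` consumer.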

## 10. Local Checks After Upgrade

1. Run Problems check for all migrated files.
2. Run `python -m py_compile` uniformly for all migrated files.
3. Verify new config items are wired consistently across the dataclass, defaults, and loading path.
4. Verify actor output fields match worker/trainer consumer fields.
5. Verify function signatures for checkpoint save and restore are consistent.
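
Step 2 can be scripted so that every migrated file is compiled in one pass. This is a minimal sketch using the standard `py_compile` module; `compile_all` is a hypothetical helper, the directory holding the files is passed in rather than assumed, and note that `py_compile.compile` writes bytecode into `__pycache__` as a side effect.

```python
import py_compile
from pathlib import Path
from typing import Dict, List


def compile_all(root: Path, names: List[str]) -> Dict[str, str]:
    """Return {file name: reason} for files that are missing or fail to compile."""
    failures = {}
    for name in names:
        path = root / name
        if not path.is_file():
            failures[name] = "missing"
            continue
        try:
            # doraise=True turns syntax errors into PyCompileError instead of printing them.
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            failures[name] = exc.msg
    return failures
```

An empty result means every listed file exists and is syntactically valid; it does not replace the wiring and signature checks in the remaining steps.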

## 11. Minimal Remote GPU Regression

After local checks pass, run at least:

1. FSDP single-step training.
2. Megatron single-step training.
3. Recompute path for old logprob / ref logprob.
4. Megatron reference logprob under LoRA.
5. Checkpoint save and restore.
6. Minimal regression for `use_prefix_grouper`.
7. Minimal regression for `calculate_sum_pi_squared`.

## 12. Final Confirmation

Before submitting the upgrade, reconfirm:

1. Reward, rollout, and validation logic were not accidentally moved back into Trinity trainer.
2. Duplicate subclass implementations that are already identical to upstream were not kept.
3. Features previously trimmed by Trinity were not restored only for version alignment.
4. Documentation has been updated with newly added repository constraints and reasons for retained customizations.