agentscope-ai · pan-x-c · Apr 8, 2026 · Apr 3, 2026 · Apr 3, 2026 · Apr 6, 2026
diff --git a/.claude/skills/verl-upgrade/SKILL.md b/.claude/skills/verl-upgrade/SKILL.md
@@ -0,0 +1,29 @@
+---
+name: verl-upgrade
+description: "Use when handling veRL version upgrades in Trinity, including three-way merge strategy, boundary checks, and retained customization decisions"
+---
+
+# veRL Upgrade Skill
+
+## Primary Sources
+
+1. `docs/agents/verl_upgrade/verl_upgrade_checklist.md`
+2. A version-specific migration plan in `docs/agents/verl_upgrade/` matching `verl_*_migration_plan.md`
+
+## Workflow
+
+1. Read `docs/agents/verl_upgrade/verl_upgrade_checklist.md` first.
+2. Confirm current version, target version, upgrade scope, and target files.
+3. Generate or select the corresponding version-specific migration plan (`verl_*_migration_plan.md`).
+4. During execution, review only detailed content for the target upgrade version.
+5. Run three-way comparison against `trinity/trainer/verl/build/<version>/` snapshots.
+6. Preserve Trinity responsibility boundaries.
+7. Keep required Trinity customizations and remove redundant upstream copies.
+8. Validate config-to-implementation wiring and output-field contracts.
+9. Export remote GPU regression checklist after local static checks.
+
+## Hard Constraints
+
+1. Do not overwrite whole files from upstream.
+2. Do not reintroduce reward/rollout/validation trainer logic unless responsibilities changed.
+3. Keep checkpoint monitor/synchronizer collaboration where required.
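Review note on workflow step 5 of `SKILL.md`: the three-way comparison can be sketched with the standard-library `difflib` module. This is a minimal illustration, not the tooling used in this PR; the function names and dictionary keys are hypothetical, and the layout below `trinity/trainer/verl/build/<version>/` is not assumed, so the file helper takes explicit paths.

```python
import difflib
from pathlib import Path


def unified(a_text: str, b_text: str, a_name: str, b_name: str) -> list:
    """Return the unified-diff lines between two text blobs."""
    return list(
        difflib.unified_diff(
            a_text.splitlines(),
            b_text.splitlines(),
            fromfile=a_name,
            tofile=b_name,
            lineterm="",
        )
    )


def three_way_report(current: str, old_upstream: str, new_upstream: str) -> dict:
    """Build the three pairwise diffs used for merge reasoning."""
    return {
        # What Trinity added on top of the old-version baseline.
        "trinity_additions": unified(old_upstream, current, "old_upstream", "trinity_current"),
        # What upstream changed from the old version to the new version.
        "upstream_changes": unified(old_upstream, new_upstream, "old_upstream", "new_upstream"),
        # What still separates the current Trinity file from new upstream.
        "remaining_gap": unified(new_upstream, current, "new_upstream", "trinity_current"),
    }


def three_way_report_for_files(current: Path, old: Path, new: Path) -> dict:
    """Same report, reading the three sources from caller-supplied paths."""
    return three_way_report(current.read_text(), old.read_text(), new.read_text())
```

Reading `trinity_additions` and `upstream_changes` side by side before looking at `remaining_gap` keeps the merge decision per hunk rather than per file, which is the point of the "no whole-file overwrite" constraint.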
diff --git a/.codex/AGENTS.md b/.codex/AGENTS.md
@@ -0,0 +1,17 @@
+# Codex Repository Guide
+
+## Canonical Documentation Roots
+
+- Agent documentation root: `docs/agents/`
+- veRL upgrade knowledge root: `docs/agents/verl_upgrade/`
+
+## Agent Entry Files
+
+- Workspace-level guide: `AGENTS.md`
+- Codex-specific guide: `.codex/AGENTS.md`
+- Copilot instructions: `.github/instructions/`
+- Claude skills: `.claude/skills/`
+
+## Repository Convention
+
+Treat `docs/agents/` as the single source of truth for agent-facing process and navigation documents.
diff --git a/.github/instructions/verl-upgrade.instructions.md b/.github/instructions/verl-upgrade.instructions.md
@@ -0,0 +1,20 @@
+---
+applyTo: "trinity/trainer/verl/**/*.py,docs/agents/**/*.md"
+description: "Use veRL migration guardrails and docs navigation when editing Trinity veRL upgrade related files"
+---
+
+# veRL Upgrade Instructions
+
+When the task is related to veRL upgrade/migration in Trinity:
+
+1. Read `docs/agents/verl_upgrade/verl_upgrade_checklist.md` first.
+2. Use current version and target version to generate or select a version-specific plan in `docs/agents/verl_upgrade/` following `verl_*_migration_plan.md` naming.
+3. During implementation/review execution, focus only on detailed content for the target upgrade version.
+4. Preserve Trinity boundaries:
+   - Do not restore reward/rollout/validation main loops into Trinity trainer path by default.
+   - Avoid whole-file overwrite from upstream snapshots.
+5. Prefer three-way merge reasoning:
+   - Trinity current vs old upstream baseline
+   - old upstream vs new upstream
+   - then current Trinity vs new upstream
+6. If a subclass override is identical to upstream parent behavior, prefer removing the override.
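Review note on instruction 6: one way to spot an override that merely copies the upstream parent is to compare method ASTs statically, which avoids importing veRL. The sketch below is an assumption-laden illustration (the helper names are hypothetical, and only methods defined directly in the class body are compared); comments and formatting differences are ignored, while docstring or logic differences are not.

```python
import ast


def method_dumps(source: str, class_name: str) -> dict:
    """Map method name -> position-independent AST dump for one class."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return {
                item.name: ast.dump(item)  # line/column attributes are excluded by default
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
            }
    raise ValueError(f"class {class_name!r} not found")


def redundant_overrides(child_src: str, child_cls: str, parent_src: str, parent_cls: str) -> list:
    """Names of child methods whose AST is identical to the parent's method."""
    child = method_dumps(child_src, child_cls)
    parent = method_dumps(parent_src, parent_cls)
    return sorted(name for name, dump in child.items() if parent.get(name) == dump)
```

A reported name is only a candidate for removal: an override that looks identical can still be needed if the surrounding module rebinds a name it references, so each hit deserves a manual look.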
diff --git a/.github/workflows/docker/docker-compose.yaml b/.github/workflows/docker/docker-compose.yaml
@@ -1,6 +1,6 @@
 services:
   trinity-node-1:
-    image: trinity-rft-unittest:20260310
+    image: trinity-rft-unittest:20260407
     cap_add:
       - SYS_PTRACE
     pull_policy: never
@@ -15,8 +15,8 @@ services:
       - TRINITY_SFT_DATASET_PATH=/mnt/data
       - TRINITY_MODEL_PATH=/mnt/models/Qwen3-0.6B
       - TRINITY_API_MODEL_PATH=/mnt/models/Qwen3-1.7B
-      - TRINITY_VLM_MODEL_PATH=/mnt/models/Qwen2.5-VL-3B
-      - TRINITY_ALTERNATIVE_VLM_MODEL_PATH=/mnt/models/Qwen3-VL-2B-Instruct
+      - TRINITY_VLM_MODEL_PATH=/mnt/models/Qwen3.5-0.8B
+      - TRINITY_ALTERNATIVE_VLM_MODEL_PATH=/mnt/models/Qwen3.5-0.8B
       - VIRTUAL_ENV=/opt/venv
     working_dir: /workspace
     networks:
@@ -34,7 +34,7 @@ services:
             capabilities: [gpu]
 
   trinity-node-2:
-    image: trinity-rft-unittest:20260310
+    image: trinity-rft-unittest:20260407
     cap_add:
       - SYS_PTRACE
     pull_policy: never
@@ -44,7 +44,7 @@ services:
       - HF_HUB_DISABLE_PROGRESS_BARS=1
       - TRINITY_CHECKPOINT_ROOT_DIR=/mnt/checkpoints
       - TRINITY_TASKSET_PATH=/mnt/data
-      - TRINITY_MODEL_PATH=/mnt/models/Qwen3-1.7B
+      - TRINITY_MODEL_PATH=/mnt/models/Qwen3-0.6B
       - VIRTUAL_ENV=/opt/venv
     working_dir: /workspace
     volumes:

diff --git a/.github/workflows/unittest.yaml b/.github/workflows/unittest.yaml
@@ -113,15 +113,6 @@ jobs:
           fi
         fi
 
-    - name: Convert report.json time to ms
-      working-directory: trinity-${{ github.run_id }}
-      if: env.tests_run == 'true' || failure()
-      run: |
-        REPORT=report.json
-        if [ -f "$REPORT" ]; then
-          jq '(.results.summary.start, .results.summary.stop) |= (. * 1000)' "$REPORT" > "$REPORT.tmp" && mv "$REPORT.tmp" "$REPORT"
-        fi
-
     - name: Clean checkpoint dir
       working-directory: trinity-${{ github.run_id }}/.github/workflows/docker
       if: always()

diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,18 @@
+# Multi-Agent Entry Guide
+
+This repository supports multiple coding agents.
+
+## Canonical Knowledge Location
+
+- Agent documentation root: `docs/agents/`
+- veRL upgrade docs: `docs/agents/verl_upgrade/`
+
+## Agent-Specific Templates
+
+- Copilot instructions: `.github/instructions/verl-upgrade.instructions.md`
+- Claude skill: `.claude/skills/verl-upgrade/SKILL.md`
+- Codex template: `.codex/AGENTS.md`
+
+## Shared Rule
+
+All agents should follow this order for veRL upgrades: read the checklist first, generate or select the version-specific migration plan for the current-to-target version pair, then review only the target version's detailed content during execution.
diff --git a/benchmark/bench.py b/benchmark/bench.py
@@ -210,6 +210,8 @@ def prepare_configs(args, rank, current_time):
             config["synchronizer"]["sync_offset"] = args.sync_offset
         if args.sync_style:
             config["synchronizer"]["sync_style"] = args.sync_style
+        if args.trainer_strategy:
+            config["trainer"]["trainer_strategy"] = args.trainer_strategy
 
         with open(config_path, "w") as f:
             yaml.dump(config, f, allow_unicode=True, sort_keys=False)
@@ -320,5 +322,12 @@ def main(args):
         default=None,
         choices=[sync_style.value for sync_style in SyncStyle],
     )
+    parser.add_argument(
+        "--trainer_strategy",
+        type=str,
+        default=None,
+        choices=["fsdp", "fsdp2", "megatron"],
+        help="Specify the trainer strategy.",
+    )
     args = parser.parse_args()
     main(args)
diff --git a/docs/README.md b/docs/README.md
@@ -1,5 +1,11 @@
 # Trinity-RFT Documentation
 
+## Documentation Layout
+
+- `docs/sphinx_doc/`: Sphinx source and build scripts for API/user docs.
+- `docs/agents/`: Agent-oriented operational docs and migration knowledge.
+- `docs/agents/verl_upgrade/`: Canonical veRL upgrade checklist and migration plans.
+
 Please use the following commands to build sphinx doc of Trinity-RFT.
 
 ```shell
@@ -20,3 +26,5 @@ cd docs/sphinx_doc
 ```
 
 The sphinx doc is built in `docs/sphinx_doc/build/html`.
+
+For code-agent workflows (Copilot/Codex/Claude), start from `docs/agents/README.md`.
diff --git a/docs/agents/README.md b/docs/agents/README.md
@@ -0,0 +1,20 @@
+# Agent Knowledge Hub
+
+This directory stores agent-oriented documentation for upgrade workflows, runbooks, and operating constraints.
+
+## Structure
+
+- `verl_upgrade/`: veRL upgrade knowledge, including planning, checklist, and future postmortems.
+
+## How To Use
+
+1. Start from `verl_upgrade/verl_upgrade_checklist.md` before a version upgrade.
+2. Use current version and target version to generate or select the corresponding plan in `verl_upgrade/` following `verl_*_migration_plan.md` naming.
+3. During execution, review only the detailed content for the target upgrade version.
+4. Add new version migration records in `verl_upgrade/` using versioned file names.
+
+## Naming Convention
+
+- Checklist: `verl_upgrade_checklist.md` for the shared checklist, or `verl_upgrade_checklist_<version_or_scope>.md` for a version- or scope-specific variant
+- Plan: `verl_<from_version>_to_<to_version>_migration_plan.md`
+- Postmortem: `verl_upgrade_postmortem_<date>.md`
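Review note on the plan naming convention in `docs/agents/README.md`: the "generate or select" step could be scripted roughly as follows. This is a sketch under assumptions: the helper names are hypothetical, and the README does not say how version strings are written, so the dotted versions in the usage comment are placeholders rather than real veRL releases.

```python
import re
from typing import Iterable, Optional, Tuple

# Mirrors `verl_<from_version>_to_<to_version>_migration_plan.md`.
PLAN_RE = re.compile(r"^verl_(?P<src>.+)_to_(?P<dst>.+)_migration_plan\.md$")


def plan_filename(from_version: str, to_version: str) -> str:
    """Build the plan file name for a version pair."""
    return f"verl_{from_version}_to_{to_version}_migration_plan.md"


def parse_plan(filename: str) -> Optional[Tuple[str, str]]:
    """Return (from_version, to_version), or None if the name is not a plan."""
    match = PLAN_RE.match(filename)
    return (match["src"], match["dst"]) if match else None


def select_plan(filenames: Iterable[str], from_version: str, to_version: str) -> Optional[str]:
    """Return the existing plan for this version pair, or None if one must be generated."""
    wanted = plan_filename(from_version, to_version)
    return wanted if wanted in set(filenames) else None


# Example with placeholder versions:
# select_plan(["verl_1.0_to_2.0_migration_plan.md"], "1.0", "2.0")
```

Returning `None` instead of raising lets the caller fall through to the "generate" branch of the workflow.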
diff --git a/docs/agents/verl_upgrade/verl_upgrade_checklist.md b/docs/agents/verl_upgrade/verl_upgrade_checklist.md
@@ -0,0 +1,128 @@
+# Pre-Upgrade Checklist for veRL
+
+This checklist is for quick verification before the next veRL upgrade in Trinity.
+
+## 1. Confirm Upgrade Scope
+
+1. Confirm the target veRL version.
+2. Confirm the current Trinity baseline version.
+3. Confirm the upstream snapshots for comparison have been generated under `trinity/trainer/verl/build/<version>/`.
+4. Confirm this upgrade still focuses on the same 7 core migration files:
+   - `fsdp_workers.py`
+   - `dp_actor.py`
+   - `fsdp_checkpoint_manager.py`
+   - `megatron_workers.py`
+   - `megatron_actor.py`
+   - `megatron_checkpoint_manager.py`
+   - `verl_trainer.py` (corresponds to upstream `ray_trainer.py`)
+
+## 2. Prepare Three-Way Comparison
+
+1. For each file, compare all three sources together:
+   - Current Trinity file
+   - `build/<old_version>/...`
+   - `build/<new_version>/...`
+2. Do not overwrite whole files.
+3. Prioritize recording two categories of diffs:
+   - What Trinity added on top of the old-version baseline
+   - What upstream changed from old version to new version
+
+## 3. Verify Repository Responsibility Boundaries
+
+Before the next upgrade, verify these boundaries are still valid:
+
+1. Reward computation is not executed in Trinity `verl_trainer.py`.
+2. Rollout is not executed in the Trinity veRL trainer main loop.
+3. Trainer-side validation is currently not implemented.
+4. Trinity does not run upstream `RayPPOTrainer.fit()` directly. It follows the path defined in `trinity/trainer/trainer.py`: `prepare()`, `train_step()`, `save_checkpoint()`, `save_state_dict()`, and `upload_state_dict()`.
+
+If any boundary above changes, re-evaluate all following steps in this checklist.
+
+## 4. Upstream Logic That Should Not Be Accidentally Reintroduced
+
+Unless Trinity training responsibilities change, do not migrate these back by default:
+
+1. Full reward pipeline inside `fit()`.
+2. Validation main flow.
+3. Reward loop / async rollout manager.
+4. `CheckpointEngineManager` orchestration logic that is only used by the upstream trainer main loop.
+
+## 5. Must-Check Configuration Wiring
+
+Before upgrade, verify whether these config items still need end-to-end wiring into implementation:
+
+1. `trust_remote_code`
+2. `use_prefix_grouper`
+3. `calculate_sum_pi_squared`
+4. `sum_pi_squared_checkpointing`
+5. Compatibility reads for `lora.rank` and `lora_rank`
+6. `rollout_correction`
+7. Compatibility structure for `reward.reward_model` and `reward_model`
+
+## 6. File-Level Priority Order
+
+Recommended processing order:
+
+1. `dp_actor.py`
+2. `fsdp_workers.py`
+3. `megatron_actor.py`
+4. `megatron_workers.py`
+5. `fsdp_checkpoint_manager.py`
+6. `megatron_checkpoint_manager.py`
+7. `verl_trainer.py`
+8. `verl_config.py`
+
+Reason: the first four files define data fields and config wiring; the remaining files depend on these contracts being stable.
+
+## 7. Convergence Checks Required for Every File
+
+For every migration file, ask:
+
+1. Is this subclass implementation only a copy of parent-class code?
+2. If it is fully identical to upstream parent implementation, can we delete the override directly?
+3. If this is only a historical workaround, has upstream already absorbed it?
+4. If this is a true Trinity-specific responsibility, has the reason to keep it been documented?
+
+## 8. Trinity Customizations Confirmed as Non-Removable
+
+1. Algorithm integration and loss composition logic in `dp_actor.py` and `megatron_actor.py`.
+2. `CheckpointMonitor` / `Synchronizer` collaboration logic in `fsdp_checkpoint_manager.py` and `megatron_checkpoint_manager.py`.
+3. `CheckpointMonitor`, Trinity custom `train_step()`, and state sync path in `verl_trainer.py`.
+4. Trinity's independent experience pipeline and trainer scheduling relationship.
+
+## 9. Known Migration Sensitive Points
+
+1. `use_prefix_grouper` is an end-to-end chain from config to monkey patch to actor/ref worker.
+2. `sum_pi_squared` must be passed from actor output all the way to the advantage consumer.
+3. Megatron LoRA reference logprob follows the actor/no-adapter path, not the regular ref worker path.
+4. To collect MFU in multimodal training, `images_seqlens` must be added to `batch.meta_info` in the trainer.
+5. The checkpoint managers must not be replaced by a whole-file upstream overwrite; otherwise Trinity's async threads and monitoring logic are lost.
+
+## 10. Local Checks After Upgrade
+
+1. Run the editor's Problems (diagnostics) check on all migrated files.
+2. Run `python -m py_compile` on every migrated file.
+3. Verify each new config item is wired consistently through its dataclass, defaults, and loading path.
+4. Verify actor output fields match worker/trainer consumer fields.
+5. Verify function signatures for checkpoint save and restore are consistent.
+
+## 11. Minimal Remote GPU Regression
+
+After local checks pass, run at least:
+
+1. FSDP single-step training.
+2. Megatron single-step training.
+3. Recompute path for old logprob / ref logprob.
+4. Megatron reference logprob under LoRA.
+5. Checkpoint save and restore.
+6. Minimal regression for `use_prefix_grouper`.
+7. Minimal regression for `calculate_sum_pi_squared`.
+
+## 12. Final Confirmation
+
+Before submitting the upgrade, reconfirm:
+
+1. Reward, rollout, and validation logic were not accidentally moved back into the Trinity trainer.
+2. Subclass overrides that are identical to the upstream implementation were not kept.
+3. Features previously trimmed by Trinity were not restored merely for version alignment.
+4. Documentation has been updated with newly added repository constraints and reasons for retained customizations.
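Review note on section 10, step 2 of the checklist: the `py_compile` pass over the seven core files could be batched with a small script like the one below. This is a sketch, not part of the PR. The helper names are hypothetical, the exact location of the migrated files is an assumption (they are searched by name under a caller-supplied root), and anything under a `build` directory is skipped on the assumption that it is an upstream snapshot.

```python
import py_compile
from pathlib import Path

# The seven core migration files named in section 1 of the checklist.
MIGRATED_FILES = [
    "fsdp_workers.py",
    "dp_actor.py",
    "fsdp_checkpoint_manager.py",
    "megatron_workers.py",
    "megatron_actor.py",
    "megatron_checkpoint_manager.py",
    "verl_trainer.py",
]


def find_migrated(root) -> list:
    """Locate migrated files under `root`, skipping `build/` snapshot copies."""
    root = Path(root)
    return sorted(
        path
        for name in MIGRATED_FILES
        for path in root.rglob(name)
        if "build" not in path.relative_to(root).parts
    )


def compile_check(paths) -> dict:
    """Return {path: error message} for every file that fails to byte-compile."""
    failures = {}
    for path in paths:
        try:
            # doraise=True turns syntax errors into PyCompileError.
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            failures[str(path)] = exc.msg
        except OSError as exc:  # e.g. a missing or unreadable file
            failures[str(path)] = str(exc)
    return failures
```

An empty result from `compile_check(find_migrated(root))` only shows that the files parse; it says nothing about config wiring or field contracts, which steps 3 to 5 of the same section still have to cover. Note that `py_compile.compile` writes bytecode into `__pycache__` next to each source file.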