Add trainer token JSONL export #2592
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Reviewed by Cursor Bugbot for commit c05d797.
invalid = torch.where(positive_advantages, invalid_high, invalid_low)
fields["is_masked"] = loss_mask & invalid
fields["is_masked_high"] = loss_mask & positive_advantages & invalid_high
fields["is_masked_low"] = loss_mask & negative_advantages & invalid_low
OPD export masks mismatch loss
Medium Severity
The token_export module's is_masked* fields for opd training use DefaultLossConfig's DPPO thresholds. This differs from opd_loss_fn's fixed 0.2 probability-delta thresholds, causing exported masking diagnostics to misrepresent actual token masking during opd training.
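A minimal sketch of why the threshold source matters, written in plain Python rather than torch so it runs standalone. The `Thresholds` class, the `masking_fields` helper, the comparison rule (`delta > high` / `delta < -low`), and the 0.1 values standing in for the DPPO config are all assumptions for illustration; only the field names and the fixed 0.2 value come from the comment above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical container for the low/high probability-delta bounds."""
    low: float
    high: float


def masking_fields(prob_delta, advantages, loss_mask, th):
    """Recompute the exported is_masked* diagnostics for one sequence.

    Mirrors the shape of the diff hunk: a token is invalid on the high side
    when its advantage is positive, and on the low side otherwise.
    """
    fields = {"is_masked": [], "is_masked_high": [], "is_masked_low": []}
    for delta, adv, keep in zip(prob_delta, advantages, loss_mask):
        positive, negative = adv > 0, adv < 0
        invalid_high = delta > th.high   # assumed comparison rule
        invalid_low = delta < -th.low    # assumed comparison rule
        invalid = invalid_high if positive else invalid_low
        fields["is_masked"].append(keep and invalid)
        fields["is_masked_high"].append(keep and positive and invalid_high)
        fields["is_masked_low"].append(keep and negative and invalid_low)
    return fields


# Same tokens, two threshold sources.
prob_delta = [0.15, -0.15, 0.25, 0.05]
advantages = [1.0, -1.0, 1.0, -1.0]
loss_mask = [True, True, True, True]

dppo_like = Thresholds(low=0.1, high=0.1)   # hypothetical config values
opd_fixed = Thresholds(low=0.2, high=0.2)   # fixed 0.2 from the review comment

exported = masking_fields(prob_delta, advantages, loss_mask, dppo_like)
actual = masking_fields(prob_delta, advantages, loss_mask, opd_fixed)
```

Under these assumed values the first two tokens are reported as masked by the export but not by the 0.2 rule, which is the kind of divergence the comment describes. One possible fix is to pass the thresholds that the active loss function uses into the exporter instead of reading them from a default config.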
return output_dir / "token_exports" / f"rank_{rank}.jsonl"
if path.is_absolute():
    return path
return output_dir / path
Shared export path corrupts JSONL
Medium Severity
When token_export.path is explicitly set, _resolve_path doesn't include the trainer rank in the filename. This causes multiple TokenExporter instances to concurrently write to the same file, resulting in corrupted JSONL output.
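One way to avoid the collision is to fold the rank into the filename even when a path is configured. The sketch below is an assumption about how that could look, not the PR's actual fix: `resolve_export_path` and the `.rank_{rank}` suffix scheme are hypothetical, and `PurePosixPath` is used only to keep the example platform-independent.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_export_path(
    output_dir: PurePosixPath,
    path: Optional[PurePosixPath],
    rank: int,
) -> PurePosixPath:
    """Return a per-rank export path so no two ranks share a file."""
    if path is None:
        # Default location, as in the reviewed code.
        return output_dir / "token_exports" / f"rank_{rank}.jsonl"
    if not path.is_absolute():
        path = output_dir / path
    # Hypothetical naming scheme: tokens.jsonl -> tokens.rank_3.jsonl
    return path.with_name(f"{path.stem}.rank_{rank}{path.suffix}")


out = PurePosixPath("/runs/exp1")
default_path = resolve_export_path(out, None, 1)
relative_path = resolve_export_path(out, PurePosixPath("tokens.jsonl"), 0)
absolute_path = resolve_export_path(out, PurePosixPath("/tmp/tokens.jsonl"), 2)
```

An alternative with the same effect is to treat a configured path as a directory and always write `rank_{rank}.jsonl` inside it; either way each writer owns its own file.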
* feat: add trainer token jsonl export
* chore: use docstrings for token export config


Summary
* Adds the opt-in trainer config trainer.experimental.token_export.
* Implements the exporter in src/prime_rl/trainer/rl/token_export.py.
* This PR intentionally does not include the HTML visualizer; PR #2551 is left unchanged for that follow-up work.
Verification
* uv run ruff check packages/prime-rl-configs/src/prime_rl/configs/trainer.py skills/configs/SKILL.md src/prime_rl/trainer/batch.py src/prime_rl/trainer/rl/data.py src/prime_rl/trainer/rl/train.py src/prime_rl/trainer/rl/token_export.py src/prime_rl/transport/types.py
* uv run ruff format --check packages/prime-rl-configs/src/prime_rl/configs/trainer.py src/prime_rl/trainer/batch.py src/prime_rl/trainer/rl/data.py src/prime_rl/trainer/rl/train.py src/prime_rl/trainer/rl/token_export.py src/prime_rl/transport/types.py
* uv run python -m py_compile src/prime_rl/trainer/rl/token_export.py packages/prime-rl-configs/src/prime_rl/configs/trainer.py src/prime_rl/trainer/batch.py src/prime_rl/trainer/rl/data.py src/prime_rl/trainer/rl/train.py src/prime_rl/transport/types.py
* uv run python JSONL serialization probe for TokenExporter with two response sequences.
* uv run ruff check packages/prime-rl-configs/src/prime_rl/configs/trainer.py
* uv run ruff format --check packages/prime-rl-configs/src/prime_rl/configs/trainer.py
Note
Medium Risk
Adds new per-step file I/O and threads rewards through micro-batch preparation/packing/padding and tensorization, which can affect training-loop performance and data-shape assumptions. Functionality is opt-in via config, limiting blast radius when disabled.
Overview
Adds an opt-in per-token rollout exporter for the RL trainer (trainer.experimental.token_export) that writes per-sequence JSONL records with aligned token-level fields (ids/masks, rewards/advantages, entropy, inference vs trainer logprobs, importance ratios, mismatch KL, prob deltas, and DPPO masking diagnostics).
To support exports, the PR threads per-token rewards through the trainer pipeline: TrainingSample → MicroBatch, sample packing/padding, and TensorMicroBatch conversion, and integrates exporter setup/export/close into the RL training loop (disabled by default and skipped on non-primary CP ranks). Documentation is updated in skills/configs/SKILL.md with usage and output details.
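To make the "aligned token-level fields" idea concrete, here is a rough sketch of a per-sequence JSONL record built from trainer and inference logprobs. The function names, the record keys, and the exact formulas (ratio as `exp(trainer - inference)`, prob delta as the difference of probabilities) are assumptions for illustration and may not match what TokenExporter actually writes.

```python
import io
import json
import math


def build_record(token_ids, inference_logprobs, trainer_logprobs, loss_mask):
    """Build one per-sequence record whose list fields all share one length."""
    n = len(token_ids)
    if not (len(inference_logprobs) == len(trainer_logprobs) == len(loss_mask) == n):
        raise ValueError("token-level fields must be aligned")
    return {
        "token_ids": list(token_ids),
        "loss_mask": list(loss_mask),
        "inference_logprobs": list(inference_logprobs),
        "trainer_logprobs": list(trainer_logprobs),
        # Assumed definitions of the derived diagnostics:
        "importance_ratio": [
            math.exp(t - i) for t, i in zip(trainer_logprobs, inference_logprobs)
        ],
        "prob_delta": [
            math.exp(t) - math.exp(i)
            for t, i in zip(trainer_logprobs, inference_logprobs)
        ],
    }


def write_jsonl(records, stream):
    """Write one JSON object per line."""
    for record in records:
        stream.write(json.dumps(record) + "\n")


def read_jsonl(stream):
    """Parse every non-empty line back into a dict."""
    return [json.loads(line) for line in stream if line.strip()]


# Round-trip two response sequences through an in-memory buffer.
records = [
    build_record([11, 12, 13], [-0.5, -1.0, -2.0], [-0.5, -0.9, -2.2], [True, True, False]),
    build_record([21, 22], [-0.3, -0.7], [-0.3, -0.7], [True, True]),
]
buffer = io.StringIO()
write_jsonl(records, buffer)
buffer.seek(0)
loaded = read_jsonl(buffer)
```

Keeping every list the same length as `token_ids` lets a consumer zip the fields together per token without extra index bookkeeping.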