Add trainer token JSONL export #2592

Merged
samsja merged 2 commits into main from feat/token-export-jsonl on May 22, 2026

@samsja (Member) commented May 22, 2026

Summary

  • Add an opt-in RL trainer token exporter under trainer.experimental.token_export.
  • Export one JSONL record per sequence with aligned per-token arrays for ids, masks, rewards, advantages, entropy, KL/logprob/ratio data, probability deltas, and DPPO mask diagnostics.
  • Limit the trainer integration to setup/export/close calls and keep the token-export math in src/prime_rl/trainer/rl/token_export.py.
  • Thread rollout rewards through trainer micro-batches so they can be included in exports.
  • Document the JSONL-only workflow in the config skill.
  • Use config attribute docstrings for the token export settings.
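
A minimal sketch of what "one JSONL record per sequence with aligned per-token arrays" means in practice: every per-token field has the same length, so index i in each array describes the same token, and each record is serialized as a single JSON line. The field names (`step`, `sample_idx`, `num_tokens`, `input_ids`, `loss_mask`, `rewards`, `advantages`) and the helpers `build_record` and `to_jsonl_line` are hypothetical illustrations, not the actual TokenExporter schema or API.

```python
import json
from typing import Any, Sequence


def build_record(step: int, sample_idx: int, fields: dict[str, Sequence[Any]]) -> dict[str, Any]:
    """Build one (hypothetical) JSONL record for a single sequence.

    All per-token arrays must have the same length so that index i in
    each array refers to the same token.
    """
    lengths = {name: len(values) for name, values in fields.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"per-token arrays are not aligned: {lengths}")
    num_tokens = next(iter(lengths.values()), 0)
    return {
        "step": step,
        "sample_idx": sample_idx,
        "num_tokens": num_tokens,
        **{name: list(values) for name, values in fields.items()},
    }


def to_jsonl_line(record: dict[str, Any]) -> str:
    # One compact JSON object per line, terminated by a newline.
    return json.dumps(record, separators=(",", ":")) + "\n"


record = build_record(
    step=3,
    sample_idx=0,
    fields={
        "input_ids": [101, 7, 42],
        "loss_mask": [False, True, True],
        "rewards": [1.0, 1.0, 1.0],
        "advantages": [0.0, 0.5, 0.5],
    },
)
line = to_jsonl_line(record)
```

Validating alignment when the record is built, rather than when it is read, surfaces shape bugs at the training step that produced them.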

This PR intentionally does not include the HTML visualizer; PR #2551 is left unchanged for that follow-up work.

Verification

  • uv run ruff check packages/prime-rl-configs/src/prime_rl/configs/trainer.py skills/configs/SKILL.md src/prime_rl/trainer/batch.py src/prime_rl/trainer/rl/data.py src/prime_rl/trainer/rl/train.py src/prime_rl/trainer/rl/token_export.py src/prime_rl/transport/types.py
  • uv run ruff format --check packages/prime-rl-configs/src/prime_rl/configs/trainer.py src/prime_rl/trainer/batch.py src/prime_rl/trainer/rl/data.py src/prime_rl/trainer/rl/train.py src/prime_rl/trainer/rl/token_export.py src/prime_rl/transport/types.py
  • uv run python -m py_compile src/prime_rl/trainer/rl/token_export.py packages/prime-rl-configs/src/prime_rl/configs/trainer.py src/prime_rl/trainer/batch.py src/prime_rl/trainer/rl/data.py src/prime_rl/trainer/rl/train.py src/prime_rl/transport/types.py
  • Ran an inline uv run python probe of TokenExporter's JSONL serialization with two response sequences.
  • Latest Mika follow-up: uv run ruff check packages/prime-rl-configs/src/prime_rl/configs/trainer.py
  • Latest Mika follow-up: uv run ruff format --check packages/prime-rl-configs/src/prime_rl/configs/trainer.py
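
The inline serialization probe itself is not included in the PR description. The following is a hypothetical probe in the same spirit: it writes two response sequences to a temporary `rank_0.jsonl`, reads them back, and checks that each line parses and that its per-token arrays stay aligned. The helper names and field names are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path


def write_records(path: Path, records: list[dict]) -> None:
    # Append mode, so repeated training steps add lines to the same file.
    with path.open("a", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_records(path: Path) -> list[dict]:
    # Each non-empty line must be a standalone JSON object.
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def check_aligned(record: dict, per_token_keys: tuple[str, ...]) -> bool:
    # True when every per-token array in the record has the same length.
    return len({len(record[key]) for key in per_token_keys}) == 1


keys = ("input_ids", "loss_mask", "rewards")
records = [
    {"input_ids": [1, 2, 3], "loss_mask": [0, 1, 1], "rewards": [0.5, 0.5, 0.5]},
    {"input_ids": [4, 5], "loss_mask": [0, 1], "rewards": [-1.0, -1.0]},
]
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "rank_0.jsonl"
    write_records(out, records)
    loaded = read_records(out)
```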

Note

Medium Risk
Adds new per-step file I/O and threads rewards through micro-batch preparation/packing/padding and tensorization, which can affect training-loop performance and data-shape assumptions. Functionality is opt-in via config, limiting blast radius when disabled.
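
To make the "threads rewards through packing/padding" point concrete, here is a hypothetical sketch of how a per-sample reward can travel alongside tokens when samples are packed into one micro-batch and padded. It assumes the rollout reward is a scalar per sample that is repeated across that sample's tokens, and that padding gets reward 0.0 with the loss mask off; `PackedMicroBatch` and `pack` are illustrative names, not the trainer's TrainingSampleMicroBatch or TensorMicroBatch.

```python
from dataclasses import dataclass, field


@dataclass
class PackedMicroBatch:
    # Hypothetical stand-in for a packed micro-batch (per-token lists only).
    input_ids: list[int] = field(default_factory=list)
    loss_mask: list[bool] = field(default_factory=list)
    rewards: list[float] = field(default_factory=list)


def pack(samples: list[dict], seq_len: int, pad_id: int = 0) -> PackedMicroBatch:
    """Concatenate samples, then pad every per-token field to seq_len."""
    batch = PackedMicroBatch()
    for sample in samples:
        n = len(sample["input_ids"])
        batch.input_ids.extend(sample["input_ids"])
        batch.loss_mask.extend(sample["loss_mask"])
        # Assumption: the scalar rollout reward is repeated per token so it
        # stays aligned with input_ids after packing.
        batch.rewards.extend([sample["reward"]] * n)
    if len(batch.input_ids) > seq_len:
        raise ValueError("samples do not fit in seq_len")
    pad = seq_len - len(batch.input_ids)
    batch.input_ids.extend([pad_id] * pad)
    batch.loss_mask.extend([False] * pad)  # padding never contributes to the loss
    batch.rewards.extend([0.0] * pad)
    return batch


mb = pack(
    [
        {"input_ids": [1, 2], "loss_mask": [False, True], "reward": 1.0},
        {"input_ids": [3], "loss_mask": [True], "reward": -0.5},
    ],
    seq_len=5,
)
```

The key invariant is that every per-token field is extended by the same amount at every step, which is exactly the data-shape assumption the risk note above flags.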

Overview
Adds an opt-in per-token rollout exporter for the RL trainer (trainer.experimental.token_export) that writes per-sequence JSONL records with aligned token-level fields (ids/masks, rewards/advantages, entropy, inference vs trainer logprobs, importance ratios, mismatch KL, prob deltas, and DPPO masking diagnostics).

To support exports, the PR threads per-token rewards through the trainer pipeline: TrainingSampleMicroBatch, sample packing/padding, and TensorMicroBatch conversion, and integrates exporter setup/export/close into the RL training loop (disabled by default and skipped on non-primary CP ranks). Documentation is updated in skills/configs/SKILL.md with usage and output details.

Reviewed by Cursor Bugbot for commit c05d797. Bugbot is set up for automated code reviews on this repo.

@samsja force-pushed the feat/token-export-jsonl branch from c53874c to c05d797 on May 22, 2026 01:41
@samsja marked this pull request as ready for review on May 22, 2026 02:46

@cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

```python
invalid = torch.where(positive_advantages, invalid_high, invalid_low)
fields["is_masked"] = loss_mask & invalid
fields["is_masked_high"] = loss_mask & positive_advantages & invalid_high
fields["is_masked_low"] = loss_mask & negative_advantages & invalid_low
```
OPD export masks mismatch loss

Medium Severity

For OPD training, the token_export module computes the is_masked* fields with DefaultLossConfig's DPPO thresholds, whereas opd_loss_fn masks tokens using fixed 0.2 probability-delta thresholds. As a result, the exported masking diagnostics misrepresent which tokens are actually masked during OPD training.
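
One possible direction for this finding, sketched under assumptions: choose the export thresholds from the loss that is actually active. The helper `export_thresholds`, the `MaskThresholds` type, and the loss-type strings are hypothetical; the 0.2 value comes from the review comment, and applying it to both the low and the high side is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaskThresholds:
    low: float
    high: float


# Assumed value, taken from the review comment's description of opd_loss_fn.
OPD_FIXED_THRESHOLD = 0.2


def export_thresholds(loss_type: str, dppo_low: float, dppo_high: float) -> MaskThresholds:
    """Return the thresholds the export should use so it matches the active loss."""
    if loss_type == "opd":
        return MaskThresholds(low=OPD_FIXED_THRESHOLD, high=OPD_FIXED_THRESHOLD)
    return MaskThresholds(low=dppo_low, high=dppo_high)
```

Deriving the thresholds from the same source as the loss function keeps the diagnostics honest if either side changes later.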

```python
return output_dir / "token_exports" / f"rank_{rank}.jsonl"
if path.is_absolute():
    return path
return output_dir / path
```
Shared export path corrupts JSONL

Medium Severity

When token_export.path is explicitly set, _resolve_path doesn't include the trainer rank in the filename. This causes multiple TokenExporter instances to concurrently write to the same file, resulting in corrupted JSONL output.

Comment thread on packages/prime-rl-configs/src/prime_rl/configs/trainer.py (outdated)
@samsja merged commit e971d90 into main on May 22, 2026
16 of 18 checks passed
samsja added a commit that referenced this pull request May 22, 2026
* feat: add trainer token jsonl export

* chore: use docstrings for token export config
