Add trainer token JSONL export by samsja · Pull Request #2592 · PrimeIntellect-ai/prime-rl

samsja · 2026-05-22T01:40:35Z

Summary

Add an opt-in RL trainer token exporter under trainer.experimental.token_export.
Export one JSONL record per sequence with aligned per-token arrays for ids, masks, rewards, advantages, entropy, KL/logprob/ratio data, probability deltas, and DPPO mask diagnostics.
Keep the trainer integration to setup/export/close and leave token-export math in src/prime_rl/trainer/rl/token_export.py.
Thread rollout rewards through trainer micro-batches so they can be included in exports.
Document the JSONL-only workflow in the config skill.
Use config attribute docstrings for the token export settings.
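To illustrate the "one JSONL record per sequence with aligned per-token arrays" idea from the summary, here is a minimal sketch. The function names (`build_record`, `to_jsonl_line`) and the field names are hypothetical and are not taken from `token_export.py`; the only point is that every per-token array in a record shares one length and each record occupies exactly one line.

```python
import json


def build_record(step, token_ids, loss_mask, rewards, advantages):
    """Assemble one per-sequence record (hypothetical field names)."""
    fields = {
        "token_ids": token_ids,
        "loss_mask": loss_mask,
        "rewards": rewards,
        "advantages": advantages,
    }
    # All per-token arrays must be aligned, i.e. have the same length.
    lengths = {name: len(values) for name, values in fields.items()}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"misaligned per-token arrays: {lengths}")
    return {"step": step, **fields}


def to_jsonl_line(record):
    """Serialize a record as a single newline-terminated JSON line."""
    return json.dumps(record, separators=(",", ":")) + "\n"
```

Validating alignment at write time keeps downstream consumers from having to guess which index in one array corresponds to which token in another.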

This PR intentionally does not include the HTML visualizer; PR #2551 is left unchanged for that follow-up work.

Verification

uv run ruff check packages/prime-rl-configs/src/prime_rl/configs/trainer.py skills/configs/SKILL.md src/prime_rl/trainer/batch.py src/prime_rl/trainer/rl/data.py src/prime_rl/trainer/rl/train.py src/prime_rl/trainer/rl/token_export.py src/prime_rl/transport/types.py
uv run ruff format --check packages/prime-rl-configs/src/prime_rl/configs/trainer.py src/prime_rl/trainer/batch.py src/prime_rl/trainer/rl/data.py src/prime_rl/trainer/rl/train.py src/prime_rl/trainer/rl/token_export.py src/prime_rl/transport/types.py
uv run python -m py_compile src/prime_rl/trainer/rl/token_export.py packages/prime-rl-configs/src/prime_rl/configs/trainer.py src/prime_rl/trainer/batch.py src/prime_rl/trainer/rl/data.py src/prime_rl/trainer/rl/train.py src/prime_rl/transport/types.py
Inline uv run python JSONL serialization probe for TokenExporter with two response sequences.
Latest Mika follow-up: uv run ruff check packages/prime-rl-configs/src/prime_rl/configs/trainer.py
Latest Mika follow-up: uv run ruff format --check packages/prime-rl-configs/src/prime_rl/configs/trainer.py

Note

Medium Risk
Adds new per-step file I/O and threads rewards through micro-batch preparation/packing/padding and tensorization, which can affect training-loop performance and data-shape assumptions. Functionality is opt-in via config, limiting blast radius when disabled.

Overview
Adds an opt-in per-token rollout exporter for the RL trainer (trainer.experimental.token_export) that writes per-sequence JSONL records with aligned token-level fields (ids/masks, rewards/advantages, entropy, inference vs trainer logprobs, importance ratios, mismatch KL, prob deltas, and DPPO masking diagnostics).
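On the consuming side, a reader for such an export could look roughly like the sketch below. The field names `entropy` and `loss_mask` are assumptions about the record schema, and `iter_records` and `masked_mean` are hypothetical helpers, not part of prime-rl.

```python
import io
import json


def iter_records(stream):
    """Yield one dict per non-empty JSONL line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def masked_mean(record, field, mask_field="loss_mask"):
    """Average a per-token field over tokens whose mask entry is true."""
    values = [v for v, keep in zip(record[field], record[mask_field]) if keep]
    return sum(values) / len(values) if values else None


# Usage with an in-memory stream standing in for an exported file.
sample = io.StringIO('{"entropy": [0.5, 1.0, 2.0], "loss_mask": [false, true, true]}\n')
means = [masked_mean(r, "entropy") for r in iter_records(sample)]
```

Because the arrays are aligned, a plain `zip` over a field and the mask is enough to restrict any statistic to the tokens that contributed to the loss.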

To support exports, the PR threads per-token rewards through the trainer pipeline: TrainingSample → MicroBatch, sample packing/padding, and TensorMicroBatch conversion, and integrates exporter setup/export/close into the RL training loop (disabled by default and skipped on non-primary CP ranks). Documentation is updated in skills/configs/SKILL.md with usage and output details.
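The sketch below shows what threading per-token rewards through packing and padding could look like in principle. `Sample`, `pad_sample`, and `pack` are hypothetical stand-ins and do not reproduce the actual `TrainingSample`, `MicroBatch`, or `TensorMicroBatch` types; the padding values (`0` for ids, `False` for the mask, `0.0` for rewards) are also assumptions.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical stand-in for a training sample with per-token rewards."""
    input_ids: list
    loss_mask: list
    rewards: list


def pad_sample(sample, target_len, pad_id=0):
    """Pad every per-token field by the same amount so they stay aligned."""
    n_pad = target_len - len(sample.input_ids)
    if n_pad < 0:
        raise ValueError("sample is longer than target_len")
    return Sample(
        input_ids=sample.input_ids + [pad_id] * n_pad,
        loss_mask=sample.loss_mask + [False] * n_pad,  # padding never enters the loss
        rewards=sample.rewards + [0.0] * n_pad,
    )


def pack(samples):
    """Concatenate samples into one sequence, carrying rewards along."""
    packed = Sample([], [], [])
    for s in samples:
        packed.input_ids.extend(s.input_ids)
        packed.loss_mask.extend(s.loss_mask)
        packed.rewards.extend(s.rewards)
    return packed
```

The data-shape risk noted above comes from exactly this kind of step: any field that is packed or padded differently from `input_ids` ends up misaligned with it.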

Reviewed by Cursor Bugbot for commit c05d797.

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

cursor · 2026-05-22T02:50:27Z

+            invalid = torch.where(positive_advantages, invalid_high, invalid_low)
+            fields["is_masked"] = loss_mask & invalid
+            fields["is_masked_high"] = loss_mask & positive_advantages & invalid_high
+            fields["is_masked_low"] = loss_mask & negative_advantages & invalid_low


OPD export masks mismatch loss

Medium Severity

For OPD training, the token_export module computes its is_masked* fields from DefaultLossConfig's DPPO thresholds, whereas opd_loss_fn applies fixed 0.2 probability-delta thresholds. As a result, the exported masking diagnostics can misrepresent which tokens were actually masked during OPD training.
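To make the reported mismatch concrete, the following list-based sketch mirrors the structure of the diff above without torch. How `invalid_high` and `invalid_low` are derived from the probability delta is an assumption here (delta above `high`, delta below `-low`), and `masking_fields` is a hypothetical helper; only the selection logic follows the quoted lines.

```python
def masking_fields(prob_delta, advantages, loss_mask, high, low):
    """Per-token masking diagnostics for given thresholds (assumed semantics)."""
    fields = {"is_masked": [], "is_masked_high": [], "is_masked_low": []}
    for delta, adv, in_loss in zip(prob_delta, advantages, loss_mask):
        positive = adv > 0
        negative = adv < 0
        invalid_high = delta > high   # assumption: delta exceeds upper bound
        invalid_low = delta < -low    # assumption: delta falls below lower bound
        # Mirrors torch.where(positive_advantages, invalid_high, invalid_low).
        invalid = invalid_high if positive else invalid_low
        fields["is_masked"].append(in_loss and invalid)
        fields["is_masked_high"].append(in_loss and positive and invalid_high)
        fields["is_masked_low"].append(in_loss and negative and invalid_low)
    return fields


delta = [0.3, -0.3, 0.1]
adv = [1.0, -1.0, 1.0]
mask = [True, True, False]

# Fixed 0.2 thresholds, as the review attributes to opd_loss_fn.
fixed = masking_fields(delta, adv, mask, high=0.2, low=0.2)
# Hypothetical configured thresholds of 0.5, standing in for DefaultLossConfig.
configured = masking_fields(delta, adv, mask, high=0.5, low=0.5)
```

With these inputs the two calls disagree on the first two tokens, which is the kind of divergence between exported diagnostics and actual loss masking that the review describes.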

cursor · 2026-05-22T02:50:27Z

+            return output_dir / "token_exports" / f"rank_{rank}.jsonl"
+        if path.is_absolute():
+            return path
+        return output_dir / path


Shared export path corrupts JSONL

Medium Severity

When token_export.path is explicitly set, _resolve_path doesn't include the trainer rank in the filename. This causes multiple TokenExporter instances to concurrently write to the same file, resulting in corrupted JSONL output.
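One possible way to avoid the collision is to fold the rank into an explicitly configured path as well. The sketch below is hypothetical: `resolve_export_path` and the `.rank_{rank}` naming scheme are illustrative and do not describe how the repository resolved this comment.

```python
from pathlib import Path


def resolve_export_path(output_dir, path, rank):
    """Return a per-rank export path (hypothetical fix sketch)."""
    if path is None:
        # Default location already includes the rank, as in the diff above.
        return output_dir / "token_exports" / f"rank_{rank}.jsonl"
    base = path if path.is_absolute() else output_dir / path
    # Insert the rank before the suffix so each rank writes its own file.
    return base.with_name(f"{base.stem}.rank_{rank}{base.suffix}")
```

For example, a configured `tokens.jsonl` would resolve to `tokens.rank_0.jsonl` on rank 0 and `tokens.rank_1.jsonl` on rank 1, so no two exporters share a file handle.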

feat: add trainer token jsonl export

c05d797

samsja force-pushed the feat/token-export-jsonl branch from c53874c to c05d797 on May 22, 2026 01:41

samsja marked this pull request as ready for review May 22, 2026 02:46

cursor Bot reviewed May 22, 2026

mikasenghaas reviewed May 22, 2026

Comment thread packages/prime-rl-configs/src/prime_rl/configs/trainer.py Outdated

chore: use docstrings for token export config

b8e2d45

mikasenghaas approved these changes May 22, 2026

samsja merged commit e971d90 into main May 22, 2026
16 of 18 checks passed

samsja added a commit that referenced this pull request May 22, 2026

Add trainer token JSONL export (#2592)

1d0d7b0

* feat: add trainer token jsonl export
* chore: use docstrings for token export config

samsja merged 2 commits into main from feat/token-export-jsonl
