Add token export visualizer by samsja · Pull Request #2551 · PrimeIntellect-ai/prime-rl

samsja · 2026-05-19T00:54:47Z

Summary

Add an opt-in trainer token exporter that writes per-sequence JSONL with token ids, reward, advantage, entropy, KL/probability/logprob/ratio data, and DPPO mask flags.
Keep the trainer hook narrow: setup/export/close live behind src/prime_rl/trainer/rl/token_export.py, and exporter-specific ratio/mask math is scoped there instead of inline in the trainer loop.
Simplify the token exporter by writing directly to a JSONL file handle instead of routing through a background queue/thread.
When token export is enabled, export every step from each exporting rank; no interval/cap option.
Thread per-sequence rewards through trainer batch structures so rollout rewards are available in the export.
Add a minimal HTML visualizer that decodes token ids after export, groups chat-template roles, colors high KL mismatch tokens, and shows token metadata on hover.
Extend the visualizer with an all-records navigator: directory inputs collect *.jsonl exports into one static HTML page, and single JSONL files can use --all-records to embed every matching record.
Reuse one tokenizer instance per visualizer render so directory/all-records mode does not reload it for each record.
Document the exporter and visualizer workflow in the config skill.
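As a rough illustration of the "write directly to a JSONL file handle" design described above, here is a minimal sketch. The class name `JsonlTokenExporter`, its method signatures, and the record field names are hypothetical and are not taken from `token_export.py`; only the per-rank `rank_0.jsonl` file naming follows the verification notes in this PR.

```python
import json
import math
import tempfile
from pathlib import Path


class JsonlTokenExporter:
    """Hypothetical exporter: one JSON object per sequence, one file per rank."""

    def __init__(self, output_dir: Path, rank: int) -> None:
        output_dir.mkdir(parents=True, exist_ok=True)
        # Write straight to the file handle; no background queue or thread.
        self._file = (output_dir / f"rank_{rank}.jsonl").open("a", encoding="utf-8")

    def export(self, step, token_ids, reward, advantages, trainer_logprobs, inference_logprobs):
        # Importance ratio per token: exp(trainer logprob - inference logprob).
        ratios = [math.exp(t - i) for t, i in zip(trainer_logprobs, inference_logprobs)]
        record = {
            "step": step,
            "token_ids": token_ids,
            "reward": reward,
            "advantages": advantages,
            "trainer_logprobs": trainer_logprobs,
            "inference_logprobs": inference_logprobs,
            "ratios": ratios,
        }
        self._file.write(json.dumps(record) + "\n")

    def close(self) -> None:
        self._file.close()


# Usage: export two sequences for step 0, then read the file back.
with tempfile.TemporaryDirectory() as tmp:
    exporter = JsonlTokenExporter(Path(tmp) / "token_exports", rank=0)
    exporter.export(0, [11, 12], 1.0, [0.5, 0.5], [-0.1, -0.2], [-0.1, -0.3])
    exporter.export(0, [21], 0.0, [-0.5], [-1.0], [-1.0])
    exporter.close()
    lines = (Path(tmp) / "token_exports" / "rank_0.jsonl").read_text().splitlines()
    records = [json.loads(line) for line in lines]
```

A synchronous writer like this trades some trainer-loop latency for simplicity: there is no queue to drain on shutdown, and a failed write surfaces immediately at the call site.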

Verification

uv run ruff check packages/prime-rl-configs/src/prime_rl/configs/trainer.py src/prime_rl/transport/types.py src/prime_rl/trainer/batch.py src/prime_rl/trainer/rl/data.py src/prime_rl/trainer/rl/train.py src/prime_rl/trainer/rl/token_export.py scripts/token_export_visualizer.py
uv run ruff check src/prime_rl/trainer/rl/train.py src/prime_rl/trainer/rl/token_export.py scripts/token_export_visualizer.py
uv run ruff format --check src/prime_rl/trainer/rl/train.py src/prime_rl/trainer/rl/token_export.py scripts/token_export_visualizer.py
uv run ruff check packages/prime-rl-configs/src/prime_rl/configs/trainer.py src/prime_rl/trainer/rl/token_export.py
uv run ruff format --check packages/prime-rl-configs/src/prime_rl/configs/trainer.py src/prime_rl/trainer/rl/token_export.py
uv run python -m py_compile src/prime_rl/trainer/rl/token_export.py packages/prime-rl-configs/src/prime_rl/configs/trainer.py
uv run ruff check scripts/token_export_visualizer.py
uv run ruff format --check scripts/token_export_visualizer.py
uv run rl @ configs/ci/integration/reverse_text/start.toml --output-dir /tmp/prime-rl-token-export-clean-run --clean-output-dir --max-steps 1 --seq-len 512 --orchestrator.batch-size 2 --orchestrator.rollouts-per-example 2 --orchestrator.train.sampling.max-completion-tokens 16 --trainer.experimental.token-export
uv run scripts/token_export_visualizer.py /tmp/prime-rl-token-export-clean-run/token_exports --tokenizer PrimeIntellect/Qwen3-0.6B-Reverse-Text-SFT -o output.html
uv run scripts/token_export_visualizer.py /tmp/prime-rl-token-export-clean-run/token_exports --tokenizer PrimeIntellect/Qwen3-0.6B-Reverse-Text-SFT -o /tmp/prime-rl-token-export-clean-run/output.html
Confirmed /tmp/prime-rl-token-export-clean-run/token_exports/rank_0.jsonl has 2 records and no is_clipped field.
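A manual check like the one above (record count and absence of an `is_clipped` field) could be scripted roughly as follows. The helper `summarize_export` is hypothetical and not part of this PR, and the sample records written here are made-up stand-ins rather than real export output.

```python
import json
import tempfile
from pathlib import Path


def summarize_export(path: Path) -> tuple[int, set[str]]:
    """Return (record count, union of field names) for a JSONL file."""
    count, fields = 0, set()
    with path.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            fields.update(record)  # adds the record's keys
            count += 1
    return count, fields


# Demo on a made-up two-record file; point `sample` at a real export instead.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "rank_0.jsonl"
    sample.write_text(
        '{"token_ids": [1, 2], "reward": 1.0}\n{"token_ids": [3], "reward": 0.0}\n',
        encoding="utf-8",
    )
    count, fields = summarize_export(sample)

print(count, "is_clipped" in fields)
```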

No pytest tests were added or run, per request.

Note

Medium Risk
Touches the RL training loop and batch serialization by adding a new rewards field and export hook; while gated behind experimental.token_export, incorrect alignment/IO failures could impact training stability or performance when enabled.

Overview
Adds opt-in per-token rollout export for the RL trainer via trainer.experimental.token_export, writing per-sequence JSONL records (token ids plus aligned metrics like reward/advantage/entropy, mismatch KL, importance ratios, and DPPO mask diagnostics) using an async writer (trainer/rl/token_export.py).

Threads rewards through the trainer data path (TrainingSample->MicroBatch->TensorMicroBatch), updates packing/padding logic to keep reward arrays aligned, and hooks exporter setup/export/close into trainer/rl/train.py.
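To illustrate what "keeping reward arrays aligned" during packing and padding can mean, here is a small sketch. It assumes the per-sequence reward is broadcast to every token of that sequence and that padding positions get a reward of 0.0; the actual layout in `batch.py` and `data.py` may differ. `Sample` and `pack_with_rewards` are hypothetical names, not the PR's `TrainingSample` or packing code.

```python
from dataclasses import dataclass


@dataclass
class Sample:  # hypothetical stand-in for TrainingSample
    token_ids: list[int]
    reward: float


def pack_with_rewards(samples, seq_len, pad_id=0):
    """Concatenate samples into one row and pad; rewards stay index-aligned."""
    input_ids, rewards = [], []
    for sample in samples:
        input_ids.extend(sample.token_ids)
        # Assumption: broadcast the sequence-level reward to each of its tokens.
        rewards.extend([sample.reward] * len(sample.token_ids))
    if len(input_ids) > seq_len:
        raise ValueError("samples do not fit in seq_len")
    padding = seq_len - len(input_ids)
    # Pad both arrays by the same amount so positions keep matching.
    input_ids.extend([pad_id] * padding)
    rewards.extend([0.0] * padding)
    return input_ids, rewards


packed_ids, packed_rewards = pack_with_rewards(
    [Sample([5, 6, 7], 1.0), Sample([8, 9], -0.5)], seq_len=6
)
```

The invariant worth checking in any such path is that every per-token array has the same length as the token ids after packing and after padding.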

Introduces scripts/token_export_visualizer.py, a standalone CLI that loads JSONL (file or directory), optionally decodes tokens with a tokenizer, and renders a static navigable HTML view with KL-based highlighting and hoverable token details; documents the workflow in skills/config/SKILL.md.
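The KL-based highlighting and hover details can be sketched with plain string templating. This is an illustrative approximation, not the code in `scripts/token_export_visualizer.py`: the function names, the 0.1 threshold, the red color scale, and the metadata keys are all assumptions.

```python
import html


def kl_color(kl, threshold=0.1, max_kl=1.0):
    """Return a CSS color for high-KL tokens, or None below the threshold."""
    if kl < threshold:
        return None
    # Scale opacity linearly from 0.2 at the threshold to 1.0 at max_kl.
    strength = min((kl - threshold) / (max_kl - threshold), 1.0)
    alpha = 0.2 + 0.8 * strength
    return f"rgba(255, 0, 0, {alpha:.2f})"


def render_token(text, meta):
    """Render one decoded token as a span with metadata shown on hover."""
    title = html.escape(", ".join(f"{k}={v}" for k, v in meta.items()), quote=True)
    color = kl_color(meta.get("kl", 0.0))
    style = f' style="background:{color}"' if color else ""
    # Escape the token text so decoded markup cannot break the page.
    return f'<span title="{title}"{style}>{html.escape(text)}</span>'


low = render_token("<b>", {"kl": 0.0})
high = render_token("hello", {"kl": 1.0, "reward": 1.0})
```

Using the `title` attribute keeps the page static and dependency-free; a richer tooltip would need extra CSS or JavaScript.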

Reviewed by Cursor Bugbot for commit eebc96e. Bugbot is set up for automated code reviews on this repo.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

samsja marked this pull request as ready for review May 19, 2026 02:25

samsja marked this pull request as draft May 19, 2026 02:25

cursor Bot reviewed May 19, 2026

Comment thread scripts/token_export_visualizer.py

samsja added 6 commits May 21, 2026 23:21

feat: add token export visualizer

3fa9e38

feat: add token export html navigator

69c3d30

refactor: scope token export trainer hook

3857e20

refactor: always export enabled token traces

1addc14

fix: reuse token export visualizer tokenizer

b96bd54

refactor: simplify token export writer

95057dc

samsja force-pushed the feat/token-export-visualizer branch from 497270a to 95057dc May 22, 2026 01:03

samsja mentioned this pull request May 22, 2026

Add trainer token JSONL export #2592

Merged

samsja wants to merge 6 commits into main from feat/token-export-visualizer
