
Add token export visualizer#2551

Draft
samsja wants to merge 6 commits into main from feat/token-export-visualizer

Member

@samsja samsja commented May 19, 2026

Summary

  • Add an opt-in trainer token exporter that writes per-sequence JSONL with token ids, reward, advantage, entropy, KL/probability/logprob/ratio data, and DPPO mask flags.
  • Keep the trainer hook narrow: setup/export/close live behind src/prime_rl/trainer/rl/token_export.py, and exporter-specific ratio/mask math is scoped there instead of inline in the trainer loop.
  • Simplify the token exporter by writing directly to a JSONL file handle instead of routing through a background queue/thread.
  • When token export is enabled, every exporting rank exports at every step; there is no interval or cap option.
  • Thread per-sequence rewards through trainer batch structures so rollout rewards are available in the export.
  • Add a minimal HTML visualizer that decodes token ids after export, groups chat-template roles, colors high KL mismatch tokens, and shows token metadata on hover.
  • Extend the visualizer with an all-records navigator: directory inputs collect *.jsonl exports into one static HTML page, and single JSONL files can use --all-records to embed every matching record.
  • Reuse one tokenizer instance per visualizer render so directory/all-records mode does not reload it for each record.
  • Document the exporter and visualizer workflow in the config skill.
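For reviewers who want a mental model of the "write directly to a JSONL file handle" design, here is a minimal sketch. It is not the code in src/prime_rl/trainer/rl/token_export.py: the class name, method signatures, and record keys (`step`, `token_ids`, `reward`) are assumptions made for illustration, and only the rank_0.jsonl file naming is taken from the Verification section.

```python
import json
from pathlib import Path


class JsonlTokenExporter:
    """Hypothetical stand-in for the exporter; names and fields are illustrative."""

    def __init__(self, output_dir: str, rank: int) -> None:
        out = Path(output_dir)
        out.mkdir(parents=True, exist_ok=True)
        # One file per exporting rank (assumed layout, e.g. rank_0.jsonl).
        self._fh = open(out / f"rank_{rank}.jsonl", "a", encoding="utf-8")

    def export(
        self,
        step: int,
        token_ids: list[int],
        reward: float,
        per_token: dict[str, list[float]],
    ) -> None:
        # Every per-token metric must line up one-to-one with token_ids.
        for name, values in per_token.items():
            if len(values) != len(token_ids):
                raise ValueError(f"{name} has {len(values)} values for {len(token_ids)} tokens")
        record = {"step": step, "token_ids": token_ids, "reward": reward, **per_token}
        # Synchronous write: one JSON object per line, no background queue or thread.
        self._fh.write(json.dumps(record) + "\n")
        self._fh.flush()

    def close(self) -> None:
        self._fh.close()
```

The trade-off in a design like this is that each write blocks the training step briefly, in exchange for having no queue or thread lifecycle to manage at shutdown.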

Verification

  • uv run ruff check packages/prime-rl-configs/src/prime_rl/configs/trainer.py src/prime_rl/transport/types.py src/prime_rl/trainer/batch.py src/prime_rl/trainer/rl/data.py src/prime_rl/trainer/rl/train.py src/prime_rl/trainer/rl/token_export.py scripts/token_export_visualizer.py
  • uv run ruff check src/prime_rl/trainer/rl/train.py src/prime_rl/trainer/rl/token_export.py scripts/token_export_visualizer.py
  • uv run ruff format --check src/prime_rl/trainer/rl/train.py src/prime_rl/trainer/rl/token_export.py scripts/token_export_visualizer.py
  • uv run ruff check packages/prime-rl-configs/src/prime_rl/configs/trainer.py src/prime_rl/trainer/rl/token_export.py
  • uv run ruff format --check packages/prime-rl-configs/src/prime_rl/configs/trainer.py src/prime_rl/trainer/rl/token_export.py
  • uv run python -m py_compile src/prime_rl/trainer/rl/token_export.py packages/prime-rl-configs/src/prime_rl/configs/trainer.py
  • uv run ruff check scripts/token_export_visualizer.py
  • uv run ruff format --check scripts/token_export_visualizer.py
  • uv run rl @ configs/ci/integration/reverse_text/start.toml --output-dir /tmp/prime-rl-token-export-clean-run --clean-output-dir --max-steps 1 --seq-len 512 --orchestrator.batch-size 2 --orchestrator.rollouts-per-example 2 --orchestrator.train.sampling.max-completion-tokens 16 --trainer.experimental.token-export
  • uv run scripts/token_export_visualizer.py /tmp/prime-rl-token-export-clean-run/token_exports --tokenizer PrimeIntellect/Qwen3-0.6B-Reverse-Text-SFT -o output.html
  • uv run scripts/token_export_visualizer.py /tmp/prime-rl-token-export-clean-run/token_exports --tokenizer PrimeIntellect/Qwen3-0.6B-Reverse-Text-SFT -o /tmp/prime-rl-token-export-clean-run/output.html
  • Confirmed /tmp/prime-rl-token-export-clean-run/token_exports/rank_0.jsonl has 2 records and no is_clipped field.
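The last check can be reproduced with a few lines of Python. This is a sketch rather than the command that was actually run; `summarize_export` is a hypothetical helper, and it assumes each line of the export is a JSON object with `is_clipped`, if present, as a top-level key.

```python
import json
from pathlib import Path


def summarize_export(path: str, forbidden_field: str = "is_clipped") -> tuple[int, int]:
    """Return (record_count, number_of_records_containing_forbidden_field)."""
    total = 0
    offending = 0
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            total += 1
            if forbidden_field in record:
                offending += 1
    return total, offending
```

Calling `summarize_export` on the rank_0.jsonl path from the run above would be expected to return a count of 2 with no offending records if the export matches the description.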

No pytest tests were added or run, per request.


Note

Medium Risk
Touches the RL training loop and batch serialization by adding a new rewards field and export hook; while gated behind experimental.token_export, incorrect alignment/IO failures could impact training stability or performance when enabled.

Overview
Adds opt-in per-token rollout export for the RL trainer via trainer.experimental.token_export, writing per-sequence JSONL records (token ids plus aligned metrics like reward/advantage/entropy, mismatch KL, importance ratios, and DPPO mask diagnostics) directly to a JSONL file handle rather than through a background queue/thread (trainer/rl/token_export.py).

Threads rewards through the trainer data path (TrainingSample->MicroBatch->TensorMicroBatch), updates packing/padding logic to keep reward arrays aligned, and hooks exporter setup/export/close into trainer/rl/train.py.
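To make the alignment concern concrete, the sketch below packs several samples into one fixed-length sequence while keeping a reward entry for every token position. It is an illustration only: the `Sample` and `PackedBatch` types, the per-token broadcast of the sequence reward, and the padding values are all assumptions, not the layout used by TrainingSample, MicroBatch, or TensorMicroBatch.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """Hypothetical sample: token ids plus one scalar reward for the sequence."""
    token_ids: list[int]
    reward: float


@dataclass
class PackedBatch:
    """Hypothetical packed sequence with one reward entry per token position."""
    token_ids: list[int] = field(default_factory=list)
    rewards: list[float] = field(default_factory=list)


def pack_with_rewards(
    samples: list[Sample], seq_len: int, pad_id: int = 0, pad_reward: float = 0.0
) -> PackedBatch:
    batch = PackedBatch()
    for sample in samples:
        if len(batch.token_ids) + len(sample.token_ids) > seq_len:
            raise ValueError("samples do not fit into seq_len")
        batch.token_ids.extend(sample.token_ids)
        # Broadcast the sequence reward so it stays aligned with its tokens.
        batch.rewards.extend([sample.reward] * len(sample.token_ids))
    # Pad both arrays by the same amount so their lengths never diverge.
    pad = seq_len - len(batch.token_ids)
    batch.token_ids.extend([pad_id] * pad)
    batch.rewards.extend([pad_reward] * pad)
    return batch
```

The point of the sketch is the invariant: any step that extends or pads the token array has to extend or pad the reward array by the same amount.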

Introduces scripts/token_export_visualizer.py, a standalone CLI that loads JSONL (file or directory), optionally decodes tokens with a tokenizer, and renders a static navigable HTML view with KL-based highlighting and hoverable token details; documents the workflow in skills/config/SKILL.md.
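A rough sketch of the visualizer's core idea follows: collect records from a file or a directory of *.jsonl files, then render each token as a span with a hover tooltip and a highlight when its KL is high. This is not the code in scripts/token_export_visualizer.py. The record keys `token_ids` and `kl`, the threshold, and the colour are assumptions, and the sketch shows raw token ids instead of decoding them with a tokenizer.

```python
import html
import json
from pathlib import Path


def load_records(path: str) -> list[dict]:
    """Load records from one JSONL file, or from every *.jsonl file in a directory."""
    p = Path(path)
    files = sorted(p.glob("*.jsonl")) if p.is_dir() else [p]
    records: list[dict] = []
    for f in files:
        with f.open(encoding="utf-8") as fh:
            records.extend(json.loads(line) for line in fh if line.strip())
    return records


def render_record(record: dict, kl_threshold: float = 0.1) -> str:
    """Render one record as HTML; tokens above kl_threshold get a background colour."""
    spans = []
    for tok, kl in zip(record["token_ids"], record["kl"]):
        style = ' style="background:#fbb"' if kl > kl_threshold else ""
        # The title attribute gives a native hover tooltip with token metadata.
        title = html.escape(f"id={tok} kl={kl:.4f}", quote=True)
        spans.append(f'<span title="{title}"{style}>{html.escape(str(tok))}</span>')
    return "<p>" + " ".join(spans) + "</p>"
```

A static page can then be assembled by joining `render_record` output for each loaded record, which keeps the result viewable without a server.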

Reviewed by Cursor Bugbot for commit eebc96e. Bugbot is set up for automated code reviews on this repo.

@samsja samsja marked this pull request as ready for review May 19, 2026 02:25
@samsja samsja marked this pull request as draft May 19, 2026 02:25
@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Comment thread scripts/token_export_visualizer.py
@samsja samsja force-pushed the feat/token-export-visualizer branch from 497270a to 95057dc Compare May 22, 2026 01:03