Skip to content

[codex] Add replay SFT cookbook example#20

Open
tim0120 wants to merge 2 commits into
mainfrom
feat/replay-sft-cookbook
Open

[codex] Add replay SFT cookbook example#20
tim0120 wants to merge 2 commits into
mainfrom
feat/replay-sft-cookbook

Conversation

@tim0120

@tim0120 tim0120 commented Jun 4, 2026

Copy link
Copy Markdown

Summary

  • add a replay SFT config using prime/sft-replay
  • document the existing-messages-dataset path in the warm-start SFT guide
  • use the message-formatted HuggingFaceH4/no_robots dataset that passed replay-backed SFT validation
  • explain that replay SFT does not call a teacher during rollout collection

Scope

This PR is now standalone on main. It does not include the hosted distillation / OPD cookbook edits from the earlier stacked base.

Validation

  • Rebased onto origin/main at 8d8e5a1
  • git diff --check origin/main...HEAD

@tim0120 tim0120 force-pushed the feat/replay-sft-cookbook branch from 3fa819b to 180fa89 Compare June 5, 2026 19:31
@tim0120 tim0120 changed the base branch from codex/hosted-distillation-configs to main June 5, 2026 19:32
@tim0120 tim0120 marked this pull request as ready for review June 5, 2026 20:24
Copilot AI review requested due to automatic review settings June 5, 2026 20:24

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a replay-based SFT example to the cookbook so users can warm-start from an existing chat dataset (instead of collecting demonstrations from a teacher), and documents the required dataset shape.

Changes:

  • Documented how to run SFT by replaying an existing Hugging Face dataset with a messages column.
  • Added a new training config example for replay-backed SFT (configs/05/replay-sft.toml).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
guides/05-warm-starts-with-sft/README.md Adds a “Replay an Existing Dataset” section and includes an example replay SFT config snippet.
configs/05/replay-sft.toml Adds a runnable config file for replay-backed SFT using prime/sft-replay.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

[[env]]
id = "prime/sft-replay"

[env.args.taskset]
Comment on lines +82 to +84

[env.args.taskset]
dataset = "HuggingFaceH4/no_robots"
dataset = "HuggingFaceH4/no_robots"
```

This path does not call a teacher during rollout collection. `sft-replay` turns stored assistant messages into replayed trajectories, and the training stack tokenizes those messages before sending them to the trainer.
dataset = "HuggingFaceH4/no_robots"
```

This path does not call a teacher during rollout collection. `sft-replay` turns stored assistant messages into replayed trajectories, and the training stack tokenizes those messages before sending them to the trainer.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants