[codex] Add replay SFT cookbook example#20
Open
tim0120 wants to merge 2 commits into
Open
Conversation
3fa819b to
180fa89
Compare
There was a problem hiding this comment.
Pull request overview
Adds a replay-based SFT example to the cookbook so users can warm-start from an existing chat dataset (instead of collecting demonstrations from a teacher), and documents the required dataset shape.
Changes:
- Documented how to run SFT by replaying an existing Hugging Face dataset with a
messagescolumn. - Added a new training config example for replay-backed SFT (
configs/05/replay-sft.toml).
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| guides/05-warm-starts-with-sft/README.md | Adds a “Replay an Existing Dataset” section and includes an example replay SFT config snippet. |
| configs/05/replay-sft.toml | Adds a runnable config file for replay-backed SFT using prime/sft-replay. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| [[env]] | ||
| id = "prime/sft-replay" | ||
|
|
||
| [env.args.taskset] |
Comment on lines
+82
to
+84
|
|
||
| [env.args.taskset] | ||
| dataset = "HuggingFaceH4/no_robots" |
| dataset = "HuggingFaceH4/no_robots" | ||
| ``` | ||
|
|
||
| This path does not call a teacher during rollout collection. `sft-replay` turns stored assistant messages into replayed trajectories, and the training stack tokenizes those messages before sending them to the trainer. |
| dataset = "HuggingFaceH4/no_robots" | ||
| ``` | ||
|
|
||
| This path does not call a teacher during rollout collection. `sft-replay` turns stored assistant messages into replayed trajectories, and the training stack tokenizes those messages before sending them to the trainer. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
prime/sft-replayHuggingFaceH4/no_robotsdataset that passed replay-backed SFT validationScope
This PR is now standalone on
main. It does not include the hosted distillation / OPD cookbook edits from the earlier stacked base.Validation
origin/mainat8d8e5a1git diff --check origin/main...HEAD