APR-57: add hosted OPD CLI config #699
@codex review
Codex Review: Didn't find any major issues. Already looking forward to the next diff.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 45322b6646
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    if self.loss in {"sft", "opd"} and self.teacher is None:
        raise ValueError(f"teacher is required when loss is '{self.loss}'")
Keep rejecting OPD until hosted scoring exists
In the hosted CLI path I inspected, train() still sends this config through RLClient.create_run() to /rft/runs, and this commit does not add the teacher-logprob runtime/API support that the previous guard said OPD requires. With this validator now accepting loss = "opd" whenever a teacher is present, users can create hosted OPD runs that pass local validation but fail or behave incorrectly once scheduled; keep the local rejection until the hosted runtime path is actually wired.
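One way to read this suggestion is as a second local check layered on top of the teacher requirement. The sketch below is illustrative only: `RLConfigSketch` and `HOSTED_OPD_SUPPORTED` are hypothetical names that do not come from this PR, and the real config class and error messages may differ.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical flag (not in the PR): set to True once the hosted runtime
# can actually score teacher logprobs for OPD runs.
HOSTED_OPD_SUPPORTED = False


@dataclass
class RLConfigSketch:
    """Illustrative stand-in for the CLI run config; not the real class."""

    loss: str = "rl"
    teacher: Optional[dict] = None

    def __post_init__(self) -> None:
        # Same check as the lines under review: teacher-based losses need a teacher.
        if self.loss in {"sft", "opd"} and self.teacher is None:
            raise ValueError(f"teacher is required when loss is '{self.loss}'")
        # Extra guard in the spirit of the review: reject OPD locally
        # until the hosted path is wired up.
        if self.loss == "opd" and not HOSTED_OPD_SUPPORTED:
            raise ValueError("loss 'opd' is not supported for hosted runs yet")
```

With a guard like this, an OPD config would fail during local validation instead of after the run has been submitted to /rft/runs, and removing the flag check later would be a one-line change.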
@codex review
Codex Review: Didn't find any major issues. You're on a roll.
Summary
- loss = "opd" with [teacher]
- loss = "rl" as the generated template default while documenting OPD as an accepted option
- /rft/runs payload
Scope
This is CLI-only. It does not implement hosted/platform acceptance or teacher-logprob runtime wiring.
Keep this draft until engineering confirms the hosted API/runtime path is ready for OPD.
Verification
- .venv/bin/python -m pytest packages/prime/tests/test_rl_config.py packages/prime/tests/test_rl_api.py (37 passed)
- .venv/bin/ruff check packages/prime/src/prime_cli/commands/rl.py packages/prime/tests/test_rl_api.py packages/prime/tests/test_rl_config.py
- git diff --check
Review Notes
Linear: APR-57