APR-57: add hosted OPD CLI config#699

Draft
tim0120 wants to merge 1 commit into main from feat/hosted-opd-cli

@tim0120 tim0120 commented Jun 1, 2026

Contributor

Summary

  • allow Hosted Training configs to use loss = "opd" with [teacher]
  • keep loss = "rl" as the generated template default while documenting OPD as an accepted option
  • send OPD loss and teacher config through the existing /rft/runs payload

Scope

This is CLI-only. It does not implement hosted/platform acceptance or teacher-logprob runtime wiring.

Keep this PR in draft until engineering confirms the hosted API/runtime path is ready for OPD.

Verification

  • .venv/bin/python -m pytest packages/prime/tests/test_rl_config.py packages/prime/tests/test_rl_api.py (37 passed)
  • .venv/bin/ruff check packages/prime/src/prime_cli/commands/rl.py packages/prime/tests/test_rl_api.py packages/prime/tests/test_rl_config.py
  • git diff --check
  • GitHub checks are passing

Review Notes

  • Codex CLI review found no actionable regressions
  • Claude review could not run because the org has hit its monthly usage limit

Linear: APR-57

tim0120 commented Jun 1, 2026

Contributor Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Already looking forward to the next diff.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

tim0120 commented Jun 2, 2026

Contributor Author

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 45322b6646

Comment on lines +667 to +668
if self.loss in {"sft", "opd"} and self.teacher is None:
    raise ValueError(f"teacher is required when loss is '{self.loss}'")

P1: Keep rejecting OPD until hosted scoring exists

In the hosted CLI path I inspected, train() still sends this config through RLClient.create_run() to /rft/runs, and this commit does not add the teacher-logprob runtime/API support that the previous guard said OPD requires. With this validator now accepting loss = "opd" whenever a teacher is present, users can create hosted OPD runs that pass local validation but fail or behave incorrectly once scheduled; keep the local rejection until the hosted runtime path is actually wired.
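
The stricter behaviour suggested here could look roughly like the following sketch. It assumes a plain dataclass; the `RunConfig` name, the `hosted_opd_ready` flag, and the second error message are hypothetical and not taken from the repository. Only the first check mirrors the lines under review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunConfig:  # hypothetical stand-in for the CLI's config model
    loss: str = "rl"
    teacher: Optional[dict] = None

    def validate(self, hosted_opd_ready: bool = False) -> None:
        # Check from the reviewed diff: SFT and OPD both need a teacher.
        if self.loss in {"sft", "opd"} and self.teacher is None:
            raise ValueError(f"teacher is required when loss is '{self.loss}'")
        # Assumed extra guard: reject OPD locally until the hosted runtime
        # is wired for teacher logprobs (flag name is illustrative).
        if self.loss == "opd" and not hosted_opd_ready:
            raise ValueError("loss = 'opd' is not yet supported for hosted runs")


# With the guard in place, an OPD config with a teacher is still rejected locally.
try:
    RunConfig(loss="opd", teacher={"model": "example"}).validate()
except ValueError as exc:
    print(exc)
```

Flipping the flag once the hosted path is confirmed would let the same config pass without touching the teacher check.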

tim0120 commented Jun 2, 2026

Contributor Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. You're on a roll.
