Add Codex reviewer backend to structured review#15
Merged
Conversation
Add the implementation plan for a cross-vendor reviewer backend in the bundled structured-review runner: codex exec integration, driver-keyed auto default, pinned gpt-5.5/xhigh, reviewer identity in thread headings, and backend-neutral protocol text. Spike evidence included. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Accept all five non-blocking findings: name the finalize text-capture seam and preserve module-level helper names (N1), pin print-mode -o behavior with read-only-sandbox spike evidence (N2, N3), specify backend-aware metadata semantics with reviewer_version (N4), and add a marker-resolved auto preflight with an actionable error (N5). Mark the plan accepted and implementation in progress. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Add a codex exec reviewer backend to the bundled runner with a cross-vendor auto default keyed off driver env markers, marker-resolved preflight, pinned gpt-5.5/xhigh defaults with override flags, sandboxed write-mode commits via git-dir --add-dir, JSONL parsing with last-message-file preference, backend-aware metadata with reviewer_version, and reviewer identity in pass headings. Update the skill, README, and CURRENT map to the backend-neutral default runner rule; extend tests to 56 with selection, argv, parsing, preflight, and fake-codex end-to-end coverage. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Mark the plan historical with its PR reference, add it to the agent plans index, and refresh the CURRENT map timestamp for the backend- neutral runner entry. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Reviewer agents frequently violated the append-only thread contract (deleting placeholder lines while appending), invalidating roughly half of historical write-mode passes and forcing manual driver recovery. The reviewer now always runs read-only and returns the review text; the runner verifies it (untouched worktree, review-like, no local paths or secrets), appends it verbatim under Review Threads, and creates the structured-review commit itself, so append-only holds by construction. Codex drops the write-mode workspace-write/--add-dir grant and runs read-only in both modes. verify_write_mode stays as a defense-in-depth self-check. Also delegate closeout's runner policy to structured-review per the human-approved scope addendum. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…unner-write refactor
Accept pass-3 B1: append the reviewer text byte-for-byte (only a separating blank line and a missing terminating newline are added) with an exact-output test. Treat the pass-3 environment-blocked validation as a finding: codex now runs --sandbox workspace-write with no .git grant in both modes so reviewers can run tests again, while commits stay impossible and stray worktree writes still fail the run. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…unner-write fixes
Owner
Author
|
Scope addendum (human-approved during closeout discussion), now included:
Gate evidence in the plan thread: pass 3 (codex) found one blocking issue (non-verbatim append) plus an environment-blocked-validation residual — both fixed; pass 4 (codex) confirmed: no blocking, no non-blocking, "Ready for closeout", with the reviewer rerunning the full validation suite itself under the new sandbox. Validation green at e8919dd: 61+15+15 tests, backlog check, compileall, |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Add a cross-vendor reviewer backend to the bundled structured-review runner so a Claude Code driver gets a Codex reviewer by default (and vice versa):
codex execbackend: JSONL event parsing, last-message-file-first text capture, sandboxed write-mode commits via git-dir--add-dir(linked worktrees supported),--sandbox read-onlyprint mode.--reviewer-backend {auto,claude,codex}with cross-vendorautokeyed off driver env markers (CLAUDECODEvsCODEX_THREAD_ID/CODEX_SANDBOX); both markers -> hard error; neither -> claude (today's behavior); marker-resolved selections preflight the binary with an actionable error. No silent fallback.gpt-5.5/xhighwith--codex-bin/--codex-model/--codex-effortoverrides;--model/--effort/--claude-binstay Claude-only for backward compatibility.backend, active model/effort,reviewer_version, best-effort token usage) and reviewer identity in every pass heading.verify_write_mode/verify_print_modesemantics unchanged — the same post-hoc contract gates both backends.Gates (strict chain, all in the plan thread)
codex execunder--sandbox workspace-write+--add-diron a linked worktree appended threads and committed; runner verification passed. Verdict: no blocking, no non-blocking, "Ready for closeout", full traceability, reviewer-rerun validation green.Validation
structured-review/tests56 OK (16 new),tests15 OK,scout/tests15 OK,check_backlog.pypass,compileallclean,git diff --checkclean — driver-run and reviewer-rerun; CI runs the same set.Notes
closeout/SKILL.mdstill says "bundled Claude runner" in one line; protocol changes there were an explicit plan non-goal. Optional follow-up backlog item.🤖 Generated with Claude Code