
Add Codex reviewer backend to structured review #15

Merged: resodo merged 10 commits into main from feature/codex-reviewer-backend on Jun 10, 2026

Conversation

resodo (Owner) commented Jun 10, 2026

Summary

Add a cross-vendor reviewer backend to the bundled structured-review runner so a Claude Code driver gets a Codex reviewer by default (and vice versa):

  • codex exec backend: JSONL event parsing, last-message-file-first text capture, sandboxed write-mode commits via git-dir --add-dir (linked worktrees supported), --sandbox read-only print mode.
  • --reviewer-backend {auto,claude,codex} with cross-vendor auto keyed off driver env markers (CLAUDECODE vs CODEX_THREAD_ID/CODEX_SANDBOX); both markers -> hard error; neither -> claude (today's behavior); marker-resolved selections preflight the binary with an actionable error. No silent fallback.
  • Pinned Codex defaults gpt-5.5/xhigh with --codex-bin/--codex-model/--codex-effort overrides; --model/--effort/--claude-bin stay Claude-only for backward compatibility.
  • Backend-aware metadata (backend, active model/effort, reviewer_version, best-effort token usage) and reviewer identity in every pass heading.
  • verify_write_mode/verify_print_mode semantics unchanged — the same post-hoc contract gates both backends.
  • Protocol text backend-neutral: SKILL Default Runner Rule, README usage, CURRENT map.
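
The `auto` selection rule from the second bullet can be sketched roughly as below. This is an illustration under stated assumptions, not the runner's code: `resolve_backend`, `BackendError`, and the default binary names `claude` and `codex` are hypothetical. Only the env marker names, the both/neither rules, and the preflight-on-marker-resolution behavior come from the summary.

```python
import shutil
from typing import Callable, Mapping, Optional

# Driver env markers named in the PR summary.
CLAUDE_MARKERS = ("CLAUDECODE",)
CODEX_MARKERS = ("CODEX_THREAD_ID", "CODEX_SANDBOX")


class BackendError(RuntimeError):
    """Hypothetical error type for an unresolvable or unavailable backend."""


def resolve_backend(
    requested: str,
    env: Mapping[str, str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Return 'claude' or 'codex' for a --reviewer-backend value."""
    if requested in ("claude", "codex"):
        return requested  # explicit choice: no marker logic
    if requested != "auto":
        raise BackendError(f"unknown reviewer backend: {requested!r}")

    claude_driver = any(env.get(name) for name in CLAUDE_MARKERS)
    codex_driver = any(env.get(name) for name in CODEX_MARKERS)

    if claude_driver and codex_driver:
        # Ambiguous driver: hard error instead of guessing.
        raise BackendError(
            "both Claude and Codex driver markers are set; "
            "pass --reviewer-backend claude or codex explicitly"
        )
    if not claude_driver and not codex_driver:
        return "claude"  # no markers: keep the pre-existing default

    # Cross-vendor: a Claude driver gets a Codex reviewer and vice versa.
    backend = "codex" if claude_driver else "claude"
    # Assumed binary name equals the backend name; the real runner has
    # --codex-bin / --claude-bin overrides that this sketch ignores.
    if which(backend) is None:
        raise BackendError(
            f"auto selected the {backend} reviewer but '{backend}' is not on PATH; "
            "install it or pass --reviewer-backend explicitly"
        )
    return backend
```

Passing `which` as a parameter keeps the preflight testable without a real binary on the machine, which mirrors the fake-bin testing approach mentioned in the Notes.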

Gates (strict chain, all in the plan thread)

  1. Plan review (claude reviewer): no blocking; 5 non-blocking refinements accepted in driver response 1.
  2. Implementation review (codex reviewer, dogfooded): the new backend reviewed its own implementation live — real codex exec under --sandbox workspace-write + --add-dir on a linked worktree appended threads and committed; runner verification passed. Verdict: no blocking, no non-blocking, "Ready for closeout", full traceability, reviewer-rerun validation green.

Validation

structured-review/tests 56 OK (16 new), tests 15 OK, scout/tests 15 OK, check_backlog.py pass, compileall clean, git diff --check clean — driver-run and reviewer-rerun; CI runs the same set.

Notes

  • closeout/SKILL.md still says "bundled Claude runner" in one line; protocol changes there were an explicit plan non-goal. Optional follow-up backlog item.
  • Live spike + dogfood evidence covers real Codex sandbox semantics that fake-bin tests cannot reach.

🤖 Generated with Claude Code

resodo and others added 10 commits June 10, 2026 06:38
Add the implementation plan for a cross-vendor reviewer backend in the
bundled structured-review runner: codex exec integration, driver-keyed
auto default, pinned gpt-5.5/xhigh, reviewer identity in thread headings,
and backend-neutral protocol text. Spike evidence included.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Accept all five non-blocking findings: name the finalize text-capture
seam and preserve module-level helper names (N1), pin print-mode -o
behavior with read-only-sandbox spike evidence (N2, N3), specify
backend-aware metadata semantics with reviewer_version (N4), and add a
marker-resolved auto preflight with an actionable error (N5). Mark the
plan accepted and implementation in progress.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Add a codex exec reviewer backend to the bundled runner with a
cross-vendor auto default keyed off driver env markers, marker-resolved
preflight, pinned gpt-5.5/xhigh defaults with override flags, sandboxed
write-mode commits via git-dir --add-dir, JSONL parsing with
last-message-file preference, backend-aware metadata with
reviewer_version, and reviewer identity in pass headings. Update the
skill, README, and CURRENT map to the backend-neutral default runner
rule; extend tests to 56 with selection, argv, parsing, preflight, and
fake-codex end-to-end coverage.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
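
The "last-message-file preference" in the commit above could look something like the following sketch. It assumes the reviewer's final message may be available in a file and that stdout carries one JSON object per line. The event shape used here (a `type` of `item.completed` wrapping an `item` of type `agent_message` with a `text` field) is an assumption about the JSONL stream, and `capture_review_text` is a hypothetical name.

```python
import json
from pathlib import Path
from typing import Optional


def capture_review_text(last_message_file: Optional[Path], jsonl_stdout: str) -> str:
    """Prefer the last-message file; fall back to the last agent message in JSONL."""
    # 1. File first: it holds the final message without event framing.
    if last_message_file is not None and last_message_file.is_file():
        text = last_message_file.read_text(encoding="utf-8")
        if text.strip():
            return text

    # 2. Fallback: scan the event stream and keep the last agent message.
    last = ""
    for line in jsonl_stdout.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON noise interleaved on stdout
        if not isinstance(event, dict):
            continue
        item = event.get("item")
        if (
            event.get("type") == "item.completed"  # assumed event name
            and isinstance(item, dict)
            and item.get("type") == "agent_message"  # assumed item type
        ):
            text = item.get("text")
            if isinstance(text, str) and text.strip():
                last = text
    return last
```

An empty return value signals that neither source produced review text, which a caller would presumably treat as a failed pass rather than append nothing.
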
Mark the plan historical with its PR reference, add it to the agent
plans index, and refresh the CURRENT map timestamp for the backend-
neutral runner entry.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Reviewer agents frequently violated the append-only thread contract
(deleting placeholder lines while appending), invalidating roughly half
of historical write-mode passes and forcing manual driver recovery. The
reviewer now always runs read-only and returns the review text; the
runner verifies it (untouched worktree, review-like, no local paths or
secrets), appends it verbatim under Review Threads, and creates the
structured-review commit itself, so append-only holds by construction.
Codex drops the write-mode workspace-write/--add-dir grant and runs
read-only in both modes. verify_write_mode stays as a defense-in-depth
self-check. Also delegate closeout's runner policy to structured-review
per the human-approved scope addendum.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Accept pass-3 B1: append the reviewer text byte-for-byte (only a
separating blank line and a missing terminating newline are added) with
an exact-output test. Treat the pass-3 environment-blocked validation as
a finding: codex now runs --sandbox workspace-write with no .git grant
in both modes so reviewers can run tests again, while commits stay
impossible and stray worktree writes still fail the run.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
resodo (Owner, Author) commented Jun 10, 2026

Scope addendum (human-approved during closeout discussion), now included:

  1. closeout/SKILL.md delegation — the duplicated (and stale) runner policy is removed; closeout keeps its trigger list and defers runner/backend/mode policy to structured-review.
  2. Runner-owned write mode — the reviewer always runs read-only-for-the-repo and returns the review text; the runner verifies it (untouched worktree, review-like, no local paths/secrets), appends it byte-for-byte under ## Review Threads, and creates the commit itself. The append-only contract that reviewer agents historically violated in ~half of write-mode runs now holds by construction. Codex runs --sandbox workspace-write with no .git grant in both modes, so reviewers can run tests while commits remain impossible.
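
A minimal sketch of the byte-for-byte append described in item 2, assuming one reading of the rule from the commit message (the only bytes added are a separating blank line and a terminating newline when one is missing). The name `append_review` is hypothetical, and the real runner also performs verification and the commit, which this sketch leaves out.

```python
def append_review(thread: bytes, review: bytes) -> bytes:
    """Append review bytes verbatim to the existing thread content."""
    out = thread
    # Separator: make sure the existing content ends with a blank line.
    if out and not out.endswith(b"\n"):
        out += b"\n"
    if out and not out.endswith(b"\n\n"):
        out += b"\n"
    # The review itself is never rewritten, trimmed, or re-encoded.
    out += review
    # Terminating newline only if the result lacks one.
    if not out.endswith(b"\n"):
        out += b"\n"
    return out


existing = b"## Review Threads\n"
updated = append_review(existing, b"### Pass 4 (codex)\nNo blocking findings.")
# Append-only holds by construction: the old content is a strict prefix.
assert updated.startswith(existing)
```

Working on bytes rather than decoded text avoids newline translation or encoding round-trips changing the reviewer's output, which is what an exact-output test would check.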

Gate evidence in the plan thread: pass 3 (codex) found one blocking issue (non-verbatim append) plus an environment-blocked-validation residual — both fixed; pass 4 (codex) confirmed: no blocking, no non-blocking, "Ready for closeout", with the reviewer rerunning the full validation suite itself under the new sandbox. Validation green at e8919dd: 61+15+15 tests, backlog check, compileall, git diff --check.

@resodo resodo merged commit 4bb87b0 into main Jun 10, 2026
1 check passed