RFC: Deterministic pre-agentic review gate to cut token burn (draft-until-green) #447

don-petry · 2026-06-11T19:47:00Z

don-petry
Jun 11, 2026
Maintainer

Idea: A deterministic pre-agentic review gate to cut token burn

Problem. Our paid agentic reviewers (Dev-Lead, CodeRabbit, Copilot, the .github-private review agent) spend reasoning — and tokens — on issues a --fix flag or a type checker would have erased for free. Agent-authored code fails in characteristic, cheaply-detectable ways (hallucinated imports, dead code, format drift, plausible-but-wrong security patterns). We're paying LLM rates to catch lint debt.

Goal. Run the deterministic, $0 checks first, auto-fix what's auto-fixable, and only let token-billed agents engage once the cheap tier is green — so their reasoning is spent on real logic, not cleanup.

The frame: ordering, not just tools

Tier 0  AUTO-FIX (commits silently; agents never see it)
        ruff --fix · biome check --write · prettier · gofmt · eslint --fix
                 │  trivial issues gone at $0
                 ▼
Tier 1  FAIL-LOUD DETERMINISTIC GATE (blocks PR, no tokens)
        type check (tsc/mypy) · semgrep · ruff/golangci-lint · knip · SonarCloud · gitleaks
                 │  PR must be green here…
                 ▼
Tier 2  AGENTIC REVIEW (only now do we spend tokens)
        Dev-Lead · CodeRabbit · Copilot · pr-review agent

The leverage is the Tier 1 → Tier 2 gate.
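As a rough illustration of what "fail-loud" means for Tier 1, the sketch below runs every deterministic check, reports each result, and returns non-zero if any check failed. The function name `run_gate` is hypothetical, and the check commands are whatever the caller passes in; nothing here reflects the current ci.yml.

```shell
#!/usr/bin/env bash
# Sketch of a Tier 1 "fail-loud" gate (hypothetical helper, not existing CI code).
# Every check runs even if an earlier one failed, so the PR author sees all
# failures in one pass instead of fixing them one at a time.

run_gate() {
  local failed=0 check
  for check in "$@"; do
    # Each argument is one full command line, e.g. "npx tsc --noEmit".
    if bash -c "$check"; then
      echo "PASS: $check"
    else
      echo "FAIL: $check"
      failed=1
    fi
  done
  # Non-zero return blocks the PR; zero lets it proceed toward Tier 2.
  return "$failed"
}
```

A CI step could then call something like `run_gate "npx tsc --noEmit" "npx knip" || exit 1`, with the actual command list chosen per repo.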

What we already have (and it's good)

pr-auto-review-reusable.yml is a model of this pattern: it dispatches the heavy .github-private review agent only when all CI checks pass, the PR is non-draft, no CHANGES_REQUESTED review is outstanding, and there are no unresolved threads. That reviewer is already correctly gated. ✅

The leak is the other reviewers, which engage on PR-open before CI goes green:

| Reviewer | Trigger today | Gated on green CI? |
| --- | --- | --- |
| `.github-private` review agent (via `pr-auto-review`) | dispatched on CI success | ✅ yes |
| CodeRabbit (GitHub App) | PR open + every push (incremental) | ❌ no |
| Copilot native review | PR open | ❌ no |
| Dev-Lead | `pull_request: [opened, synchronize]` | ❌ no (proactive path) |

Proposal: "draft until deterministic-green" — one gate for every reviewer

Every reviewer above already respects (or can be configured to respect) draft state. So instead of bolting a CI-precondition onto each one separately, use a single convention:

Agents open PRs as draft. A workflow flips the PR to ready_for_review the moment Tier-1 CI is green.

Result: on a red PR, zero paid reviewers engage. When CI turns green, the promote step marks the PR ready, which fires pull_request: ready_for_review → pr-auto-review dispatches the heavy reviewer, and CodeRabbit/Copilot pick it up — all at once, all post-green.

Diff A — new pr-promote-when-green.yml reusable (promotes only agent/bot PRs; humans keep control of their own readiness):

name: Promote Draft PR When Green (Reusable)
on:
  workflow_call:
    secrets:
      GH_PAT_WORKFLOWS: { required: true }
jobs:
  promote:
    runs-on: ubuntu-latest
    permissions:
      pull-requests: write
      checks: read
      actions: read
    steps:
      - name: Mark agent draft PR ready when deterministic CI is green
        env:
          # Events created with the default GITHUB_TOKEN do not trigger further
          # workflow runs, so promoting with a PAT lets ready_for_review reach
          # downstream workflows.
          GH_TOKEN: ${{ secrets.GH_PAT_WORKFLOWS }}
          # Pass event data through env instead of interpolating it into the script.
          CONCLUSION: ${{ github.event.workflow_run.conclusion }}
          SHA: ${{ github.event.workflow_run.head_sha }}
          REPO: ${{ github.repository }}
        run: |
          set -euo pipefail
          # Only promote when the triggering CI run succeeded.
          [ "$CONCLUSION" = "success" ] \
            || { echo "CI not green; leaving as draft"; exit 0; }
          # First open draft PR associated with the CI run's head commit.
          PR=$(gh api "/repos/${REPO}/commits/${SHA}/pulls" \
            --jq '[.[] | select(.state=="open" and .draft==true)][0]')
          [ -n "$PR" ] && [ "$PR" != "null" ] || { echo "No open draft PR for $SHA"; exit 0; }
          NUM=$(echo "$PR" | jq -r '.number')
          AUTHOR=$(echo "$PR" | jq -r '.user.login')
          # Promote bot-authored PRs only; humans decide their own readiness.
          case "$AUTHOR" in
            *"[bot]"|donpetry-bot)
              gh pr ready "$NUM" --repo "$REPO"
              echo "::notice::Promoted #$NUM to ready_for_review (Tier-1 CI green)";;
            *) echo "Human-authored #$NUM, not auto-promoting";;
          esac

Caller stub triggers on workflow_run: { workflows: ["CI"], types: [completed] }.
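
To make the bot-only filter in Diff A easier to reason about, here is the same `case` pattern pulled out into a standalone function. The name `is_agent_author` is hypothetical; the pattern itself is copied from the workflow above, so any login ending in `[bot]`, plus `donpetry-bot`, counts as an agent.

```shell
# Hypothetical helper: returns 0 when the PR author should be auto-promoted,
# 1 otherwise. The pattern mirrors the case statement in Diff A.
is_agent_author() {
  case "$1" in
    # Quoting "[bot]" makes the brackets literal rather than a glob character class.
    *"[bot]"|donpetry-bot) return 0 ;;
    *) return 1 ;;
  esac
}
```

With this, `is_agent_author 'dependabot[bot]'` succeeds and `is_agent_author 'don-petry'` fails, which is the split the proposal intends. Any agent that opens PRs under a plain user login would need to be added to the pattern explicitly.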

Diff B — .coderabbit.yaml org standard (skip drafts so CodeRabbit waits for promotion):

reviews:
  auto_review:
    drafts: false        # do not review until promoted to ready_for_review

Diff C — Copilot: set repo/org review setting to not auto-review drafts (native Copilot review honors draft state).

Diff D — AGENTS.md directive: "When opening a PR programmatically, open it as draft. Promotion to ready_for_review is automated by pr-promote-when-green once the deterministic CI tier passes." Dev-Lead's proactive pull_request path early-exits on draft.
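
For agents that open PRs through the GitHub CLI, the Diff D directive could come down to one flag. The wrapper below is a minimal sketch; `open_agent_pr` is a hypothetical name, and it assumes the agent already has a pushed branch and an authenticated `gh`.

```shell
# Hypothetical wrapper an agent could call instead of a bare `gh pr create`.
# --draft opens the PR in draft state, so reviewers that skip drafts stay
# idle until the promote workflow marks it ready.
open_agent_pr() {
  local title="$1" body="$2"
  gh pr create --draft --title "$title" --body "$body"
}
```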

Open question for maintainers: Dev-Lead's intent classifier (dev-lead-intent.sh, in .github-private) — does its pull_request: opened/synchronize path do anything billable before CI, or is it a cheap no-op until a real intent (CI-failure / review / mention) arrives? If the latter, Diff D is belt-and-suspenders; if the former, it's the main saving.

Type-check enforcement audit (Task 3)

Checked every non-archived repo for whether typed code is matched by a CI type-check step:

| Repo | Typed code? | Type-check in CI | Verdict |
| --- | --- | --- | --- |
| `markets` | TS frontend | ✅ `npx tsc --noEmit` | Compliant |
| `broodly` | TS (pnpm) | ✅ `pnpm run typecheck` | Compliant |
| `google-app-scripts` | TS | ✅ `npm run typecheck` | Compliant |
| `ContentTwin` | none (shell) | n/a | Not applicable |
| `bmad-bgreat-suite` | none (shell/skills; 0 `.py`) | n/a | Not applicable |
| `.github-private` | none (shell) | n/a | Not applicable |
| `TalkTerm` | not yet (BMAD docs; TS/Electron planned) | ❌ CI runs only gitleaks | Watch |

Headline: enforcement is solid where it matters — all three repos with real TS already run a type-check. This is not a widespread gap.

Two real findings:

1. TalkTerm has a stub CI (gitleaks only). When its Electron/TS code lands, it needs the full ci.yml (lint + type-check + test) onboarded before the first feature PR.
2. The systemic gap is enforcement of the rule, not the rule itself. compliance-audit.sh doesn't verify that a TS repo actually has a type-check step; compliance is by convention today. A single repo onboarded without one (the risk TalkTerm faces) would regress silently. Add a type-check presence check to the compliance audit.
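
One possible shape for that presence check is sketched below. It is not taken from compliance-audit.sh: the function name `check_typecheck_present` is hypothetical, and it assumes that a root-level `tsconfig.json` marks a repo as typed and that type-check steps look like the commands in the audit table (`tsc --noEmit` or a `typecheck` script).

```shell
# Hypothetical check for compliance-audit.sh: a repo with TypeScript config
# must have at least one workflow that runs a type-check command.
# Prints a verdict; returns 0 for "ok" or "n/a" and 1 for "missing".
check_typecheck_present() {
  local repo_dir="$1"
  local workflows="$repo_dir/.github/workflows"

  # Assumption: a tsconfig.json at the repo root means "typed code".
  if [ ! -f "$repo_dir/tsconfig.json" ]; then
    echo "n/a: no tsconfig.json"
    return 0
  fi

  # Look for the commands seen in the audit table.
  if [ -d "$workflows" ] && grep -rqE 'tsc --noEmit|run typecheck' "$workflows"; then
    echo "ok: type-check step found"
    return 0
  fi

  echo "missing: TypeScript repo without a type-check step in CI"
  return 1
}
```

The root-level heuristic would miss repos that keep `tsconfig.json` in a subdirectory, and the grep would miss type-checks hidden behind other script names, so a real audit would likely need per-repo overrides.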

Tooling gaps worth adding to Tier 1

- Semgrep CE in ci.yml: free, with community rules and SARIF output that feeds the Security tab (the same pipeline as CodeQL). It runs in seconds versus CodeQL's minutes, which makes it the fast PR-feedback layer. Killer feature: custom rules that encode our own AGENTS.md directives ("never swallow exceptions", "no undisclosed synthetic-data fallback"), turning prose rules into enforced, deterministic checks.
- knip (JS/TS): dead code, unused exports, phantom deps. Agent-authored code leaves these behind constantly, and nothing in our stack catches them.
- Auto-fix-and-commit step (Tier 0): run formatters and --fix on agent-authored branches and commit the result, so trivial issues never reach any reviewer.
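
The auto-fix-and-commit idea can be sketched as a small function that runs a fixer and commits only when tracked files actually changed. The name `autofix_and_commit` and the commit message are placeholders, and the sketch assumes it runs inside a checked-out branch with a git identity already configured.

```shell
# Sketch of a Tier 0 step (hypothetical helper): run a fixer command, then
# commit only if it modified tracked files.
autofix_and_commit() {
  local fixer="$1"
  # Fixers often exit non-zero when unfixable issues remain; those are
  # deliberately left for the Tier 1 gate to report.
  bash -c "$fixer" || true

  # `git diff --quiet` exits 0 when the working tree matches the index.
  if git diff --quiet; then
    echo "nothing to fix"
    return 0
  fi

  # --all stages modifications to tracked files only; new files are not added.
  git commit --quiet --all --message "chore: apply automatic fixes"
  echo "committed auto-fixes"
}
```

A workflow step might call `autofix_and_commit "ruff check --fix ."` or `autofix_and_commit "npx prettier --write ."` and then push. Which fixers run, and whether the push should re-trigger CI, are left open here.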

Consolidated alternatives (DeepSource — deterministic pass before the AI agent; SonarQube AI Code Assurance — AI-snippet taint analysis) exist but overlap what we already own; noted for completeness, not recommended as a switch.

Proposed next steps

1. Land Diff A–D (the draft-until-green gate) as a standards PR. Biggest token-burn lever.
2. Add Semgrep CE to the ci.yml template + 2–3 custom rules from AGENTS.md directives.
3. Add a type-check presence check to compliance-audit.sh; queue TalkTerm full-CI onboarding before its first code PR.
4. Pilot knip on broodly/markets.

Feedback welcome — especially on the Dev-Lead open question above and whether "draft until green" should apply to human PRs too (proposal keeps it bot-only).

Drafted with Claude Code from a repo-grounded analysis of our current CI/agent architecture.
