Idea: A deterministic pre-agentic review gate to cut token burn
Problem. Our paid agentic reviewers (Dev-Lead, CodeRabbit, Copilot, the .github-private review agent) spend reasoning — and tokens — on issues a --fix flag or a type checker would have erased for free. Agent-authored code fails in characteristic, cheaply-detectable ways (hallucinated imports, dead code, format drift, plausible-but-wrong security patterns). We're paying LLM rates to catch lint debt.
Goal. Run the deterministic, $0 checks first, auto-fix what's auto-fixable, and only let token-billed agents engage once the cheap tier is green — so their reasoning is spent on real logic, not cleanup.
The frame: ordering, not just tools
Tier 0 AUTO-FIX (commits silently; agents never see it)
ruff --fix · biome check --write · prettier · gofmt · eslint --fix
│ trivial issues gone at $0
▼
Tier 1 FAIL-LOUD DETERMINISTIC GATE (blocks PR, no tokens)
type check (tsc/mypy) · semgrep · ruff/golangci-lint · knip · SonarCloud · gitleaks
│ PR must be green here…
▼
Tier 2 AGENTIC REVIEW (only now do we spend tokens)
Dev-Lead · CodeRabbit · Copilot · pr-review agent
The leverage is the Tier 1 → Tier 2 gate.
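A minimal sketch of what "fail-loud" could mean for Tier 1, assuming each check is a plain command line: run every check, report each result, and exit non-zero if any check failed. The function name `run_gate` and the example command list are hypothetical and not taken from the existing ci.yml.

```shell
#!/usr/bin/env bash
# Sketch of a fail-loud Tier 1 runner (hypothetical helper, not existing CI code).
# Every check runs even if an earlier one failed, so one CI run reports all problems.

run_gate() {
  # Each argument is one shell command line to run as a check.
  local failed=0 check
  for check in "$@"; do
    if bash -c "$check"; then
      echo "PASS  $check"
    else
      echo "FAIL  $check"
      failed=$((failed + 1))
    fi
  done
  echo "$failed check(s) failed"
  # The function's exit status is the gate: 0 only when nothing failed.
  [ "$failed" -eq 0 ]
}

# Example wiring (commented out; the real command list depends on the repo):
# run_gate "npx tsc --noEmit" "ruff check ." || exit 1
```

Running all checks before failing, rather than stopping at the first error, means an agent fixing the branch sees the full list in one pass instead of one failure per push.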
What we already have (and it's good)
pr-auto-review-reusable.yml is a model of this pattern: it dispatches the heavy .github-private review agent only when all CI checks pass, the PR is non-draft, there's no CHANGES_REQUESTED, and no unresolved threads. That reviewer is already correctly gated. ✅
The leak is the other reviewers, which engage on PR-open before CI goes green:
| Reviewer | Trigger today | Gated on green CI? |
| --- | --- | --- |
| .github-private review agent (via pr-auto-review) | dispatched on CI success | ✅ yes |
| CodeRabbit (GitHub App) | PR open + every push (incremental) | ❌ no |
| Copilot native review | PR open | ❌ no |
| Dev-Lead | pull_request: [opened, synchronize] | ❌ no (proactive path) |
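The four preconditions that pr-auto-review applies (CI green, non-draft, no CHANGES_REQUESTED, no unresolved threads) can be written as one predicate. This is an illustrative sketch only, not the actual logic in pr-auto-review-reusable.yml; the function name `should_dispatch_review` and its argument order are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical gate predicate mirroring the four conditions described above.
# Usage: should_dispatch_review CI_CONCLUSION IS_DRAFT REVIEW_DECISION UNRESOLVED_THREADS

should_dispatch_review() {
  local ci="$1" draft="$2" decision="$3" unresolved="$4"
  # Each guard prints the reason it blocked and returns non-zero.
  [ "$ci" = "success" ] || { echo "skip: CI is $ci"; return 1; }
  [ "$draft" = "false" ] || { echo "skip: PR is draft"; return 1; }
  [ "$decision" != "CHANGES_REQUESTED" ] || { echo "skip: changes requested"; return 1; }
  [ "$unresolved" -eq 0 ] || { echo "skip: $unresolved unresolved thread(s)"; return 1; }
  echo "dispatch"
}

# Example (commented out):
# should_dispatch_review success false APPROVED 0   # prints "dispatch"
```

The reviewers marked ❌ in the table apply none of these guards today, which is why a single shared condition (draft state) is attractive.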
Proposal: "draft until deterministic-green" — one gate for every reviewer
Every reviewer above already respects (or can be configured to respect) draft state. So instead of bolting a CI-precondition onto each one separately, use a single convention:
Agents open PRs as draft. A workflow flips the PR to ready_for_review the moment Tier-1 CI is green.
Result: on a red PR, zero paid reviewers engage. When CI turns green, the promote step marks the PR ready, which fires pull_request: ready_for_review → pr-auto-review dispatches the heavy reviewer, and CodeRabbit/Copilot pick it up — all at once, all post-green.
Diff A — new pr-promote-when-green.yml reusable (promotes only agent/bot PRs; humans keep control of their own readiness):
    name: Promote Draft PR When Green (Reusable)

    on:
      workflow_call:
        secrets:
          GH_PAT_WORKFLOWS: { required: true }

    jobs:
      promote:
        runs-on: ubuntu-latest
        permissions:
          pull-requests: write
          checks: read
          actions: read
        steps:
          - name: Mark agent draft PR ready when deterministic CI is green
            env:
              GH_TOKEN: ${{ secrets.GH_PAT_WORKFLOWS }}
            run: |
              set -euo pipefail
              # Only act on a successful CI run; anything else leaves the PR as draft.
              [ "${{ github.event.workflow_run.conclusion }}" = "success" ] \
                || { echo "CI not green; leaving as draft"; exit 0; }
              SHA="${{ github.event.workflow_run.head_sha }}"
              # First open draft PR that contains the CI run's head commit.
              PR=$(gh api "/repos/${{ github.repository }}/commits/${SHA}/pulls" \
                --jq '[.[] | select(.state=="open" and .draft==true)][0]')
              [ -n "$PR" ] && [ "$PR" != "null" ] || { echo "No open draft PR for $SHA"; exit 0; }
              NUM=$(echo "$PR" | jq -r '.number')
              AUTHOR=$(echo "$PR" | jq -r '.user.login')
              # Promote only bot/agent-authored PRs; humans control their own readiness.
              case "$AUTHOR" in
                *"[bot]"|donpetry-bot)
                  gh pr ready "$NUM" --repo "${{ github.repository }}"
                  echo "::notice::Promoted #$NUM to ready_for_review (Tier-1 CI green)";;
                *)
                  echo "Human-authored #$NUM; not auto-promoting";;
              esac

Caller stub triggers on workflow_run: { workflows: ["CI"], types: [completed] }.
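The author filter in Diff A decides which PRs get promoted, so it is worth exercising in isolation. The sketch below lifts the same `case` pattern into a function; the name `is_agent_author` is hypothetical, and the example logins other than donpetry-bot are made up for illustration.

```shell
#!/usr/bin/env bash
# Same pattern as the case statement in Diff A, extracted so it can be tested.
# Matches any login ending in the literal suffix "[bot]", plus donpetry-bot.

is_agent_author() {
  case "$1" in
    *"[bot]"|donpetry-bot) return 0 ;;  # quoted "[bot]" is literal, not a glob class
    *) return 1 ;;
  esac
}

# Example (commented out; logins are illustrative):
# is_agent_author "dependabot[bot]" && echo "would promote"
# is_agent_author "alice" || echo "would leave as draft"
```

Note that the match is on the login suffix only: an account named `my-bot` without the `[bot]` suffix is treated as human unless it is listed explicitly, as donpetry-bot is.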
Diff B — .coderabbit.yaml org standard (skip drafts so CodeRabbit waits for promotion):
    reviews:
      auto_review:
        drafts: false  # do not review until promoted to ready_for_review
Diff C — Copilot: set repo/org review setting to not auto-review drafts (native Copilot review honors draft state).
Diff D — AGENTS.md directive: "When opening a PR programmatically, open it as draft. Promotion to ready_for_review is automated by pr-promote-when-green once the deterministic CI tier passes." Dev-Lead's proactive pull_request path early-exits on draft.
Open question for maintainers: Dev-Lead's intent classifier (dev-lead-intent.sh, in .github-private) — does its pull_request: opened/synchronize path do anything billable before CI, or is it a cheap no-op until a real intent (CI-failure / review / mention) arrives? If the latter, Diff D is belt-and-suspenders; if the former, it's the main saving.
Type-check enforcement audit (Task 3)
Checked every non-archived repo for whether typed code is matched by a CI type-check step:
| Repo | Typed code? | Type-check in CI | Verdict |
| --- | --- | --- | --- |
| markets | TS frontend | ✅ npx tsc --noEmit | Compliant |
| broodly | TS (pnpm) | ✅ pnpm run typecheck | Compliant |
| google-app-scripts | TS | ✅ npm run typecheck | Compliant |
| ContentTwin | none (shell) | n/a | Not applicable |
| bmad-bgreat-suite | none (shell/skills; 0 .py) | n/a | Not applicable |
| .github-private | none (shell) | n/a | Not applicable |
| TalkTerm | not yet (BMAD docs; TS/Electron planned) | ❌ CI runs only gitleaks | Watch |
Headline: enforcement is solid where it matters — all three repos with real TS already run a type-check. This is not a widespread gap.
Two real findings:
TalkTerm has a stub CI (gitleaks only). When its Electron/TS code lands, it needs the full ci.yml (lint + type-check + test) onboarded before the first feature PR.
The systemic gap is enforcement-of-the-rule, not the rule itself. compliance-audit.sh doesn't verify that a TS repo actually has a typecheck step; compliance is by convention today. A repo onboarded without one (as TalkTerm could be) would regress silently. Add a type-check presence check to the compliance audit.
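A presence check of this kind could look roughly like the sketch below. It assumes that a repo counts as "typed" when it has a top-level tsconfig.json and that a type-check step is recognizable by the commands seen in the audit table (`tsc`, `run typecheck`). The function name `has_typecheck_step` is hypothetical, and this is not existing code from compliance-audit.sh.

```shell
#!/usr/bin/env bash
# Hypothetical presence check for a compliance audit.
# Usage: has_typecheck_step REPO_DIR
# Returns 0 when there is nothing to enforce (no tsconfig.json) or when some
# workflow file mentions a type-check command; returns 1 otherwise.

has_typecheck_step() {
  local repo="$1"
  [ -f "$repo/tsconfig.json" ] || { echo "n/a: no tsconfig.json"; return 0; }
  # Heuristic text match; a commented-out step would also match.
  if grep -rEqs 'tsc( |$)|run typecheck' "$repo/.github/workflows"; then
    echo "ok: type-check step found"
    return 0
  fi
  echo "FAIL: TypeScript repo without a CI type-check step"
  return 1
}

# Example (commented out):
# has_typecheck_step /path/to/repo || exit 1
```

A text match is deliberately crude: it can be fooled by a commented-out step, but it would catch the silent-regression case described above, where a typed repo is onboarded with no type-check step at all.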
Tooling gaps worth adding to Tier 1
Semgrep CE in ci.yml — free, community rules, SARIF → Security tab (same pipeline as CodeQL), runs in seconds vs CodeQL's minutes (the fast PR-feedback layer). Killer feature: custom rules that encode our own AGENTS.md directives ("never swallow exceptions", "no undisclosed synthetic-data fallback") — turning prose rules into enforced, deterministic checks.
knip (JS/TS) — dead code, unused exports, phantom deps. Agentic code leaves these constantly; nothing in our stack catches them.
Auto-fix-and-commit step (Tier 0) — run formatters/--fix on agent-authored branches and commit, so trivial issues never reach any reviewer.
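The Tier 0 auto-fix-and-commit step could be sketched as follows, assuming it runs inside a checked-out agent branch with a git identity already configured. The function name `autofix_and_commit` and the commit message are hypothetical; pushing the commit and restricting the step to agent-authored branches are left out.

```shell
#!/usr/bin/env bash
# Hypothetical Tier 0 helper: run a fixer, then commit only if tracked files changed.
# Usage: autofix_and_commit FIX_COMMAND [ARGS...]

autofix_and_commit() {
  # Fixers often exit non-zero when unfixable issues remain; those are Tier 1's job.
  "$@" || echo "fixer reported remaining issues (left for Tier 1)" >&2
  # git diff --quiet exits 0 when tracked files have no unstaged changes.
  if git diff --quiet; then
    echo "nothing to fix"
    return 0
  fi
  git commit -qam "style: apply automatic fixes"
  echo "committed fixes"
}

# Example (commented out):
# autofix_and_commit ruff check --fix .
```

Committing only when the diff is non-empty keeps the branch history free of empty commits and avoids retriggering CI when the fixers had nothing to do.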
Consolidated alternatives (DeepSource — deterministic pass before the AI agent; SonarQube AI Code Assurance — AI-snippet taint analysis) exist but overlap what we already own; noted for completeness, not recommended as a switch.
Proposed next steps
Land Diff A–D (the draft-until-green gate) as a standards PR. Biggest token-burn lever.
Add Semgrep CE to the ci.yml template + 2–3 custom rules from AGENTS.md directives.
Add a type-check presence check to compliance-audit.sh; queue TalkTerm full-CI onboarding before its first code PR.
Pilot knip on broodly/markets.
Feedback welcome — especially on the Dev-Lead open question above and whether "draft until green" should apply to human PRs too (proposal keeps it bot-only).
Drafted with Claude Code from a repo-grounded analysis of our current CI/agent architecture.