Proposal: post-review sanity check when all specialists return NO FINDINGS #839

@Ziadstr

Problem

When the review specialist army dispatches 3-5 agents on a 500+ line diff, nothing validates the outcome if every specialist returns NO FINDINGS. The system trusts their output unconditionally.

In practice, agents can:

  • Rationalize away real issues ("this pattern looks intentional")
  • Hit context limits and give up silently
  • Miss issues due to prompt phrasing or model variance

A 500-line diff with zero findings from 5 specialists is suspicious. Not impossible, but worth a second look.

Proposed solution: lightweight post-review sanity check

After the specialist merge step, before Fix-First:

  1. If diff > 200 lines AND all specialists return NO FINDINGS: flag as suspicious
  2. Re-dispatch one specialist (testing or security, most likely to find something) with an explicit prompt: "A previous review of this diff found zero issues. Review again with extra scrutiny. Focus on what might have been missed."
  3. If the re-run also returns NO FINDINGS: accept it (genuine clean diff)
  4. If it finds issues: include them in findings with a note "found on second pass"

This adds one extra agent call in the rare case where a large diff gets a clean sweep, and costs nothing in the common case.
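To make the proposed flow concrete, here is a minimal Python sketch of the four steps. It is not the pipeline's actual code: `SpecialistResult`, `sanity_check`, and the `redispatch` callback are hypothetical names, and the real specialist merge step may expose results differently. Only the 200-line threshold, the second-pass prompt, and the "found on second pass" note come from the proposal.

```python
from dataclasses import dataclass, field
from typing import Callable, List

LARGE_DIFF_LINES = 200  # threshold from the proposal

SECOND_PASS_PROMPT = (
    "A previous review of this diff found zero issues. "
    "Review again with extra scrutiny. "
    "Focus on what might have been missed."
)


@dataclass
class SpecialistResult:
    """Hypothetical shape of one specialist's output after the merge step."""
    name: str
    findings: List[str] = field(default_factory=list)  # empty means NO FINDINGS


def is_suspicious(diff_lines: int, results: List[SpecialistResult]) -> bool:
    """Step 1: large diff AND every specialist returned NO FINDINGS."""
    return (
        diff_lines > LARGE_DIFF_LINES
        and bool(results)
        and all(not r.findings for r in results)
    )


def sanity_check(
    diff_lines: int,
    results: List[SpecialistResult],
    redispatch: Callable[[str, str], List[str]],  # (specialist, prompt) -> findings
    specialist: str = "security",  # assumption: "testing" would work equally well
) -> List[str]:
    """Return the merged findings, re-running one specialist on a suspicious clean sweep."""
    merged = [f for r in results for f in r.findings]
    if not is_suspicious(diff_lines, results):
        return merged  # common case: no extra agent call
    # Step 2: one extra agent call with the explicit second-pass prompt.
    second_pass = redispatch(specialist, SECOND_PASS_PROMPT)
    # Step 3: an empty second pass is accepted as a genuinely clean diff.
    # Step 4: anything found is annotated so the report shows where it came from.
    return [f"{f} (found on second pass)" for f in second_pass]
```

In this sketch the re-dispatch is injected as a callback, so the check itself stays independent of how agents are actually launched and can be exercised with a stub.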

Why this matters

From experience building a similar review system: we added an "anti-rationalization" hook that fires on PostToolUse and catches when agents produce suspiciously clean results on objectively complex code. It has caught real misses where the agent skipped auth checks, missed N+1 queries, or rationalized away a race condition.

The cost is minimal (one conditional re-dispatch), the signal is high, and it prevents false confidence in the review output.

Alternative: confidence floor

Instead of re-dispatching, set a floor: a diff > 200 lines is expected to produce at least one finding. If all specialists return NO FINDINGS, output a warning in the review report: "Large diff reviewed with zero findings. Manual review recommended." This is cheaper but less actionable.
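The alternative amounts to a single conditional. A rough Python sketch, assuming the merged output is available as one list of findings per specialist (the function name and that input shape are hypothetical):

```python
from typing import List, Optional

LARGE_DIFF_LINES = 200  # threshold from the proposal

ZERO_FINDINGS_WARNING = (
    "Large diff reviewed with zero findings. Manual review recommended."
)


def confidence_floor_warning(
    diff_lines: int, findings_per_specialist: List[List[str]]
) -> Optional[str]:
    """Return the report warning for a large diff with a clean sweep, else None."""
    all_clean = bool(findings_per_specialist) and all(
        not findings for findings in findings_per_specialist
    )
    if diff_lines > LARGE_DIFF_LINES and all_clean:
        return ZERO_FINDINGS_WARNING
    return None
```

Unlike the re-dispatch approach, this never makes an extra agent call; it only changes what the report says.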

Not submitting a PR for this since it touches the review pipeline architecture and should be a design decision. Happy to implement either approach if there's interest.
