fix(compliance): audit files false-positive findings on transient API read errors #437

@don-petry

Summary

The weekly compliance audit (scripts/compliance-audit.sh) files findings based on single GitHub API reads at scan time. When a read returns a transient error (HTTP 404/403 on a resource that actually exists and is compliant), the audit records a false-positive finding and opens a dev-lead issue for it. The dev-lead agent then spends a full run investigating, only to confirm there was nothing to fix.

This surfaced concretely while re-triggering the stale-issue backlog (see #431): 2 of 5 re-triggered findings were false positives caused by transient read failures.

Evidence

Both findings were re-triggered on 2026-06-10 and picked up by dev-lead, which verified the underlying state was already compliant:

| Issue | Finding | What the agent found |
| --- | --- | --- |
| broodly#99 | unpinned-actions-agent-shield.yml | agent-shield.yml was already SHA-pinned on main (376a4fcb… # v2, merged 2026-06-08). The audit detected an earlier @v1 state. Agent opened no PR (correct no-op). |
| .github-private#61 | codeowners-org-leads-not-first | Agent's own note: "the compliance audit received HTTP 404 when trying to read .github/CODEOWNERS via the GitHub API at scan time. The file already existed and was compliant." Resulted in defensive PR #552. |

Impact

  • Wasted dev-lead runs (agent time + tokens) investigating non-issues.
  • Noise in the fleet's open-issue list and the re-trigger sweep backlog.
  • Erodes trust in compliance findings — false positives make real findings easier to ignore.

Root Cause

The audit reads each resource from the API exactly once, with no retry or confirmation. A transient 404/403/5xx (eventual consistency, rate limiting, brief unavailability) is therefore treated as an authoritative "resource missing / non-compliant" result.
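
To make the distinction concrete, here is a minimal sketch of how a single read's HTTP status could be classified instead of being taken at face value. The function name and the three outcome labels are hypothetical; nothing here is taken from scripts/compliance-audit.sh, whose actual read logic is not shown in this issue.

```shell
# Hypothetical helper: classify the HTTP status of ONE read attempt.
# Outcome labels are illustrative assumptions, not existing audit states.
#   ok                  - the resource was read; compliance can be evaluated
#   unconfirmed-missing - a single 404; may be genuine or transient
#   read-error          - 403, 429, 5xx, or anything unexpected; says nothing
#                         about compliance
classify_read_status() {
  case "$1" in
    2??) echo "ok" ;;
    404) echo "unconfirmed-missing" ;;
    *)   echo "read-error" ;;
  esac
}
```

Under this sketch, only `ok` lets the audit evaluate compliance from a single read. The other two outcomes would need further reads before anything is filed.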

Recommended Actions

  1. Retry transient reads in scripts/compliance-audit.sh — wrap resource reads (CODEOWNERS, workflow files, settings) in a small bounded retry with backoff; only treat a resource as missing after N consistent failures.
  2. Distinguish "unreadable" from "non-compliant" — a read that errors should not map to a compliance failure. Emit a separate audit-error/inconclusive outcome (logged, not filed as a dev-lead issue) so transient errors never become findings.
  3. Re-confirm before filing — for any negative finding, do one confirming re-read before creating/updating the issue.
  4. (Optional) Auto-close on non-reproduction — the audit already has a "resolved/removed" pass; ensure a finding that no longer reproduces closes its issue promptly so stale false positives self-clean.
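
Actions 1 to 3 could be sketched as a single bounded-retry wrapper. This is an illustration under stated assumptions, not the audit's actual code. The function name `read_with_retry`, the outcome labels, and the exit-status convention for the wrapped command are all hypothetical. The wrapped command is assumed to translate the API response into that convention.

```shell
# Hypothetical helper; names and conventions are assumptions for illustration.
#
# Assumed exit status of the wrapped read command:
#   0     = resource read successfully
#   1     = API answered "not found" (e.g. HTTP 404)
#   other = read error (e.g. 403, 429, 5xx, network failure)
#
# Usage: read_with_retry <max_attempts> <base_delay_seconds> <command> [args...]
# Prints exactly one of: present | missing | inconclusive
# The body runs in a subshell ( ... ) so its variables do not leak to the caller.
read_with_retry() (
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  not_found=0

  while [ "$attempt" -le "$max_attempts" ]; do
    rc=0
    "$@" >/dev/null 2>&1 || rc=$?

    if [ "$rc" -eq 0 ]; then
      # Any successful read settles the question immediately.
      echo "present"
      exit 0
    fi
    if [ "$rc" -eq 1 ]; then
      not_found=$((not_found + 1))
    fi

    # Exponential backoff between attempts, skipped after the last one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
      delay=$((delay * 2))
    fi
    attempt=$((attempt + 1))
  done

  # "missing" only when every attempt consistently said "not found";
  # any other failure mix is reported as inconclusive.
  if [ "$not_found" -gt 0 ] && [ "$not_found" -eq "$max_attempts" ]; then
    echo "missing"
  else
    echo "inconclusive"
  fi
)
```

With a wrapper like this, only a `missing` outcome would be eligible to become a dev-lead issue. An `inconclusive` outcome would be logged as an audit error and retried on the next scan. Because `missing` requires every attempt to agree, the confirming re-read from action 3 is covered whenever `max_attempts` is at least 2.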

Context

Discovered during the #431 re-trigger work (PR #432). Part of the Compliance program initiative (GH Project #1 Initiatives).

Labels: automation (Automation improvements and gaps), bug (Bug reports), ci (CI/CD pipeline issues)