[Bug]: GitHub backfill misclassifies every 403 (including permission errors) as a rate limit #1746

Description

@philluiz2323

Summary

GitHubApiError in src/github/backfill.ts classifies a failure as rate-limited with:

this.rateLimited = statusCode === 403 || statusCode === 429 || remainingHeader === "0";

Every 403 is therefore treated as a rate limit, even when it is a genuine permission or other non-rate-limit error (Resource not accessible by integration, a missing scope, or branch protection) where x-ratelimit-remaining is still well above 0 and there is no Retry-After header. The live-review path in app.ts already handles this with isRateLimitedResponse: a 403/429 counts as a rate limit only when it carries a Retry-After header, an exhausted x-ratelimit-remaining, or a secondary-limit/abuse body. The backfill REST/GraphQL path uses raw fetch and its own error class, and was never aligned with that logic.

Area

GitHub backfill

Expected behavior

A backfill 403 is a rate limit only when it carries a rate-limit signal (a Retry-After header, x-ratelimit-remaining: 0, or a secondary-limit/abuse body) — mirroring isRateLimitedResponse. A bare permission 403 surfaces as a real error (or partial), with no rate-limit wait/backoff and no rate-limit entry in operator diagnostics.
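A minimal sketch of such a predicate follows. It is not the actual isRateLimitedResponse from app.ts, whose signature is not shown in this issue. The RateLimitSignals shape, the looksRateLimited name, and the body pattern are assumptions for illustration only. The issue also does not say how a bare 429 should be classified; this sketch follows the literal description and requires a signal for 429 as well.

```typescript
// Hypothetical input shape; the real helper in app.ts may take a Response or something else.
interface RateLimitSignals {
  statusCode: number;
  retryAfter: string | null; // value of the Retry-After header, if present
  remaining: string | null;  // value of x-ratelimit-remaining, if present
  body: string;              // response body text
}

// Assumed markers for secondary-limit/abuse bodies; adjust to whatever app.ts matches.
const SECONDARY_LIMIT_PATTERN = /secondary rate limit|abuse detection/i;

function looksRateLimited(s: RateLimitSignals): boolean {
  // Only 403 and 429 can be rate limits at all.
  if (s.statusCode !== 403 && s.statusCode !== 429) return false;
  // Signal 1: the server told us when to retry.
  if (s.retryAfter !== null && s.retryAfter.trim() !== "") return true;
  // Signal 2: the primary quota is exhausted.
  if (s.remaining !== null && s.remaining.trim() === "0") return true;
  // Signal 3: the body describes a secondary limit or abuse detection.
  return SECONDARY_LIMIT_PATTERN.test(s.body);
}
```

With this shape, a 403 carrying x-ratelimit-remaining: 4999, no Retry-After, and the body Resource not accessible by integration is not classified as a rate limit.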

Actual behavior

rateLimited is set for any 403, so a permission 403 flows into the rate-limit branches and is recorded as a rate limit:

  • backfillRepositorySegment (src/github/backfill.ts:1246) sets the segment to waiting_rate_limit and applies rate-limit backoff.
  • The metadata path (src/github/backfill.ts:1692) sets the repo status to rate_limited and dataQuality.rateLimited: true.

A repo whose installation is missing a scope (or hits branch protection) is reported as rate-limited and backed off, instead of surfacing the permission error to the operator.

Reproduction

During backfill, have the GitHub REST API return 403 with body Resource not accessible by integration and headers x-ratelimit-remaining: 4999 (no Retry-After). The affected segment is recorded as rate_limited / waiting_rate_limit rather than error.
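The misclassification can be seen by evaluating the expression quoted in the Summary against these reproduction values. The blanketRateLimited wrapper below is hypothetical; only the boolean expression inside it comes from the issue.

```typescript
// Hypothetical wrapper around the expression quoted from src/github/backfill.ts.
function blanketRateLimited(statusCode: number, remainingHeader: string | null): boolean {
  return statusCode === 403 || statusCode === 429 || remainingHeader === "0";
}

// Reproduction values: a permission 403 with plenty of quota left.
const reproStatus = 403;
const reproRemaining = "4999";

// The status check alone short-circuits to true, so the header is never consulted.
const misclassified = blanketRateLimited(reproStatus, reproRemaining);
console.log(misclassified); // true
```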

Validation

Replace the blanket statusCode === 403 check with a shared predicate that mirrors isRateLimitedResponse (status 403/429 plus a Retry-After header, x-ratelimit-remaining: 0, or a secondary-limit body), threading the Retry-After header and response body into GitHubApiError. The change is covered by unit tests for the predicate (permission 403, exhausted remaining, Retry-After, secondary-limit body, bare 429, non-403/429) and by a backfill regression test asserting that a permission 403 yields an error segment, not rate_limited. The full local npm run test:ci run is green.
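As a rough illustration of the threading described above, the sketch below passes the Retry-After header and response body into the error class and derives the segment status from the result. The constructor parameters, the segmentStatusFor helper, and the body pattern are assumptions; the real GitHubApiError and backfillRepositorySegment are not shown in this issue and may differ.

```typescript
// Hypothetical constructor shape; the real GitHubApiError signature is not shown in the issue.
class GitHubApiError extends Error {
  readonly statusCode: number;
  readonly rateLimited: boolean;

  constructor(
    message: string,
    statusCode: number,
    remainingHeader: string | null,
    retryAfterHeader: string | null, // newly threaded through
    responseBody: string,            // newly threaded through
  ) {
    super(message);
    this.name = "GitHubApiError";
    this.statusCode = statusCode;
    const limitStatus = statusCode === 403 || statusCode === 429;
    const hasSignal =
      (retryAfterHeader !== null && retryAfterHeader.trim() !== "") ||
      remainingHeader === "0" ||
      /secondary rate limit|abuse detection/i.test(responseBody); // assumed body markers
    // A 403/429 is a rate limit only when at least one signal is present.
    this.rateLimited = limitStatus && hasSignal;
  }
}

type SegmentStatus = "waiting_rate_limit" | "error";

// Hypothetical stand-in for the branch taken in backfillRepositorySegment.
function segmentStatusFor(err: GitHubApiError): SegmentStatus {
  return err.rateLimited ? "waiting_rate_limit" : "error";
}

// Shape of the regression check: a permission 403 must yield an error segment.
const permissionError = new GitHubApiError(
  "Resource not accessible by integration",
  403,
  "4999",
  null,
  "Resource not accessible by integration",
);
console.log(segmentStatusFor(permissionError)); // "error"
```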

Metadata

Labels

slopFarming: suspected/slop issues + PRs.