feat(backfill): reuse GitHub totals snapshots #1856

Merged
JSONbored merged 1 commit into JSONbored:main from oktofeesh1:codex/backfill-totals-snapshot-reuse
Jun 30, 2026

Conversation

@oktofeesh1

Contributor

Summary

What changed

  • Added a bounded totals snapshot selector that skips future-dated, malformed, and source-mismatched rows.
  • Shared the selector between enqueue, segment workers, and sync-state rollup fallback counts.
  • Added in-process single-flight for live totals refreshes by repository/source.
  • Covered fresh reuse, stale refresh/fallback, invalid snapshot rejection, no-token fallback, older-valid history fallback, and concurrent coalescing.
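
The selector behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from `src/github/backfill.ts`: the `TotalsSnapshot` shape, the `selectUsableTotalsSnapshot` name, the `maxRows` parameter, and the reading of "bounded" as a cap on scanned rows are all assumptions, and the history is assumed to arrive ordered oldest-to-newest.

```typescript
// Hypothetical row shape; the real snapshot schema is not shown in this PR.
interface TotalsSnapshot {
  source: string;
  fetchedAt: string; // ISO-8601 timestamp
  openIssues: number;
  openPullRequests: number;
}

// True when the value is a finite, non-negative integer count.
function isValidCount(value: unknown): value is number {
  return typeof value === "number" && Number.isInteger(value) && value >= 0;
}

// Scans at most `maxRows` rows from the newest end of the history and
// returns the first row that passes every check, or undefined if none does.
// Assumption: `history` is ordered oldest-to-newest.
function selectUsableTotalsSnapshot(
  history: readonly unknown[],
  expectedSource: string,
  nowMs: number,
  maxRows = 50,
): TotalsSnapshot | undefined {
  const oldestIndex = Math.max(0, history.length - maxRows);
  for (let i = history.length - 1; i >= oldestIndex; i--) {
    const raw = history[i];
    if (typeof raw !== "object" || raw === null) continue; // malformed row
    const row = raw as Partial<TotalsSnapshot>;
    if (row.source !== expectedSource) continue; // source mismatch
    if (typeof row.fetchedAt !== "string") continue; // malformed timestamp
    const fetchedMs = Date.parse(row.fetchedAt);
    if (Number.isNaN(fetchedMs)) continue; // unparsable timestamp
    if (fetchedMs > nowMs) continue; // future-dated
    if (!isValidCount(row.openIssues)) continue; // malformed count
    if (!isValidCount(row.openPullRequests)) continue; // malformed count
    return row as TotalsSnapshot;
  }
  return undefined;
}
```

Whether the returned snapshot is fresh enough to reuse, or should trigger a live refresh, would be a separate decision for the caller in this sketch.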

Why

  • Reduces repeated GitHub GraphQL totals queries during scheduled and manual self-hosted open-data backfill bursts while keeping expected-count decisions conservative.

Validation

  • npx vitest run test/unit/backfill.test.ts
  • npm run test:ci
  • npm audit --audit-level=moderate

@oktofeesh1 oktofeesh1 requested a review from JSONbored as a code owner June 30, 2026 09:50
@dosubot dosubot Bot added the size:M This PR changes 30-99 lines, ignoring generated files. label Jun 30, 2026
@gittensory-orb

gittensory-orb Bot commented Jun 30, 2026


Warning

🟨🟨🟨🟨🟨🟨🟨🟨🟨🟨🟨🟨

⏸️ Gittensory review result - manual review recommended

Review updated: 2026-06-30 19:05:41 UTC

2 files · 1 AI reviewer · no blockers · readiness 55/100 · CI green · unknown

⏸️ Suggested Action - Manual Review

  • Touches a guarded path — held for manual review

Review summary
The change centralizes totals snapshot selection for backfill enqueue, segment work, and sync-state rollup, and it adds a per-process single-flight around live totals refreshes. The visible logic correctly rejects source-mismatched, future-dated, and malformed rows before falling back, and the tests cover the main reuse/refresh/fallback paths. The main review risk is ordering: the selector’s correctness depends on `listRepoGithubTotalsSnapshotHistory` returning rows oldest-to-newest, but that contract is not visible in this diff.
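
For readers unfamiliar with the pattern, a per-process single-flight like the one mentioned above might look like the sketch below. The `singleFlight` and `totalsKey` names and the key format are illustrative assumptions, not identifiers from this PR.

```typescript
// In-flight work keyed by repository/source. This map lives in one process
// only, so separate workers or hosts would not share it.
const inFlight = new Map<string, Promise<unknown>>();

// Hypothetical key builder; the real key format is not shown in this PR.
function totalsKey(repoFullName: string, source: string): string {
  return `${repoFullName}::${source}`;
}

// Runs `work` once per key at a time. Callers that arrive while a call is
// still pending receive the same promise instead of starting a new call.
function singleFlight<T>(key: string, work: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>;
  // Remove the entry once the work settles, whether it resolved or rejected,
  // so the next caller triggers a new refresh.
  const promise = work().finally(() => {
    inFlight.delete(key);
  });
  inFlight.set(key, promise);
  return promise;
}
```

Because the entry is removed on rejection as well, a failed refresh is not cached in this sketch; every coalesced caller sees the same error and the next call retries.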

Nits — 7 non-blocking
  • nit: `src/github/backfill.ts:498` relies on the snapshot-history ordering when scanning from `snapshots.length - 1`; add an inline comment or make `listRepoGithubTotalsSnapshotHistory` return/order contract explicit so a future repository helper change does not silently make this choose the oldest valid snapshot.
  • nit: `test/unit/backfill.test.ts:1439` does not distinguish ascending from descending history because the latest row is invalid; add a case with two valid same-source snapshots and assert the newer one wins.
  • In `test/unit/backfill.test.ts`, add a valid-newer-versus-valid-older snapshot test for `backfillRepositorySegment` or enqueue so the selector’s “latest valid” behavior is locked down.
  • In `src/github/backfill.ts`, consider changing the selector loop to operate on a helper with an explicit name such as `latestUsableRepoGithubTotalsSnapshot` if the repository API is intentionally ordered oldest-to-newest.
  • PR author also opened the linked issue — Link an issue that was opened by a different contributor, or provide a rationale for why this self-authored issue represents genuine discovery work.
  • Readiness score is below the configured threshold — Use the readiness panel as advisory maintainer context; the score does not block this PR.
  • Touches a guarded path — held for manual review — A maintainer must review and merge this change.
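
One way to address the ordering nits above is to normalize the history before scanning it from the end, so the selector no longer depends on the repository helper's return order. The sketch below is illustrative only; `DatedRow` and `sortOldestToNewest` are hypothetical names, and the actual contract of `listRepoGithubTotalsSnapshotHistory` is not visible in this diff.

```typescript
// Minimal shape needed for ordering; the real row type is assumed to carry
// an ISO-8601 timestamp under some field, called `fetchedAt` here.
interface DatedRow {
  fetchedAt: string;
}

// Unparsable timestamps sort first so they are never treated as the newest.
function timestampOrMin(row: DatedRow): number {
  const ms = Date.parse(row.fetchedAt);
  return Number.isNaN(ms) ? Number.NEGATIVE_INFINITY : ms;
}

// Returns a copy ordered oldest-to-newest without mutating the input.
function sortOldestToNewest<T extends DatedRow>(rows: readonly T[]): T[] {
  return [...rows].sort((a, b) => {
    const x = timestampOrMin(a);
    const y = timestampOrMin(b);
    if (x === y) return 0;
    return x < y ? -1 : 1;
  });
}

// Two valid same-source rows supplied newest-first, the case the nit asks
// the tests to cover. After normalizing, the last element is the newer row.
const descending = [
  { fetchedAt: "2026-06-30T11:00:00Z", openIssues: 6 },
  { fetchedAt: "2026-06-30T10:00:00Z", openIssues: 5 },
];
const ordered = sortOldestToNewest(descending);
const newest = ordered[ordered.length - 1];
```

The trade-off is an extra sort per selection; documenting and testing the helper's ordering, as the nits suggest, avoids that cost.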
| Signal | Result | Evidence |
| --- | --- | --- |
| Code review | ✅ No blockers | 1 reviewer |
| Linked issue | ✅ Linked | #1855 |
| Related work | ⚠️ 3 scoped overlaps | Top overlaps are listed below; lower-confidence bulk is hidden. |
| Change scope | ❌ 8/20 | High review scope from cached public metadata (size label size:M; 1 linked issue). |
| Validation posture | ❌ 5/25 | Preflight is holding this PR; address the blocker before review. |
| Contributor workload | ✅ 10/10 | Author activity: 71 registered-repo PR(s), 60 merged, 7 issue(s). |
| Contributor context | ✅ Confirmed | Gittensor contributor oktofeesh1; Gittensor profile; 71 PR(s), 7 issue(s). |
| Gate result | ⚠️ Not blocking | Advisory; not blocking this PR. |
Review context
Contributor next steps
  • Review top overlaps.
  • Add a concise scope and risk note.
  • Fix the blocker.
  • Triage stale or unlinked PRs.
  • Refresh registry data or choose a registered active repo.
  • Check active issues and PRs before submitting.
Signal definitions
  • Related work = same linked issue, overlapping active PRs, or title/path similarity.
  • Change scope = cached public metadata such as size labels, draft state, and review-burden hints.
  • Validation posture = whether the PR provides enough public validation/test evidence for maintainer review.
  • Contributor workload = public contributor activity and cleanup pressure, not a repo-wide quality failure.
  • Contributor context = public GitHub/Gittensor identity context; non-Gittensor status is not a blocker.

🟩 Safe / merged · 🟦 Advisory · 🟨 Held for review · 🟥 Blocked / closed

Checked by Gittensory, a quiet PR intelligence layer for OSS maintainers.

@gittensory-orb gittensory-orb Bot added gittensor Gittensor contributor context gittensor:feature Gittensor-scored feature linked to a feature issue — scores a 1.25x multiplier. labels Jun 30, 2026
@dosubot dosubot Bot added the lgtm Approved by a maintainer. label Jun 30, 2026
@JSONbored JSONbored merged commit a6feff5 into JSONbored:main Jun 30, 2026
12 checks passed
@github-project-automation github-project-automation Bot moved this from Todo to Done in gittensory - v1 roadmap Jun 30, 2026

Labels

  • gittensor:feature: Gittensor-scored feature linked to a feature issue — scores a 1.25x multiplier.
  • gittensor: Gittensor contributor context
  • lgtm: Approved by a maintainer.
  • size:M: This PR changes 30-99 lines, ignoring generated files.

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

Reduce duplicate GitHub totals queries during open-data backfill

2 participants