diff --git a/docs/BENCHMARK-METHODOLOGY.md b/docs/BENCHMARK-METHODOLOGY.md new file mode 100644 index 00000000..01905907 --- /dev/null +++ b/docs/BENCHMARK-METHODOLOGY.md @@ -0,0 +1,201 @@ +# Colony Benchmark Methodology + +This document describes what `generate-benchmark.ts` measures, why those +metrics were chosen, and the limitations that any consumer of +`public/data/benchmark.json` must understand before drawing conclusions. + +--- + +## What Is Being Measured + +The benchmark compares Colony's PR-delivery velocity against a small cohort of +externally-selected open-source repositories. Three metrics are reported for +each repository: + +### 1. PR Cycle Time (p50, in hours) + +**Definition:** Median time from PR creation (`created_at`) to merge +(`merged_at`) for all PRs merged within the measurement window. + +**Why:** This is the most widely cited PR velocity metric in the CHAOSS +ecosystem (see [Lead Time for Changes][chaoss-lead-time]) and is directly +computable from the GitHub REST API without additional tooling. + +**Threshold for reporting:** Requires at least 5 merged PRs within the window. +Fewer than 5 samples produce too much variance for a median to be meaningful. + +### 2. Merged PRs per Week + +**Definition:** Total merged PRs within the window divided by the window +length in weeks (`windowDays / 7`, fractional). + +**Why:** Throughput normalizes activity volume across repos with different +ages and contributor counts, enabling direct comparison. + +### 3. Gini Coefficient of Merge Concentration + +**Definition:** The Gini coefficient computed over the per-contributor merged +PR count within the window. 0 = perfectly equal distribution; values near 1 = +merges concentrated in few contributors (a single-contributor window reports 0). + +**Why:** CHAOSS uses contributor diversity metrics (see [Contributor +Diversity][chaoss-diversity]) to assess governance health. The Gini coefficient +is a standard econometric measure that captures inequality without requiring +a fixed contributor list. 
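The three definitions above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the script itself: the names `MergedPr`, `nearestRankPercentile`, `gini`, and `summarize` are hypothetical, and the sketch assumes a nearest-rank percentile and a Gini computed only over contributors with at least one merged PR, which is what the tests in this change appear to exercise.

```typescript
// Illustrative sketch only; all names here are hypothetical helpers.

interface MergedPr {
  author: string;
  createdAt: string; // ISO 8601 timestamp
  mergedAt: string; // ISO 8601 timestamp
}

const HOUR_MS = 60 * 60 * 1000;

/** Nearest-rank percentile over an ascending-sorted sample; null below minSample. */
function nearestRankPercentile(
  sorted: number[],
  p: number,
  minSample = 5
): number | null {
  if (sorted.length < minSample) return null;
  const index = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, index)] ?? null;
}

/** Gini coefficient of non-negative counts; 0 for fewer than two values. */
function gini(counts: number[]): number {
  if (counts.length <= 1) return 0;
  const sorted = [...counts].sort((a, b) => a - b);
  const n = sorted.length;
  const total = sorted.reduce((sum, v) => sum + v, 0);
  if (total === 0) return 0;
  let weighted = 0;
  sorted.forEach((v, i) => {
    weighted += (2 * (i + 1) - n - 1) * v;
  });
  return weighted / (n * total);
}

/** Compute the three reported metrics for PRs already known to be merged in the window. */
function summarize(prs: MergedPr[], windowDays: number) {
  const cycleHours = prs
    .map((pr) => (Date.parse(pr.mergedAt) - Date.parse(pr.createdAt)) / HOUR_MS)
    .sort((a, b) => a - b);

  const perAuthor = new Map<string, number>();
  for (const pr of prs) {
    perAuthor.set(pr.author, (perAuthor.get(pr.author) ?? 0) + 1);
  }

  return {
    p50Hours: nearestRankPercentile(cycleHours, 50),
    mergedPerWeek: prs.length / (windowDays / 7),
    gini: gini([...perAuthor.values()]),
  };
}

// Made-up sample: five PRs with 24h..120h cycle times, alice x3 and bob x2.
const base = Date.parse('2026-01-01T00:00:00Z');
const sample: MergedPr[] = [24, 48, 72, 96, 120].map((hours, i) => ({
  author: i < 3 ? 'alice' : 'bob',
  createdAt: new Date(base).toISOString(),
  mergedAt: new Date(base + hours * HOUR_MS).toISOString(),
}));

const summary = summarize(sample, 35);
console.log(summary); // { p50Hours: 72, mergedPerWeek: 1, gini: 0.1 }
```

With only two contributors the Gini value stays small even for a 3-to-2 split, which is one reason to read it alongside the unique contributor count rather than on its own.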
+ +--- + +## Measurement Window + +The default window is **90 days** rolling, ending at the time `generate-benchmark` +is run. The `BENCHMARK_WINDOW_DAYS` environment variable overrides the window length. + +To capture PRs opened *before* the window start but merged *within* it (common +for long-running feature branches), the script filters fetched PRs to the +`windowDays + 90` day range. This retains long-lived PRs in the external cohort +cycle time computation. + +Note: the script fetches a maximum of 200 closed PRs per external repo +(recency-ordered). For repositories with more than 200 closed PRs within the +`windowDays + 90` day range, metrics cover only the most recently created 200 +closed PRs. + +--- + +## External Cohort Selection Criteria + +Comparison repos are selected to be directionally comparable to Colony, not +identical. The default cohort satisfies all of the following: + +- **Active:** ≥5 merged PRs in the default 90-day window (the minimum for a + non-null p50 cycle time) +- **PR-centric workflow:** Uses pull requests as the primary merge gate (not + direct pushes to main) +- **Publicly accessible:** Full PR history available via the GitHub REST API + without special authentication +- **Moderate size:** Comparable PR volume to Colony (not Linux-kernel scale, + not a dormant side project) + +The cohort is *not* required to match Colony's governance model, contributor +count, or technology stack, because no public repo resembles Colony's +autonomous-agent governance structure. + +Override the cohort with `BENCHMARK_REPOSITORIES=org/repo,org/repo`. + +--- + +## Limitations (Required Reading) + +These limitations are embedded in every generated artifact. They are not +disclaimers to minimize — they are structural facts that determine what the +benchmark can and cannot prove. + +### 1. 
Colony has inherent structural advantages + +Autonomous agents: +- Do not coordinate across time zones +- Do not attend meetings or wait for async communication windows +- Do not context-switch away from open PRs +- Apply reviews immediately after a PR is opened + +These factors structurally reduce PR cycle time relative to human-staffed +projects. A Colony cycle time that is 4× faster than the cohort does not prove +that Colony's governance model is 4× more efficient — it proves that removing +human coordination overhead reduces cycle time, which is not a novel finding. + +### 2. Cohort selection is not controlled + +The comparison repos were selected for size and activity level, not for +governance model similarity. Any observed difference in throughput or cycle +time may be explained by factors other than autonomous collaboration: +project maturity, programming language, review culture, tooling, or +contributor time zones. + +### 3. GitHub API pagination limits coverage + +The script fetches a maximum of 200 closed PRs per external repo (2 pages × +100). For high-volume repositories, this captures only the most recent activity +and may not represent the full 90-day window. Colony's metrics use the complete +local `activity.json` artifact, which is more comprehensive. + +### 4. Gini coefficient has different semantics for Colony + +Colony's contributors are autonomous agents with assigned roles. Role-based +concentration is by design — a higher Gini coefficient for Colony may indicate +clear specialization rather than unhealthy power concentration. Interpret Gini +values for Colony differently than for community-driven open-source projects. + +### 5. Merged PR count is not a quality signal + +The benchmark measures delivery velocity, not quality. A higher merged PR count +does not indicate better software. Quality signals (test coverage, defect rate, +regression frequency) are not included in this artifact. 
+ +--- + +## Reproducing the Benchmark + +Anyone can reproduce this comparison using Colony's own tooling: + +```bash +cd web +npm run generate-data # pull latest Colony activity +npm run generate-benchmark # produce benchmark.json with default cohort + +# Custom cohort +BENCHMARK_REPOSITORIES=vitejs/vite,prettier/prettier,sigstore/cosign \ + npm run generate-benchmark + +# Custom window +BENCHMARK_WINDOW_DAYS=60 npm run generate-benchmark +``` + +The output `public/data/benchmark.json` is a versioned, self-describing +artifact that includes the methodology pointer, limitations, and all input +parameters used to generate it. + +--- + +## Artifact Schema + +```jsonc +{ + "generatedAt": "", + "windowDays": 90, + "colony": { + "repository": "hivemoot/colony", + "prCycleTimeP50Hours": 12.5, // null if < 5 merged PRs + "mergedPrsPerWeek": 8.2, + "giniCoefficient": 0.41, + "mergedPrCount": 73, + "uniqueContributorCount": 9, + "openAtWindowEnd": 4 + }, + "cohort": [ + { + "repository": "vitejs/vite", + "prCycleTimeP50Hours": 24.0, + "mergedPrsPerWeek": 12.5, + "giniCoefficient": 0.55, + "mergedPrCount": 112, + "uniqueContributorCount": 18, + "openAtWindowEnd": 7 + } + ], + "methodology": "docs/BENCHMARK-METHODOLOGY.md", + "limitations": ["..."] +} +``` + +--- + +## References + +- [CHAOSS Lead Time for Changes][chaoss-lead-time] +- [CHAOSS Contributor Diversity][chaoss-diversity] +- [CNCF DevStats](https://devstats.cncf.io/) — PR cycle time baselines for small projects +- [OSS Insight](https://ossinsight.io/) — aggregated GitHub metrics across 5B+ events +- Dey et al. 
(2023) — arXiv:2304.08426 — PR review latency study + +[chaoss-lead-time]: https://chaoss.community/kb/metric-lead-time-for-changes/ +[chaoss-diversity]: https://chaoss.community/kb/metric-contributor-diversity/ diff --git a/web/package.json b/web/package.json index 80b2a2a8..4aaf8f49 100644 --- a/web/package.json +++ b/web/package.json @@ -25,7 +25,8 @@ "external-outreach-metrics": "tsx scripts/external-outreach-metrics.ts", "fast-track-candidates": "tsx scripts/fast-track-candidates.ts", "replay-governance": "tsx scripts/replay-governance.ts", - "check-governance-health": "tsx scripts/check-governance-health.ts" + "check-governance-health": "tsx scripts/check-governance-health.ts", + "generate-benchmark": "tsx scripts/generate-benchmark.ts" }, "dependencies": { "react": "^19.2.0", diff --git a/web/scripts/__tests__/generate-benchmark.test.ts b/web/scripts/__tests__/generate-benchmark.test.ts new file mode 100644 index 00000000..4f0ffeb9 --- /dev/null +++ b/web/scripts/__tests__/generate-benchmark.test.ts @@ -0,0 +1,397 @@ +import { describe, it, expect } from 'vitest'; +import type { ActivityData } from '../../shared/types'; +import { + percentile, + computeGini, + computeRepoMetrics, + computeColonyMetrics, + buildBenchmarkArtifact, + resolveCohortRepos, + resolveWindowDays, +} from '../generate-benchmark'; + +// ────────────────────────────────────────────── +// Fixtures +// ────────────────────────────────────────────── + +interface RawPR { + number: number; + state: string; + draft: boolean; + user: { login: string }; + created_at: string; + closed_at: string | null; + merged_at: string | null; +} + +function makeGitHubPR(overrides: Partial<RawPR> = {}): RawPR { + return { + number: 1, + state: 'closed', + draft: false, + user: { login: 'alice' }, + created_at: '2026-01-01T00:00:00Z', + closed_at: '2026-01-03T00:00:00Z', + merged_at: '2026-01-03T00:00:00Z', // 48h cycle + ...overrides, + }; +} + +const BASE_REPO_INFO = { + owner: 'hivemoot', + name: 'colony', + url: 
'', + stars: 0, + forks: 0, + openIssues: 0, +} as const; + +function makeColonyData(overrides: Partial<ActivityData> = {}): ActivityData { + return { + generatedAt: '2026-02-01T00:00:00Z', + repository: BASE_REPO_INFO, + pullRequests: [], + proposals: [], + comments: [], + commits: [], + issues: [], + agents: [], + agentStats: [], + ...overrides, + } as ActivityData; +} + +// ────────────────────────────────────────────── +// percentile +// ────────────────────────────────────────────── + +describe('percentile', () => { + it('returns null when sample is below minSample', () => { + expect(percentile([1, 2, 3, 4], 50)).toBeNull(); // default minSample = 5 + }); + + it('returns median for odd-length array', () => { + expect(percentile([1, 2, 3, 4, 5], 50)).toBe(3); + }); + + it('returns p50 correctly for even-length array', () => { + expect(percentile([1, 2, 3, 4, 5, 6], 50)).toBe(3); + }); + + it('returns p95 for a 20-element array', () => { + const sorted = Array.from({ length: 20 }, (_, i) => i + 1); + expect(percentile(sorted, 95)).toBe(19); + }); + + it('accepts custom minSample', () => { + expect(percentile([10, 20, 30], 50, 3)).toBe(20); + expect(percentile([10, 20], 50, 3)).toBeNull(); + }); +}); + +// ────────────────────────────────────────────── +// computeGini +// ────────────────────────────────────────────── + +describe('computeGini', () => { + it('returns 0 for a single value', () => { + expect(computeGini([5])).toBe(0); + }); + + it('returns 0 for perfectly equal distribution', () => { + expect(computeGini([3, 3, 3])).toBe(0); + }); + + it('returns (n - 1) / n for maximum concentration (all mass in one)', () => { + // With [0, 0, 6]: Gini approaches 1 as n grows; for n=3 it equals 2/3 + expect(computeGini([0, 0, 6])).toBeCloseTo(2 / 3, 5); + }); + + it('returns 0 for all-zero array', () => { + expect(computeGini([0, 0, 0])).toBe(0); + }); + + it('returns a value between 0 and 1 for mixed distribution', () => { + const g = computeGini([1, 3, 6, 10]); 
expect(g).toBeGreaterThan(0); + expect(g).toBeLessThan(1); + }); +}); + +// ────────────────────────────────────────────── +// computeRepoMetrics +// ────────────────────────────────────────────── + +describe('computeRepoMetrics', () => { + const windowStart = new Date('2026-01-01T00:00:00Z'); + const currentEnd = new Date('2026-04-01T00:00:00Z'); // 90 days + + it('returns zero metrics for empty PR list', () => { + const m = computeRepoMetrics([], 'test/repo', windowStart, currentEnd); + expect(m.repository).toBe('test/repo'); + expect(m.mergedPrCount).toBe(0); + expect(m.prCycleTimeP50Hours).toBeNull(); + expect(m.mergedPrsPerWeek).toBe(0); + expect(m.giniCoefficient).toBe(0); + expect(m.uniqueContributorCount).toBe(0); + expect(m.openAtWindowEnd).toBe(0); + }); + + it('counts merged PRs within window only', () => { + const prs = [ + makeGitHubPR({ + merged_at: '2026-01-15T00:00:00Z', + created_at: '2026-01-10T00:00:00Z', + }), + makeGitHubPR({ + number: 2, + merged_at: '2025-12-15T00:00:00Z', // before window + created_at: '2025-12-10T00:00:00Z', + }), + makeGitHubPR({ + number: 3, + merged_at: '2026-04-10T00:00:00Z', // after window + created_at: '2026-04-05T00:00:00Z', + }), + ]; + const m = computeRepoMetrics(prs, 'test/repo', windowStart, currentEnd); + expect(m.mergedPrCount).toBe(1); + }); + + it('computes cycle time from PRs opened before window start', () => { + // PR created before window start but merged inside — should be included in + // mergedPrs and contribute to cycle time + const prs = [ + makeGitHubPR({ + created_at: '2025-12-01T00:00:00Z', // 31 days before window + merged_at: '2026-01-15T00:00:00Z', // within window + }), + ]; + const m = computeRepoMetrics(prs, 'test/repo', windowStart, currentEnd); + expect(m.mergedPrCount).toBe(1); + // cycle = (Jan 15 - Dec 01) = 45 days = 1080 hours; too few for p50 (need 5) + expect(m.prCycleTimeP50Hours).toBeNull(); + }); + + it('computes p50 cycle time when 5+ merged PRs exist', () => { + const prs = 
[24, 48, 72, 96, 120].map((hours, i) => { + const created = new Date('2026-01-01T00:00:00Z'); + const merged = new Date(created.getTime() + hours * 3600000); + return makeGitHubPR({ + number: i + 1, + created_at: created.toISOString(), + merged_at: merged.toISOString(), + }); + }); + const m = computeRepoMetrics(prs, 'test/repo', windowStart, currentEnd); + expect(m.prCycleTimeP50Hours).toBe(72); // median of [24,48,72,96,120] + }); + + it('counts open PRs at window end using currentEnd as anchor', () => { + const prs = [ + // open PR created before currentEnd → counted + makeGitHubPR({ + number: 10, + state: 'open', + merged_at: null, + closed_at: null, + created_at: '2026-03-01T00:00:00Z', + }), + // open PR created after currentEnd → not counted + makeGitHubPR({ + number: 11, + state: 'open', + merged_at: null, + closed_at: null, + created_at: '2026-04-05T00:00:00Z', + }), + ]; + const m = computeRepoMetrics(prs, 'test/repo', windowStart, currentEnd); + expect(m.openAtWindowEnd).toBe(1); + }); + + it('computes Gini correctly for unequal merge distribution', () => { + const prs = [ + makeGitHubPR({ + number: 1, + user: { login: 'alice' }, + merged_at: '2026-02-01T00:00:00Z', + created_at: '2026-01-30T00:00:00Z', + }), + makeGitHubPR({ + number: 2, + user: { login: 'alice' }, + merged_at: '2026-02-02T00:00:00Z', + created_at: '2026-01-31T00:00:00Z', + }), + makeGitHubPR({ + number: 3, + user: { login: 'bob' }, + merged_at: '2026-02-03T00:00:00Z', + created_at: '2026-02-01T00:00:00Z', + }), + ]; + const m = computeRepoMetrics(prs, 'test/repo', windowStart, currentEnd); + expect(m.uniqueContributorCount).toBe(2); + // alice has 2, bob has 1 → some inequality + expect(m.giniCoefficient).toBeGreaterThan(0); + expect(m.giniCoefficient).toBeLessThan(1); + }); +}); + +// ────────────────────────────────────────────── +// computeColonyMetrics +// ────────────────────────────────────────────── + +describe('computeColonyMetrics', () => { + const windowStart = new 
Date('2026-01-01T00:00:00Z'); + const currentEnd = new Date('2026-04-01T00:00:00Z'); + + it('uses repository from data.repository', () => { + const data = makeColonyData({ + repository: { ...BASE_REPO_INFO, owner: 'myorg', name: 'myrepo' }, + }); + const m = computeColonyMetrics(data, windowStart, currentEnd); + expect(m.repository).toBe('myorg/myrepo'); + }); + + it('counts open PRs at window end using currentEnd, not latest createdAt', () => { + // This tests the stale-currentEnd bug: if we accidentally used the latest + // PR's createdAt as the anchor instead of the generation time, a PR opened + // after the latest PR but before currentEnd would be missed. + const data = makeColonyData({ + pullRequests: [ + { + number: 1, + title: 'merged', + state: 'merged', + author: 'hivemoot-builder', + createdAt: '2026-02-01T00:00:00Z', + mergedAt: '2026-02-02T00:00:00Z', + }, + { + // Open PR created after the merged PR's createdAt — should still + // be counted when currentEnd is used as the anchor + number: 2, + title: 'open after latest PR', + state: 'open', + author: 'hivemoot-nurse', + createdAt: '2026-03-01T00:00:00Z', + }, + ], + }); + const m = computeColonyMetrics(data, windowStart, currentEnd); + expect(m.openAtWindowEnd).toBe(1); + }); + + it('excludes merged PRs outside the window', () => { + const data = makeColonyData({ + pullRequests: [ + { + number: 1, + title: 'old merged PR', + state: 'merged', + author: 'hivemoot-builder', + createdAt: '2025-11-01T00:00:00Z', + mergedAt: '2025-11-15T00:00:00Z', // before window + }, + ], + }); + const m = computeColonyMetrics(data, windowStart, currentEnd); + expect(m.mergedPrCount).toBe(0); + }); +}); + +// ────────────────────────────────────────────── +// resolveCohortRepos +// ────────────────────────────────────────────── + +describe('resolveCohortRepos', () => { + it('returns default cohort when env var is unset', () => { + const repos = resolveCohortRepos({}); + expect(repos.length).toBeGreaterThan(0); + for 
(const r of repos) { + expect(r).toMatch(/^[^/]+\/[^/]+$/); + } + }); + + it('parses comma-separated repos from env', () => { + const repos = resolveCohortRepos({ + BENCHMARK_REPOSITORIES: 'facebook/react,vercel/next.js', + }); + expect(repos).toEqual(['facebook/react', 'vercel/next.js']); + }); + + it('filters invalid entries', () => { + const repos = resolveCohortRepos({ + BENCHMARK_REPOSITORIES: 'facebook/react,not-a-repo,vercel/next.js', + }); + expect(repos).toEqual(['facebook/react', 'vercel/next.js']); + }); + + it('returns default cohort when all entries are invalid', () => { + const repos = resolveCohortRepos({ + BENCHMARK_REPOSITORIES: 'invalid,also-invalid', + }); + expect(repos.length).toBeGreaterThan(0); + }); +}); + +// ────────────────────────────────────────────── +// resolveWindowDays +// ────────────────────────────────────────────── + +describe('resolveWindowDays', () => { + it('defaults to 90 when unset', () => { + expect(resolveWindowDays({})).toBe(90); + }); + + it('parses numeric env var', () => { + expect(resolveWindowDays({ BENCHMARK_WINDOW_DAYS: '60' })).toBe(60); + }); + + it('falls back to 90 for invalid values', () => { + expect(resolveWindowDays({ BENCHMARK_WINDOW_DAYS: 'abc' })).toBe(90); + expect(resolveWindowDays({ BENCHMARK_WINDOW_DAYS: '-10' })).toBe(90); + expect(resolveWindowDays({ BENCHMARK_WINDOW_DAYS: '0' })).toBe(90); + }); +}); + +// ────────────────────────────────────────────── +// buildBenchmarkArtifact (unit — no network) +// ────────────────────────────────────────────── + +describe('buildBenchmarkArtifact', () => { + it('produces a valid artifact with empty cohort', async () => { + const data = makeColonyData(); + const generatedAt = '2026-04-01T00:00:00Z'; + const artifact = await buildBenchmarkArtifact(data, [], 90, generatedAt); + + expect(artifact.generatedAt).toBe(generatedAt); + expect(artifact.windowDays).toBe(90); + expect(artifact.colony.repository).toBe('hivemoot/colony'); + 
expect(artifact.cohort).toEqual([]); + expect(artifact.methodology).toBe('docs/BENCHMARK-METHODOLOGY.md'); + expect(artifact.limitations.length).toBeGreaterThan(0); + }); + + it('uses generatedAt as the currentEnd anchor', async () => { + // If the bug existed, openAtWindowEnd would use the latest PR's createdAt + // instead of generatedAt — causing recently-opened PRs to be missed. + const generatedAt = '2026-04-01T00:00:00Z'; + const data = makeColonyData({ + pullRequests: [ + { + number: 1, + title: 'open PR', + state: 'open', + author: 'hivemoot-builder', + // Created 10 days before generatedAt — should be in openAtWindowEnd + createdAt: '2026-03-22T00:00:00Z', + }, + ], + }); + const artifact = await buildBenchmarkArtifact(data, [], 90, generatedAt); + expect(artifact.colony.openAtWindowEnd).toBe(1); + }); +}); diff --git a/web/scripts/generate-benchmark.ts b/web/scripts/generate-benchmark.ts new file mode 100644 index 00000000..de4e7c41 --- /dev/null +++ b/web/scripts/generate-benchmark.ts @@ -0,0 +1,512 @@ +/** + * Benchmark artifact generator — CLI script. + * + * Compares Colony PR velocity metrics against an external OSS cohort using + * public GitHub API data. Outputs public/data/benchmark.json. + * + * Usage: + * npm run generate-benchmark + * BENCHMARK_REPOSITORIES=vitejs/vite,prettier/prettier,sindresorhus/got \ + * npm run generate-benchmark + * + * Environment variables: + * BENCHMARK_REPOSITORIES Comma-separated "owner/repo" list of comparison + * repos. Defaults to DEFAULT_COHORT below. + * BENCHMARK_WINDOW_DAYS Rolling window in days (default: 90). + * ACTIVITY_FILE Path to Colony's activity.json. Defaults to the + * generated artifact in public/data/activity.json. + * GITHUB_TOKEN / GH_TOKEN GitHub personal access token for higher API + * rate limits. Unauthenticated requests are limited + * to 60/hour; authenticated to 5 000/hour. 
+ * + * Methodology: docs/BENCHMARK-METHODOLOGY.md + */ + +import { existsSync, mkdirSync, readFileSync, writeFileSync } from 'node:fs'; +import { dirname, join, resolve } from 'node:path'; +import { fileURLToPath } from 'node:url'; +import type { ActivityData, PullRequest } from '../shared/types'; + +const __dirname = dirname(fileURLToPath(import.meta.url)); +const DEFAULT_ACTIVITY_FILE = join( + __dirname, + '..', + 'public', + 'data', + 'activity.json' +); +const BENCHMARK_FILE = join( + __dirname, + '..', + 'public', + 'data', + 'benchmark.json' +); + +const GITHUB_API = 'https://api.github.com'; + +/** + * Default external comparison cohort. + * + * Selected criteria (see docs/BENCHMARK-METHODOLOGY.md): + * - Active: ≥5 merged PRs in the default 90-day window + * - Moderate size: comparable PR volume to Colony + * - PR-centric workflow: uses pull requests as the primary merge gate + * - Publicly accessible GitHub repository + * + * To substitute or extend the cohort, set BENCHMARK_REPOSITORIES. + */ +const DEFAULT_COHORT = ['vitejs/vite', 'prettier/prettier', 'sigstore/cosign']; + +/** + * Extra days added to the post-fetch filter cutoff. + * + * After fetching up to 200 closed PRs (recency-ordered) from GitHub, the + * results are filtered to PRs created on or after + * `currentEnd - (windowDays + PAGING_LOOKBACK_BUFFER_DAYS)`. This retains + * PRs opened *before* the window start but merged *within* it (common for + * long-running feature branches). + * + * Note: this is a post-fetch filter, not parameterized API paging. For repos + * with more than 200 closed PRs within the extended range, metrics cover only + * the most recently created 200 closed PRs. 
+ */ +const PAGING_LOOKBACK_BUFFER_DAYS = 90; + +// ────────────────────────────────────────────── +// GitHub API helpers +// ────────────────────────────────────────────── + +interface GitHubPR { + number: number; + state: string; + draft: boolean; + user: { login: string }; + created_at: string; + closed_at: string | null; + merged_at: string | null; +} + +async function fetchJson<T>(endpoint: string): Promise<T> { + const url = `${GITHUB_API}${endpoint}`; + const token = process.env.GITHUB_TOKEN ?? process.env.GH_TOKEN; + const headers: Record<string, string> = { + Accept: 'application/vnd.github.v3+json', + 'User-Agent': 'colony-benchmark-generator', + }; + if (token) headers.Authorization = `Bearer ${token}`; + + const response = await fetch(url, { headers }); + if (!response.ok) { + throw new Error( + `GitHub API error: ${response.status} ${response.statusText} for ${endpoint}` + ); + } + return response.json() as Promise<T>; +} + +/** + * Fetch the first `pages` pages of closed PRs plus one page of open PRs. + * By default that is two pages of closed PRs (100 each) and the first page of + * open PRs, giving a representative sample of recent activity. 
+ */ +async function fetchRepoPRs( + owner: string, + repo: string, + pages: number = 2 +): Promise<GitHubPR[]> { + const closedPages = await Promise.all( + Array.from({ length: pages }, (_, i) => + fetchJson<GitHubPR[]>( + `/repos/${owner}/${repo}/pulls?state=closed&per_page=100&page=${i + 1}` + ) + ) + ); + + const openPRs = await fetchJson<GitHubPR[]>( + `/repos/${owner}/${repo}/pulls?state=open&per_page=100&page=1` + ); + + return [...openPRs, ...closedPages.flat()]; +} + +// ────────────────────────────────────────────── +// Types +// ────────────────────────────────────────────── + +export interface RepoMetrics { + /** "owner/repo" */ + repository: string; + /** Median PR cycle time in hours (open → merge), or null if < 5 samples */ + prCycleTimeP50Hours: number | null; + /** PRs merged per week within the measurement window */ + mergedPrsPerWeek: number; + /** Gini coefficient of per-contributor PR merge counts (0=equal, 1=concentrated) */ + giniCoefficient: number; + /** Number of merged PRs used for cycle time computation */ + mergedPrCount: number; + /** Unique PR authors with at least one PR merged in the window */ + uniqueContributorCount: number; + /** PRs that were open at the end of the measurement window */ + openAtWindowEnd: number; +} + +export interface BenchmarkArtifact { + /** ISO timestamp of when this artifact was generated */ + generatedAt: string; + /** Rolling window used for all metrics */ + windowDays: number; + /** Colony's own metrics for the same window */ + colony: RepoMetrics; + /** External comparison repos */ + cohort: RepoMetrics[]; + /** Pointer to the human-readable methodology doc */ + methodology: string; + /** Explicit limitations that consumers must understand */ + limitations: string[]; +} + +// ────────────────────────────────────────────── +// Metric computation (pure functions, exportable for testing) +// ────────────────────────────────────────────── + +/** + * Compute the p-th percentile of a pre-sorted ascending array. 
+ * Returns null for arrays shorter than `minSample`. + */ +export function percentile( + sorted: number[], + p: number, + minSample: number = 5 +): number | null { + if (sorted.length < minSample) return null; + const index = Math.ceil((p / 100) * sorted.length) - 1; + return sorted[Math.max(0, index)]; +} + +/** + * Compute the Gini coefficient of an array of non-negative values. + * Returns 0 for arrays of length ≤ 1 or all-zero arrays. + */ +export function computeGini(values: number[]): number { + if (values.length <= 1) return 0; + const sorted = [...values].sort((a, b) => a - b); + const n = sorted.length; + const total = sorted.reduce((a, b) => a + b, 0); + if (total === 0) return 0; + let sumOfDiffs = 0; + for (let i = 0; i < n; i++) { + sumOfDiffs += (2 * (i + 1) - n - 1) * sorted[i]; + } + return sumOfDiffs / (n * total); +} + +/** + * Derive RepoMetrics from a flat list of GitHub PR objects. + * + * @param prs Raw PR list for the repo (open + closed pages) + * @param repository "owner/repo" identifier + * @param windowStart Window start (inclusive) — PRs merged before this are excluded + * @param currentEnd Anchor for "open at window end" — should be the generation + * timestamp so recently opened PRs are not silently excluded + */ +export function computeRepoMetrics( + prs: GitHubPR[], + repository: string, + windowStart: Date, + currentEnd: Date +): RepoMetrics { + const windowMs = currentEnd.getTime() - windowStart.getTime(); + const windowWeeks = windowMs / (1000 * 60 * 60 * 24 * 7); + + // PRs merged within the window + const mergedPrs = prs.filter((pr) => { + if (!pr.merged_at) return false; + const mergedAt = new Date(pr.merged_at).getTime(); + return ( + mergedAt >= windowStart.getTime() && mergedAt <= currentEnd.getTime() + ); + }); + + // Cycle time: open → merge, in hours + // merged_at is guaranteed non-null here because mergedPrs filtered for it above + const cycleTimes = mergedPrs + .map((pr) => { + const openMs = new 
Date(pr.created_at).getTime(); + const mergeMs = new Date(pr.merged_at ?? '').getTime(); + return (mergeMs - openMs) / (1000 * 60 * 60); + }) + .filter((h) => h >= 0) + .sort((a, b) => a - b); + + // Per-contributor merge counts for Gini coefficient + const mergesByContributor = new Map<string, number>(); + for (const pr of mergedPrs) { + const login = pr.user.login; + mergesByContributor.set(login, (mergesByContributor.get(login) ?? 0) + 1); + } + + // PRs open at the end of the window (non-merged, created before currentEnd) + const openAtWindowEnd = prs.filter((pr) => { + if (pr.merged_at !== null) return false; + if (pr.state !== 'open') return false; + const createdAt = new Date(pr.created_at).getTime(); + return createdAt <= currentEnd.getTime(); + }).length; + + return { + repository, + prCycleTimeP50Hours: percentile(cycleTimes, 50), + mergedPrsPerWeek: + windowWeeks > 0 + ? parseFloat((mergedPrs.length / windowWeeks).toFixed(2)) + : 0, + giniCoefficient: parseFloat( + computeGini([...mergesByContributor.values()]).toFixed(3) + ), + mergedPrCount: mergedPrs.length, + uniqueContributorCount: mergesByContributor.size, + openAtWindowEnd, + }; +} + +/** + * Derive RepoMetrics from Colony's own ActivityData. + * Colony uses a richer data model; this maps it to the same shape as external + * repos so both sides are computed with identical metric definitions. 
+ */ +export function computeColonyMetrics( + data: ActivityData, + windowStart: Date, + currentEnd: Date +): RepoMetrics { + const windowMs = currentEnd.getTime() - windowStart.getTime(); + const windowWeeks = windowMs / (1000 * 60 * 60 * 24 * 7); + + const mergedPrs: PullRequest[] = data.pullRequests.filter((pr) => { + if (pr.state !== 'merged' || !pr.mergedAt) return false; + const mergedAt = new Date(pr.mergedAt).getTime(); + return ( + mergedAt >= windowStart.getTime() && mergedAt <= currentEnd.getTime() + ); + }); + + const cycleTimes = mergedPrs + .map((pr) => { + const openMs = new Date(pr.createdAt).getTime(); + // mergedAt is guaranteed non-null because mergedPrs filtered for it above + const mergeMs = new Date(pr.mergedAt ?? '').getTime(); + return (mergeMs - openMs) / (1000 * 60 * 60); + }) + .filter((h) => h >= 0) + .sort((a, b) => a - b); + + const mergesByContributor = new Map<string, number>(); + for (const pr of mergedPrs) { + mergesByContributor.set( + pr.author, + (mergesByContributor.get(pr.author) ?? 0) + 1 + ); + } + + const openAtWindowEnd = data.pullRequests.filter((pr) => { + if (pr.state !== 'open') return false; + const createdAt = new Date(pr.createdAt).getTime(); + return createdAt <= currentEnd.getTime(); + }).length; + + const colonyRepo = data.repository + ? `${data.repository.owner}/${data.repository.name}` + : 'hivemoot/colony'; + + return { + repository: colonyRepo, + prCycleTimeP50Hours: percentile(cycleTimes, 50), + mergedPrsPerWeek: + windowWeeks > 0 + ? 
parseFloat((mergedPrs.length / windowWeeks).toFixed(2)) + : 0, + giniCoefficient: parseFloat( + computeGini([...mergesByContributor.values()]).toFixed(3) + ), + mergedPrCount: mergedPrs.length, + uniqueContributorCount: mergesByContributor.size, + openAtWindowEnd, + }; +} + +// ────────────────────────────────────────────── +// Artifact assembly +// ────────────────────────────────────────────── + +export async function buildBenchmarkArtifact( + colonyData: ActivityData, + cohortRepos: string[], + windowDays: number, + generatedAt: string +): Promise<BenchmarkArtifact> { + const currentEnd = new Date(generatedAt); + const windowStart = new Date( + currentEnd.getTime() - windowDays * 24 * 60 * 60 * 1000 + ); + const fetchStart = new Date( + currentEnd.getTime() - + (windowDays + PAGING_LOOKBACK_BUFFER_DAYS) * 24 * 60 * 60 * 1000 + ); + + // Colony metrics derived from local activity.json + const colony = computeColonyMetrics(colonyData, windowStart, currentEnd); + + // External cohort — fetched from GitHub API + const cohort: RepoMetrics[] = []; + for (const repoSlug of cohortRepos) { + const [owner, repo] = repoSlug.split('/'); + if (!owner || !repo) { + console.warn(` Skipping invalid repo slug: ${repoSlug}`); + continue; + } + + console.log(` Fetching ${repoSlug}...`); + try { + const prs = await fetchRepoPRs(owner, repo); + + // Filter to PRs created on or after the extended fetch start date + const recentPrs = prs.filter( + (pr) => new Date(pr.created_at).getTime() >= fetchStart.getTime() + ); + + const metrics = computeRepoMetrics( + recentPrs, + repoSlug, + windowStart, + currentEnd + ); + if (metrics.prCycleTimeP50Hours === null) { + console.warn( + ` Warning: ${repoSlug} has fewer than 5 merged PRs in the ` + + `${windowDays}-day window (found ${metrics.mergedPrCount}). ` + + `prCycleTimeP50Hours will be null. 
Consider replacing this repo ` + + `in the cohort or using a longer BENCHMARK_WINDOW_DAYS.` + ); + } + cohort.push(metrics); + } catch (err) { + console.warn( + ` Warning: failed to fetch ${repoSlug}: ${String(err)}. Skipping.` + ); + } + } + + return { + generatedAt, + windowDays, + colony, + cohort, + methodology: 'docs/BENCHMARK-METHODOLOGY.md', + limitations: [ + 'Colony uses autonomous agents with no human review latency, no timezone coordination overhead, and no meeting/async-communication delays. PR cycle times are structurally lower for Colony than for human-staffed projects.', + 'External cohort repos were selected for comparable size (PR volume, contributor count) within the measurement window, not for governance model similarity. The comparison is directionally useful, not causally conclusive.', + 'GitHub API results use recency-ordered pagination (100 PRs/page, 2 pages for closed PRs). Long-running repos with high PR volume may have activity outside this window that affects baseline metrics.', + 'Gini coefficient measures merge concentration among contributors. 
Colony agents each have a designated role; contributor "concentration" has a different meaning than in community open-source projects.', + ], + }; +} + +// ────────────────────────────────────────────── +// CLI entry point +// ────────────────────────────────────────────── + +function parseOwnerRepo(slug: string): { owner: string; repo: string } | null { + const parts = slug.trim().split('/'); + if (parts.length !== 2 || !parts[0] || !parts[1]) return null; + return { owner: parts[0], repo: parts[1] }; +} + +export function resolveCohortRepos( + env: NodeJS.ProcessEnv = process.env +): string[] { + const raw = env.BENCHMARK_REPOSITORIES; + if (!raw) return DEFAULT_COHORT; + const parsed = raw + .split(',') + .map((s) => s.trim()) + .filter(Boolean); + const valid = parsed.filter((s) => parseOwnerRepo(s) !== null); + if (valid.length === 0) { + console.warn( + 'BENCHMARK_REPOSITORIES contained no valid "owner/repo" entries. Using default cohort.' + ); + return DEFAULT_COHORT; + } + return valid; +} + +export function resolveWindowDays( + env: NodeJS.ProcessEnv = process.env +): number { + const raw = Number(env.BENCHMARK_WINDOW_DAYS); + return Number.isFinite(raw) && raw > 0 ? Math.floor(raw) : 90; +} + +function resolveActivityFile(env: NodeJS.ProcessEnv = process.env): string { + return env.ACTIVITY_FILE ?? DEFAULT_ACTIVITY_FILE; +} + +function isDirectExecution(): boolean { + if (!process.argv[1]) return false; + return resolve(process.argv[1]) === resolve(fileURLToPath(import.meta.url)); +} + +async function main(): Promise { + const activityFile = resolveActivityFile(); + + if (!existsSync(activityFile)) { + console.error(`Activity file not found: ${activityFile}`); + console.error( + 'Run `npm run generate-data` first, or set ACTIVITY_FILE env var.' 
+ ); + process.exit(1); + } + + const colonyData = JSON.parse( + readFileSync(activityFile, 'utf-8') + ) as ActivityData; + + const cohortRepos = resolveCohortRepos(); + const windowDays = resolveWindowDays(); + const generatedAt = new Date().toISOString(); + + console.log(`Generating benchmark artifact`); + console.log(` Window: ${windowDays} days`); + console.log(` Cohort: ${cohortRepos.join(', ')}`); + console.log(` Colony data: ${activityFile}`); + console.log(''); + + const artifact = await buildBenchmarkArtifact( + colonyData, + cohortRepos, + windowDays, + generatedAt + ); + + mkdirSync(dirname(BENCHMARK_FILE), { recursive: true }); + writeFileSync(BENCHMARK_FILE, JSON.stringify(artifact, null, 2), 'utf-8'); + + console.log(''); + console.log(`Benchmark artifact written to: ${BENCHMARK_FILE}`); + console.log(` Colony: ${artifact.colony.mergedPrCount} merged PRs`); + console.log(` Cohort: ${artifact.cohort.length} repos`); + for (const repo of artifact.cohort) { + console.log( + ` ${repo.repository}: ${repo.mergedPrCount} merged PRs, ` + + `p50 cycle ${repo.prCycleTimeP50Hours !== null ? `${repo.prCycleTimeP50Hours.toFixed(1)}h` : 'N/A'}` + ); + } +} + +if (isDirectExecution()) { + main().catch((err: unknown) => { + console.error('Fatal error:', err); + process.exit(1); + }); +}
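
The code above calls `percentile(cycleTimes, 50)` and `computeGini(...)`, but neither helper appears in this part of the diff. The sketch below shows one way such helpers could behave. The names `percentileSketch`, `giniSketch`, and the `minSamples` parameter are hypothetical, the nearest-rank percentile rule is an assumption, and the 5-sample cutoff is borrowed from the reporting threshold in the methodology document. The actual implementations may differ.

```typescript
// Hypothetical stand-ins for the `percentile` and `computeGini` helpers.
// The real implementations are not shown in this diff.

/** Nearest-rank percentile over an ascending-sorted array; null below minSamples. */
function percentileSketch(
  sortedAsc: number[],
  p: number,
  minSamples = 5 // assumption: mirrors the 5-merged-PR reporting threshold
): number | null {
  if (sortedAsc.length < minSamples) return null;
  const rank = Math.max(Math.ceil((p / 100) * sortedAsc.length), 1);
  return sortedAsc[rank - 1] ?? null;
}

/** Gini coefficient of non-negative counts; 0 means perfectly equal shares. */
function giniSketch(counts: number[]): number {
  const n = counts.length;
  const total = counts.reduce((sum, c) => sum + c, 0);
  if (n === 0 || total === 0) return 0;
  const sorted = [...counts].sort((a, b) => a - b);
  // G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n, with 1-based rank i
  const weighted = sorted.reduce((sum, c, i) => sum + (i + 1) * c, 0);
  return (2 * weighted) / (n * total) - (n + 1) / n;
}

console.log(percentileSketch([2, 5, 8, 13, 40], 50)); // 8
console.log(percentileSketch([2, 5], 50)); // null (below threshold)
console.log(giniSketch([1, 1, 1, 1])); // 0
console.log(giniSketch([0, 0, 0, 10])); // 0.75
```

With this formulation the maximum value for `n` contributors is `(n - 1) / n`, so the coefficient only approaches 1 as the contributor list grows, and a repository with a single contributor scores 0. That is worth keeping in mind when reading `giniCoefficient` for small cohorts.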