Deep analytical profiling of engineering contributors for any GitHub repository. Quantifies impact using six behavioral traits grounded in peer-reviewed research, assigns persona archetypes, and visualizes collaboration through a knowledge graph.
Live demo: https://posthog-eng-impact-dashboard.vercel.app
Traditional engineering metrics — lines of code, commit counts, PR throughput — reward volume over value. A developer who ships 500 lines of throwaway code looks more productive than one who writes 50 lines that survive for years. Teams relying on these numbers make poor staffing decisions, misidentify bottlenecks, and lose their quiet force-multipliers.
This project replaces vanity metrics with behavioral signals that answer the questions engineering leads actually care about: whose code sticks around after the first week? Who makes everyone else's code better through reviews? Who holds institutional knowledge across system boundaries? Run the pipeline against your own repository before a reorg, a performance cycle, or a hiring plan — and see where impact actually lives, backed by the same research (DORA/Accelerate, GitClear, Bosu et al.) used by teams at Google, Microsoft, and Spotify to measure what matters.
pipeline/                                web/
┌─────────────┐                          ┌──────────────────────┐
│ extract.py  │──→ raw_data/             │ Next.js 16 (App      │
│ (GitHub     │    extracted.json        │ Router, SSR)         │
│ GraphQL)    │                          │                      │
├─────────────┤                          │ Components:          │
│ sanitize.py │──→ cache/                │ Dashboard            │
│ (bot        │    reviewer_             │ KnowledgeGraph       │
│ detection)  │    classifications       │ ContributorList      │
├─────────────┤                          │ ProfileModal         │
│ analyze.py  │──→ computed/             │ ResearchModal        │
│ (6 traits,  │    traits.json           │ MiniRadar/FullRadar  │
│ K-means)    │                          │ StatsBar             │
├─────────────┤                          └──────────┬───────────┘
│ refine_     │──→ traits.json                      │
│ personas.py │    (in-place)                       │
├─────────────┤                                     │
│ graph.py    │──→ web/public/data/ ────────────────┘
│ (nodes +    │    analysis.json
│ edges)      │    (read at build time)
└─────────────┘
| Script | Purpose | Output |
|---|---|---|
| extract.py | Fetches git log, PR reviews, issues, profiles via GitHub GraphQL/REST | raw_data/extracted.json (~4 MB) |
| sanitize.py | 4-layer bot detection via GitHub API (account type, bio regex, /apps/, ghost) | cache/reviewer_classifications.json |
| analyze.py | Computes 6 traits per contributor with git blame sampling, PageRank, Shannon entropy | computed/traits.json |
| refine_personas.py | Two-pass persona correction: hard rules + soft signature scoring | Updates traits.json in-place |
| graph.py | Builds knowledge graph: co-authorship (0.40) + reviews (0.35) + Jaccard files (0.25) | web/public/data/analysis.json |

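
The edge weighting listed for graph.py can be read as a weighted sum of three similarity signals. The following is a minimal sketch under stated assumptions, not the code in graph.py: the function names are hypothetical, and it assumes the co-authorship and review signals have already been normalized to the range 0 to 1 before the weights from the table are applied.

```python
def jaccard(files_a, files_b):
    """Jaccard similarity of two file collections: |A ∩ B| / |A ∪ B|."""
    a, b = set(files_a), set(files_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def edge_weight(coauthor_score, review_score, files_a, files_b):
    """Combine the three signals with the weights from the table.

    Assumption: coauthor_score and review_score are pre-normalized to [0, 1].
    """
    return (
        0.40 * coauthor_score
        + 0.35 * review_score
        + 0.25 * jaccard(files_a, files_b)
    )


if __name__ == "__main__":
    # Two hypothetical contributors who share one of three touched files.
    w = edge_weight(0.5, 0.2, {"api/a.py", "api/b.py"}, {"api/b.py", "web/c.ts"})
    print(round(w, 4))
```

Because the weights sum to 1.0, the resulting edge weight stays in the same 0 to 1 range as its inputs, which keeps edges comparable across contributor pairs.
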
| Trait | Method | Research basis |
|---|---|---|
| Code Survivability | 14-day churn window via git blame | GitClear (2024), 211M lines |
| Collaboration Index | Weighted composite: reviews + co-authors + cross-scope + issues | Bosu et al. (2015), MSR/IEEE |
| System Breadth | Shannon entropy H(D)/H_max over directory domains | Shannon (1948) |
| Focus Depth | Gini coefficient of commit distribution | Vasa et al. (2009), IEEE |
| Review Influence | PageRank (d=0.85) on reviewer→author graph | Brin & Page (1998) |
| Velocity Consistency | 1 - CV(weekly_commits) | Forsgren et al. (2018), DORA/Accelerate |
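
Four of the methods in the table are short, standard formulas. The sketch below shows one way to compute them in plain Python; it is illustrative and not taken from analyze.py. The function names, the `n_domains` parameter (assumed to be the repo-wide number of directory domains used for H_max), the clamp at zero in `velocity_consistency`, and the fixed iteration count in `pagerank` are all assumptions.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def normalized_entropy(domains, n_domains):
    """H(D) / H_max, with H_max = log2(n_domains) (assumed repo-wide count)."""
    if not domains or n_domains < 2:
        return 0.0
    total = len(domains)
    probs = [count / total for count in Counter(domains).values()]
    entropy = sum(p * math.log2(1 / p) for p in probs)
    return entropy / math.log2(n_domains)


def gini(values):
    """Gini coefficient of non-negative counts (0 = evenly spread)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    ranked_sum = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * ranked_sum / (n * total) - (n + 1) / n


def velocity_consistency(weekly_commits):
    """1 - CV, where CV = population std dev / mean. Clamped at 0 (assumption)."""
    if not weekly_commits or mean(weekly_commits) == 0:
        return 0.0
    cv = pstdev(weekly_commits) / mean(weekly_commits)
    return max(0.0, 1.0 - cv)


def pagerank(edges, d=0.85, iterations=50):
    """Power-iteration PageRank over (source, target) pairs; rank flows to target."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out = {node: [] for node in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {node: 1 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out[node])
        new = {node: (1 - d) / n + d * dangling / n for node in nodes}
        for src in nodes:
            for dst in out[src]:
                new[dst] += d * rank[src] / len(out[src])
        rank = new
    return rank


if __name__ == "__main__":
    print(normalized_entropy(["api", "api", "web", "web"], n_domains=2))
    print(gini([0, 0, 0, 10]))
    print(velocity_consistency([5, 5, 5, 5]))
    print(pagerank([("alice", "carol"), ("bob", "carol"), ("carol", "alice")]))
```

The edge direction passed to `pagerank` decides who accumulates rank, so the (source, target) order should match however the reviewer→author graph is meant to be read.
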
- The Architect — high breadth + high survivability + feature-dominant
- The Mentor — high review influence + high collaboration
- The Firefighter — bursty velocity + fix-dominant
- The Solo Maker — high focus + low collaboration + low review
- The Operator — high velocity + chore/CI significant
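
Each persona above reads like a signature of high and low traits. The sketch below illustrates how a soft signature score could rank personas for one contributor. It is a guess at the general idea, not the logic in refine_personas.py: the trait keys, the commit-type shares (`feature_share`, `fix_share`, `chore_share`), the mapping of "bursty" and "high velocity" onto `velocity_consistency`, and the neutral default of 0.5 are all hypothetical, and every trait is assumed to be normalized to the range 0 to 1.

```python
# Hypothetical signatures: +1 expects a high trait value, -1 expects a low one.
SIGNATURES = {
    "The Architect": {"breadth": +1, "survivability": +1, "feature_share": +1},
    "The Mentor": {"review_influence": +1, "collaboration": +1},
    "The Firefighter": {"velocity_consistency": -1, "fix_share": +1},
    "The Solo Maker": {"focus": +1, "collaboration": -1, "review_influence": -1},
    "The Operator": {"velocity_consistency": +1, "chore_share": +1},
}


def signature_score(traits, signature):
    """Mean agreement between a contributor's traits and one signature."""
    parts = []
    for key, sign in signature.items():
        value = traits.get(key, 0.5)  # missing traits count as neutral
        parts.append(value if sign > 0 else 1.0 - value)
    return sum(parts) / len(parts)


def best_persona(traits):
    """Return the persona whose signature agrees most with the traits."""
    return max(SIGNATURES, key=lambda name: signature_score(traits, SIGNATURES[name]))


if __name__ == "__main__":
    contributor = {
        "review_influence": 0.9, "collaboration": 0.8, "focus": 0.2,
        "breadth": 0.4, "survivability": 0.5, "velocity_consistency": 0.5,
        "feature_share": 0.3, "fix_share": 0.2, "chore_share": 0.1,
    }
    print(best_persona(contributor))
```

A soft score like this degrades gracefully when a contributor only partly matches a persona, which is one reason to pair it with a separate pass of hard rules for the clear-cut cases.
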
- Next.js 16 with App Router (server-side rendering)
- React 19, TypeScript
- recharts (radar charts with percentage tooltips)
- react-force-graph-2d (knowledge graph with persona-colored nodes)
- Tailwind CSS v4 (dark theme)
- Python 3.10+
- Node.js 22+ with pnpm
- GitHub personal access token with repo scope
cd pipeline
# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# Set GitHub token
export GITHUB_TOKEN=ghp_your_token_here
# Run extraction (slow — GitHub API calls + git log parsing)
python extract.py
# Sanitize bot accounts
python sanitize.py
# Compute traits (slow — git blame sampling)
python analyze.py
# Refine persona assignments
python refine_personas.py
# Build knowledge graph → web/public/data/analysis.json
python graph.py

cd web
# Install dependencies
pnpm install
# Development server
pnpm dev
# → http://localhost:3000
# Production build
pnpm build
pnpm start

cd web
# Link project (one-time)
vercel link --project posthog-eng-impact-dashboard
# Build locally and deploy (avoids uploading monorepo)
vercel build --prod
vercel deploy --prebuilt --prod

eng-impact-dashboard/
├── pipeline/
│ ├── extract.py # GitHub data extraction
│ ├── sanitize.py # Bot detection & classification
│ ├── analyze.py # 6-trait computation engine
│ ├── refine_personas.py # Persona correction pass
│ ├── graph.py # Knowledge graph builder
│ ├── requirements.txt # Python dependencies
│ ├── cache/ # API response cache (gitignored)
│ ├── computed/ # Trait results (gitignored)
│ └── raw_data/ # Extracted data (gitignored)
├── web/
│ ├── public/data/analysis.json # Final output consumed by frontend
│ ├── src/
│ │ ├── app/ # Next.js pages + layout
│ │ ├── components/ # React components
│ │ └── types.ts # TypeScript interfaces
├── next.config.js
├── package.json
└── tsconfig.json