diannt/engineering-impact-analysis


Engineering Impact Dashboard

Deep analytical profiling of engineering contributors for any GitHub repository. Quantifies impact using six behavioral traits grounded in peer-reviewed research, assigns persona archetypes, and visualizes collaboration through a knowledge graph.

Live demo: https://posthog-eng-impact-dashboard.vercel.app

Motivation

Traditional engineering metrics — lines of code, commit counts, PR throughput — reward volume over value. A developer who ships 500 lines of throwaway code looks more productive than one who writes 50 lines that survive for years. Teams relying on these numbers make poor staffing decisions, misidentify bottlenecks, and lose their quiet force-multipliers.

This project replaces vanity metrics with behavioral signals that answer the questions engineering leads actually care about: Whose code sticks around after the first week? Who makes everyone else's code better through reviews? Who holds institutional knowledge across system boundaries? Run the pipeline against your own repository before a reorg, a performance cycle, or a hiring plan to see where impact actually lives. The traits draw on the same research (DORA/Accelerate, GitClear, Bosu et al.) used by teams at Google, Microsoft, and Spotify to measure what matters.

Architecture

pipeline/                          web/
┌─────────────┐                    ┌──────────────────────┐
│ extract.py  │──→ raw_data/       │ Next.js 16 (App      │
│  (GitHub    │    extracted.json   │ Router, SSR)         │
│   GraphQL)  │                    │                      │
├─────────────┤                    │ Components:          │
│ sanitize.py │──→ cache/          │  Dashboard           │
│  (bot       │    reviewer_       │  KnowledgeGraph      │
│   detection)│    classifications │  ContributorList     │
├─────────────┤                    │  ProfileModal        │
│ analyze.py  │──→ computed/       │  ResearchModal       │
│  (6 traits, │    traits.json     │  MiniRadar/FullRadar │
│   K-means)  │                    │  StatsBar            │
├─────────────┤                    └──────────┬───────────┘
│ refine_     │──→ traits.json                │
│ personas.py │    (in-place)                 │
├─────────────┤                               │
│ graph.py    │──→ web/public/data/ ──────────┘
│  (nodes +   │    analysis.json
│   edges)    │    (read at build time)
└─────────────┘

Pipeline scripts (run in order)

| Script | Purpose | Output |
|---|---|---|
| extract.py | Fetches git log, PR reviews, issues, profiles via GitHub GraphQL/REST | raw_data/extracted.json (~4 MB) |
| sanitize.py | 4-layer bot detection via GitHub API (account type, bio regex, /apps/, ghost) | cache/reviewer_classifications.json |
| analyze.py | Computes 6 traits per contributor with git blame sampling, PageRank, Shannon entropy | computed/traits.json |
| refine_personas.py | Two-pass persona correction: hard rules + soft signature scoring | Updates traits.json in-place |
| graph.py | Builds knowledge graph: co-authorship (0.40) + reviews (0.35) + Jaccard files (0.25) | web/public/data/analysis.json |
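
The edge weighting listed for graph.py can be sketched as follows. This is a minimal illustration, not the code in graph.py: the function names are hypothetical, and it assumes the co-authorship and review scores have already been normalized to the range 0 to 1 before the 0.40 / 0.35 / 0.25 weights are applied.

```python
def jaccard(files_a, files_b):
    """Jaccard similarity of two file sets: |A & B| / |A | B|."""
    a, b = set(files_a), set(files_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def edge_weight(coauthor_score, review_score, files_a, files_b):
    """Composite edge weight between two contributors.

    Assumption: coauthor_score and review_score are already in [0, 1].
    The weights are the ones listed for graph.py.
    """
    return (
        0.40 * coauthor_score
        + 0.35 * review_score
        + 0.25 * jaccard(files_a, files_b)
    )


if __name__ == "__main__":
    # Two contributors who share one of three touched files.
    print(edge_weight(0.5, 0.2, {"api.py", "db.py"}, {"db.py", "ui.tsx"}))
```

With normalized inputs the weight also stays between 0 and 1, which makes it straightforward to map onto edge thickness in the graph view.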

Six behavioral traits

| Trait | Method | Research basis |
|---|---|---|
| Code Survivability | 14-day churn window via git blame | GitClear (2024), 211M lines |
| Collaboration Index | Weighted composite: reviews + co-authors + cross-scope + issues | Bosu et al. (2015), MSR/IEEE |
| System Breadth | Shannon entropy H(D)/H_max over directory domains | Shannon (1948) |
| Focus Depth | Gini coefficient of commit distribution | Vasa et al. (2009), IEEE |
| Review Influence | PageRank (d=0.85) on reviewer→author graph | Brin & Page (1998) |
| Velocity Consistency | 1 - CV(weekly_commits) | Forsgren et al. (2018), DORA/Accelerate |
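
Four of the formulas in the table are compact enough to sketch directly. The code below is an illustration under stated assumptions, not the implementation in analyze.py: all function names are hypothetical, H_max is taken to be log2 of the number of domains, the velocity score is clamped at 0 because CV can exceed 1, and PageRank is run on an unweighted graph with dangling rank spread evenly.

```python
import math
from collections import Counter
from statistics import mean, pstdev


def system_breadth(commit_dirs, n_domains=None):
    """Normalized Shannon entropy H(D)/H_max over directory domains."""
    counts = Counter(commit_dirs)
    k = n_domains if n_domains is not None else len(counts)
    if k < 2 or not counts:
        return 0.0  # a single domain has no spread to measure
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(k)  # assumption: H_max = log2(k)


def gini(values):
    """Gini coefficient: 0 = perfectly even, (n-1)/n = fully concentrated."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def velocity_consistency(weekly_commits):
    """1 - CV(weekly_commits), clamped at 0 (the clamp is an assumption)."""
    if not weekly_commits or mean(weekly_commits) == 0:
        return 0.0
    cv = pstdev(weekly_commits) / mean(weekly_commits)
    return max(0.0, 1.0 - cv)


def pagerank(edges, d=0.85, iterations=100):
    """Power-iteration PageRank over (source, target) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is shared by everyone.
        dangling = sum(rank[n] for n in nodes if not out[n])
        nxt = {n: (1 - d + d * dangling) / len(nodes) for n in nodes}
        for src in nodes:
            for dst in out[src]:
                nxt[dst] += d * rank[src] / len(out[src])
        rank = nxt
    return rank
```

In this sketch rank flows along each edge from source to target, so the orientation of the reviewer→author edges decides who accumulates score. Whether analyze.py weights edges by review count is not stated in the table, so the sketch leaves edges unweighted.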

Five persona archetypes

  • The Architect — high breadth + high survivability + feature-dominant
  • The Mentor — high review influence + high collaboration
  • The Firefighter — bursty velocity + fix-dominant
  • The Solo Maker — high focus + low collaboration + low review
  • The Operator — high velocity + chore/CI significant

Frontend stack

  • Next.js 16 with App Router (server-side rendering)
  • React 19, TypeScript
  • recharts (radar charts with percentage tooltips)
  • react-force-graph-2d (knowledge graph with persona-colored nodes)
  • Tailwind CSS v4 (dark theme)

Prerequisites

  • Python 3.10+
  • Node.js 22+ with pnpm
  • GitHub personal access token with repo scope

Local setup

1. Pipeline

cd pipeline

# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Set GitHub token
export GITHUB_TOKEN=ghp_your_token_here

# Run extraction (slow — GitHub API calls + git log parsing)
python extract.py

# Sanitize bot accounts
python sanitize.py

# Compute traits (slow — git blame sampling)
python analyze.py

# Refine persona assignments
python refine_personas.py

# Build knowledge graph → web/public/data/analysis.json
python graph.py
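
Because the five scripts must run in a fixed order, a small wrapper can save retyping them. The sketch below is a hypothetical helper, not a file in this repository: `run_pipeline` and its parameters are invented for illustration, and it assumes it is called from inside pipeline/ with the virtual environment active.

```python
import os
import subprocess
import sys

# Stage order taken from the steps above.
STAGES = [
    "extract.py",
    "sanitize.py",
    "analyze.py",
    "refine_personas.py",
    "graph.py",
]


def run_pipeline(stages=STAGES, runner=subprocess.run, env=None):
    """Run each stage with the current interpreter, stopping on failure."""
    env = os.environ if env is None else env
    if not env.get("GITHUB_TOKEN"):
        raise RuntimeError("GITHUB_TOKEN is not set")
    for script in stages:
        print(f"==> {script}")
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later stages never run on stale or partial output.
        runner([sys.executable, script], check=True)
```

Calling `run_pipeline()` then executes the whole sequence, and `run_pipeline(stages=STAGES[2:])` resumes from analyze.py when the extracted data is already on disk.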

2. Frontend

cd web

# Install dependencies
pnpm install

# Development server
pnpm dev
# → http://localhost:3000

# Production build
pnpm build
pnpm start

3. Deploy to Vercel

cd web

# Link project (one-time)
vercel link --project posthog-eng-impact-dashboard

# Build locally and deploy (avoids uploading monorepo)
vercel build --prod
vercel deploy --prebuilt --prod

Project structure

eng-impact-dashboard/
├── pipeline/
│   ├── extract.py                 # GitHub data extraction
│   ├── sanitize.py                # Bot detection & classification
│   ├── analyze.py                 # 6-trait computation engine
│   ├── refine_personas.py         # Persona correction pass
│   ├── graph.py                   # Knowledge graph builder
│   ├── requirements.txt           # Python dependencies
│   ├── cache/                     # API response cache (gitignored)
│   ├── computed/                  # Trait results (gitignored)
│   └── raw_data/                  # Extracted data (gitignored)
└── web/
    ├── public/data/analysis.json  # Final output consumed by frontend
    ├── src/
    │   ├── app/                   # Next.js pages + layout
    │   ├── components/            # React components
    │   └── types.ts               # TypeScript interfaces
    ├── next.config.js
    ├── package.json
    └── tsconfig.json
