A CLI tool for measuring AI coding proficiency based on context engineering artifacts.
Adoption ≠ proficiency. Some developers save 10+ hours a week with AI. Others get slower with the same tools.
How do you measure actual AI proficiency?
Measure context engineering. Look at whether teams are creating files like CLAUDE.md, .cursorrules, .github/copilot-instructions.md, and AGENTS.md. These artifacts indicate that someone has moved beyond treating AI as fancy autocomplete and started deliberately shaping how AI understands their work.
This tool scans repositories for context engineering artifacts and calculates a maturity score based on an 8-level model aligned with Steve Yegge's stages.
Since v0.6.0 it also detects 2026 context-engineering signals — verification, hooks, eval loops, telemetry, anti-drift maintenance hygiene, dynamic-workflow orchestration, plugins, and harness engineering — and uses them to gate Levels 6-8 (file coverage alone is no longer enough). It also flags bloated always-on files (a permanent context tax). See docs/SIGNALS.md and docs/AGENTS_FILE_GUIDANCE.md (how to write concise, behavioral CLAUDE.md/AGENTS.md). All detection is grounded in official documentation.
Since v0.7.0 it can also prove what your context actually does, not just what exists: measure-ai-proficiency --prove runs a report-only efficacy pass (do documented commands resolve? are hooks wired? what's the always-on token budget?) and reports a separate Efficacy Score with reproducible green/red evidence — resolve-only by default; --prove-exec runs commands/hooks in a hardened sandbox (local repos only). See docs/EFFICACY.md.
# Install from PyPI
pip install measure-ai-proficiency
# Run on any repository
cd /path/to/your-project
measure-ai-proficiencyThat's it! The tool scans for files like CLAUDE.md, .cursorrules, .github/copilot-instructions.md, and calculates a maturity score.
NEW: Make your AI assistant aware of its own proficiency! The MCP server brings real-time AI context analysis directly into Claude Code, Cursor, and other MCP-compatible tools.
# 1. Install (if not already installed)
pip install measure-ai-proficiency
# 2. Add to your .mcp.json
{
"mcpServers": {
"measure-ai-proficiency": {
"command": "measure-ai-proficiency-mcp"
}
}
}
# 3. Restart Claude CodeAsk your AI assistant:
- "What's my AI proficiency level?" → Real-time analysis
- "Are my cross-references valid?" → Validate CLAUDE.md links
- "How can I reach Level 6?" → Get specific recommendations
- "Scan all repos in my org" → Organization-wide assessment
scan_current_repo- Analyze current repositoryget_recommendations- Get improvement suggestionscheck_cross_references- Validate AI context file linksget_level_requirements- Show next level requirementsscan_github_repo- Analyze remote GitHub reposcan_github_org- Scan entire GitHub organizationvalidate_file_quality- Check specific file qualitycheck_harness_orchestration_quality- 2026 harness/orchestration maturity + L6-L8 gatesscan_for_maintenance_hygiene- Anti-drift maintenance hygieneget_dynamic_workflow_recommendations- Adopt Dynamic Workflows + verificationcurricula_alignment- Official learning on-ramp referencescheapest_primitive_decision_tree_report- Primitive decision disciplineprove_efficacy- Prove the repo's artifacts work (Efficacy Score; resolve-only or sandboxed)
📖 Full MCP Documentation - Setup, examples, troubleshooting
Why use MCP? Creates a meta-improvement loop where the tool that measures AI proficiency becomes AI-accessible, enabling real-time feedback and guided improvements.
============================================================
AI Proficiency Report: my-project
============================================================
Overall Level: Level 5: Multi-Agent Ready
Overall Score: 59.8/100
AI Tools: Claude Code, GitHub Copilot
Level Breakdown:
✓ Level 1: Zero AI [████████████████████] 100.0%
✓ Level 2: Basic Instructions [█████████████░░░░░░░] 66.7%
✓ Level 3: Comprehensive [███|░░░░░░░░░░░░░░░░] 19.0%/15% ✓
✓ Level 4: Skills & Automation[██|░░░░░░░░░░░░░░░░░] 14.5%/12% ✓
✓ Level 5: Multi-Agent Ready [██|░░░░░░░░░░░░░░░░░] 13.9%/10% ✓
○ Level 6: Fleet Infrastructure [░░░░░░░░░░░░░░░░░░] 0.0%/8% needs +8%
Validation Warnings:
📋 TEMPLATE: CLAUDE.md contains template markers (your-org-name)
🪶 BLOAT: CLAUDE.md is 2100 words (> 1500) — large always-on files are a
permanent context tax; move detail into scoped files/skills
Penalty: -2.7 points
Context Engineering Signals (2026):
Structural quality: 8.0/10 (skills:17 hooks:yes subagents:6 workflows:no plugins:no)
✓ Verification Discipline — verify, adversarial, asserts
✓ Deterministic Hooks — PreToolUse, PostToolUse
✓ Eval / Regression Loops — .evals/
✓ Anti-Drift Maintenance Hygiene — sentinel, audit
○ Plugin Distribution
○ Measured Outcomes
✓ L6 signal gate satisfied
✓ L7 signal gate satisfied
○ L8 signal gate: missing plugins, measured_outcomes
Signal Bonus: +10.0
Recommendations:
→ ✅ For Level 8: add plugin distribution (.claude-plugin manifest). See https://docs.claude.com/en/docs/claude-code/plugins
→ 🪶 Trim your always-on files (CLAUDE.md): every always-loaded line must change
behavior, or cut it. See docs/AGENTS_FILE_GUIDANCE.md
→ 🔄 Add workflows/ for multi-step process definitions
Note: since v0.6.0, Levels 6–8 require the matching context-engineering signals (not just file coverage) — so a repo can have strong signals (L6/L7 gates satisfied) yet still sit at Level 5 until its higher-level file coverage catches up, as above.
pip install measure-ai-proficiencyOr install from source:
git clone https://github.com/pskoett/measuring-ai-proficiency.git
cd measuring-ai-proficiency
pip install -e .The tool supports two scanning modes - choose the one that fits your workflow:
Scan repositories on your local disk. Works offline, no authentication needed.
# Scan current directory
measure-ai-proficiency
# Scan specific repository
measure-ai-proficiency /path/to/repo
# Scan multiple repositories
measure-ai-proficiency /path/to/repo1 /path/to/repo2
# Scan all repos in a directory (cloned GitHub org)
measure-ai-proficiency --org /path/to/cloned-orgBest for:
- Local development
- Offline scanning
- Repos already on disk
- No GitHub authentication needed
Scan GitHub repositories directly without cloning them using the GitHub CLI:
# Scan a single GitHub repository
measure-ai-proficiency --github-repo owner/repo
# Scan an entire GitHub organization
measure-ai-proficiency --github-org anthropic
# Limit the number of repos scanned
measure-ai-proficiency --github-org anthropic --limit 50
# Combine with output formats
measure-ai-proficiency --github-org your-org --format json --output report.jsonBest for:
- Scanning organizations without cloning
- Large-scale analysis
- Private repos (with authentication)
- Quick assessments of remote repos
Requirements:
- GitHub CLI (gh) must be installed
- Must be authenticated:
gh auth login
How it works:
- Uses GitHub API to fetch repository file tree
- Downloads only relevant AI proficiency files (no full clone!)
- Scans the files locally in a temporary directory
- Cleans up automatically after scanning
Advantages over cloning:
- ✅ Much faster (only downloads relevant files)
- ✅ No disk space needed for full repositories
- ✅ Works with hundreds of repos efficiently
- ✅ Automatic cleanup after scanning
| Feature | Local Scanning | GitHub CLI Scanning |
|---|---|---|
| Setup | None | Requires gh CLI + auth |
| Network | Works offline | Requires internet |
| Speed | Fast for local repos | Fast for remote repos |
| Disk Space | Uses existing repos | Minimal (temp files only) |
| Authentication | Not needed | GitHub auth required |
| Private Repos | Must clone first | Works if authenticated |
| Best Use | Local development | Organization-wide analysis |
Note: Both scanning modes support all features:
- All output formats (terminal, JSON, markdown, CSV)
- All CLI flags (--format, --output, -q, --min-level)
- Cross-reference detection and quality scoring
- Tool auto-detection
- Custom configuration (
.ai-proficiency.yaml)
Both scanning modes work with all output formats:
# Terminal output (default, with colors)
measure-ai-proficiency
# JSON output
measure-ai-proficiency --format json
# Markdown report
measure-ai-proficiency --format markdown
# CSV (for spreadsheets)
measure-ai-proficiency --format csv
# Save to file
measure-ai-proficiency --format markdown --output report.md# Quiet mode (summary only)
measure-ai-proficiency -q
# Filter by minimum level
measure-ai-proficiency --org /path/to/org --min-level 2The tool measures maturity on an 8-level scale aligned with Steve Yegge's stages:
| Level | Name | Description |
|---|---|---|
| 1 | Zero AI | No AI-specific files (baseline) |
| 2 | Basic Instructions | Basic context files (CLAUDE.md, .cursorrules, etc.) |
| 3 | Comprehensive Context | Architecture, conventions, patterns documented |
| 4 | Skills & Automation | Hooks, commands, memory files, workflows |
| 5 | Multi-Agent Ready | Multiple agents, MCP configs, handoffs |
| 6 | Fleet Infrastructure | Beads, shared context, workflow pipelines |
| 7 | Agent Fleet | Governance, scheduling, 10+ agents |
| 8 | Custom Orchestration | Gas Town, meta-automation, frontier |
The tool auto-detects and supports all major AI coding tools:
- Claude Code:
CLAUDE.md,AGENTS.md,.claude/ - GitHub Copilot:
.github/copilot-instructions.md,.github/agents/,.github/skills/ - Cursor:
.cursorrules,.cursor/rules/,.cursor/skills/ - OpenAI Codex CLI:
CODEX.md,.codex/,AGENTS.md - VSCode AI:
.vscode/*.md
Smart Scanning: Automatically excludes node_modules/, venv/, dist/, build/, and other dependency folders.
============================================================
AI Proficiency Report: measuring-ai-proficiency
============================================================
Overall Level: Level 5: Multi-Agent Ready
Overall Score: 60.1/100
AI Tools: Claude Code, GitHub Copilot, OpenAI Codex
Level Breakdown:
✓ Level 1: Zero AI
[████████████████████] 100.0% (1 files)
● README.md
✓ Level 2: Basic Instructions
[███████████████░░░░░] 75.0% (3 files)
● CLAUDE.md
● AGENTS.md
● .github/copilot-instructions.md
✓ Level 3: Comprehensive Context
[███|░░░░░░░░░░░░░░░░] 18.8%/15% ✓ (62 files)
● .github/PULL_REQUEST_TEMPLATE.md
● .github/copilot-instructions.md
... and 60 more
✓ Level 4: Skills & Automation
[██|░░░░░░░░░░░░░░░░░] 13.7%/12% ✓ (20 files)
● .claude/skills/measure-ai-proficiency/SKILL.md
● .claude/skills/agentic-workflow/SKILL.md
... and 18 more
✓ Level 5: Multi-Agent Ready
[██|░░░░░░░░░░░░░░░░░] 13.9%/10% ✓ (8 files)
● .github/agents/debug-agentic-workflow.agent.md
● .github/agents/improve-ai-context.agent.md
... and 6 more
○ Level 6: Fleet Infrastructure
[░|░░░░░░░░░░░░░░░░░░] 0.0%/8% needs +8.0% (0 files)
Cross-References & Quality:
References: 126 found in 11 files
Unique targets: 34
Resolved: 82/126 (65%)
Content Quality:
CLAUDE.md: 10.0/10 (797 words) [§ ⌘ $ ! ↻12]
AGENTS.md: 10.0/10 (513 words) [§ ⌘ $ !]
.github/copilot-instructions.md: 10.0/10 (755 words) [§ ⌘ $ ↻9]
.claude/skills/measure-ai-proficiency/SKILL.md: 10.0/10 (1099 words) [§ ⌘ $ ! ↻11]
Quality indicators:
§=sections ⌘=paths $=commands !=constraints ↻N=commits
Bonus: +9.1 points
Recommendations:
→ 🔍 Detected AI tools: Claude Code, GitHub Copilot, OpenAI Codex.
→ 🎯 FLEET READY: You have multi-agent setup. Now add fleet infrastructure.
→ 🧠 Set up Beads: Create .beads/ for persistent memory across agent sessions.
============================================================
{
"repo_name": "measuring-ai-proficiency",
"overall_level": 5,
"overall_score": 60.09,
"detected_tools": ["claude-code", "github-copilot", "openai-codex"],
"level_scores": {
"1": {"name": "Level 1: Zero AI", "coverage_percent": 100.0, "file_count": 1},
"2": {"name": "Level 2: Basic Instructions", "coverage_percent": 75.0, "file_count": 3},
"3": {"name": "Level 3: Comprehensive Context", "coverage_percent": 18.75, "file_count": 62},
"4": {"name": "Level 4: Skills & Automation", "coverage_percent": 13.73, "file_count": 20},
"5": {"name": "Level 5: Multi-Agent Ready", "coverage_percent": 13.89, "file_count": 8}
},
"cross_references": {
"total_count": 126,
"resolved_count": 82,
"resolution_rate": 65.08,
"bonus_points": 9.08
},
"recommendations": ["..."]
}The markdown format produces a detailed report with tables:
| Level | Coverage | Files | Status |
|---|---|---|---|
| Level 1: Zero AI | 100.0% | 1 | ✓ |
| Level 2: Basic Instructions | 75.0% | 3 | ✓ |
| Level 3: Comprehensive Context | 18.8% | 62 | ✓ |
| Level 4: Skills & Automation | 13.7% | 20 | ✓ |
| Level 5: Multi-Agent Ready | 13.9% | 8 | ✓ |
| Level 6: Fleet Infrastructure | 0.0% | 0 | ○ |
Content Quality:
| File | Score | Words | Commits |
|---|---|---|---|
CLAUDE.md |
10.0/10 | 797 | 12 |
AGENTS.md |
10.0/10 | 513 | - |
.github/copilot-instructions.md |
10.0/10 | 755 | 9 |
.claude/skills/measure-ai-proficiency/SKILL.md |
10.0/10 | 1099 | 11 |
| Tool | Files |
|---|---|
| Claude Code | CLAUDE.md, AGENTS.md |
| GitHub Copilot | .github/copilot-instructions.md, .github/AGENTS.md |
| Cursor | .cursorrules, .cursor/*.md |
| OpenAI Codex | CODEX.md, .codex/*.md |
| Category | Files |
|---|---|
| Architecture | ARCHITECTURE.md, docs/ARCHITECTURE.md, DESIGN.md |
| API & Data | API.md, docs/API.md, DATA_MODEL.md |
| Standards | CONVENTIONS.md, STYLE.md, CONTRIBUTING.md, PATTERNS.md, PR_REVIEW.md |
| Development | DEVELOPMENT.md, TESTING.md, DEBUGGING.md, DEPLOYMENT.md |
| Category | Files |
|---|---|
| Skills | .claude/skills/*/SKILL.md, .github/skills/*/SKILL.md, .cursor/skills/*/SKILL.md |
| Workflows | .claude/commands/, WORKFLOWS.md, scripts/ |
| Memory | MEMORY.md, LEARNINGS.md, DECISIONS.md |
| Hooks | .claude/hooks/, .claude/settings.json |
| Category | Files |
|---|---|
| Agents | .github/agents/*.agent.md, agents/HANDOFFS.md |
| MCP | .mcp.json, .mcp/*.json |
| Orchestration | orchestration.yaml, workflows/*.yaml |
The tool analyzes the content of your AI instruction files, not just their existence.
| Indicator | What We Look For | Points |
|---|---|---|
| Sections | Markdown headers (##) - 5+ headers = full points |
0-2 |
| Paths | Concrete file paths (/src/, ~/config/) |
0-2 |
| Commands | CLI commands in backticks (`npm test`) |
0-2 |
| Constraints | "never", "avoid", "don't", "must not", "always" | 0-2 |
| Substance | Word count (200+ = 2pts, 50-200 = 1pt) | 0-2 |
| Commits | Git history (5+ = 2pts, 3-4 = 1pt) | 0-2 |
Quality score is capped at 10.
- Cross-reference bonus (up to 5 pts): References between docs, resolution rate
- Quality bonus (up to 5 pts): Half of average quality score
Different teams use different file names. The tool works best when customized:
- Run the tool to see what it detects
- Review patterns in
measure_ai_proficiency/config.py - Add your team's specific file names
- Adjust thresholds if needed
📖 Read the full customization guide
Create .ai-proficiency.yaml in your repository:
# Specify which AI tools your team uses
tools:
- claude-code
- github-copilot
# Adjust level thresholds (lower = easier to advance)
thresholds:
level_3: 10 # Default: 15
level_4: 8 # Default: 12
# Skip certain recommendations
skip_recommendations:
- hooks
- gastown# In config.py, add your team's files:
file_patterns=[
"SYSTEM_DESIGN.md", # Instead of ARCHITECTURE.md
"documentation/*.md", # Instead of docs/
"CODING_STANDARDS.md", # Instead of CONVENTIONS.md
]Automatically assess AI proficiency on every PR:
# Quick setup with GitHub Agentic Workflows
gh extension install githubnext/gh-aw
gh aw add pskoett/measuring-ai-proficiency/.github/workflows/ai-proficiency-pr-review --create-pull-request📖 Full GitHub Action documentation
Want AI to help improve your context engineering? Add skills to your repository:
| Skill | Description |
|---|---|
| measure-ai-proficiency | Assess repository AI maturity |
| customize-measurement | Generate a customized .ai-proficiency.yaml |
| plan-interview | Structured requirements gathering |
| agentic-workflow | Create natural language GitHub Actions |
For Claude Code:
mkdir -p .claude/skills/measure-ai-proficiency
curl -o .claude/skills/measure-ai-proficiency/SKILL.md \
https://raw.githubusercontent.com/pskoett/measuring-ai-proficiency/main/skill-template/measure-ai-proficiency/SKILL.mdFor GitHub Copilot:
mkdir -p .github/skills/measure-ai-proficiency
curl -o .github/skills/measure-ai-proficiency/SKILL.md \
https://raw.githubusercontent.com/pskoett/measuring-ai-proficiency/main/skill-template/measure-ai-proficiency/SKILL.mdThen ask your AI: "Assess my AI proficiency" or "What context files should I add?"
For systematic improvement workflows, use the AI Context Improvement Agent:
Available in this repo:
.github/agents/improve-ai-context.agent.md- For GitHub Copilot.claude/agents/improve-ai-context.agent.md- For Claude Code
Integrated workflow using three skills:
- plan-interview - Gathers requirements about your team's goals and constraints
- customize-measurement - Creates tailored
.ai-proficiency.yamlconfiguration - measure-ai-proficiency - Assesses and systematically improves your context
What it does:
- Interviews you about team goals, tools, and target maturity level
- Creates custom configuration matching your needs
- Runs AI proficiency assessment with your config
- Analyzes gaps and quality scores
- Creates missing context files systematically
- Improves existing files with low quality scores
- Re-scans to verify improvements
- Works with single repos or entire organizations
How to use:
Ask your AI assistant:
- "Improve my AI proficiency" (full workflow with interview and config)
- "Help me understand what context I need" (starts with plan-interview)
- "Configure measurement for my team" (uses customize-measurement)
- "Create missing context files" (quick assessment and improvement)
- "Fix my low quality score" (focused on quality improvements)
- "Advance to Level 4" (goal-oriented improvement)
The agent will guide you through requirements gathering, create a custom configuration, scan your repo, identify gaps, and systematically create or improve files to advance your maturity level.
You can now scan entire GitHub organizations without cloning any repositories:
# Scan all repos in an organization
measure-ai-proficiency --github-org your-org-name --format json --output report.jsonAlternatively, use the discovery script to first see which repos have AI artifacts:
./scripts/find-org-repos.sh your-org-name
# Example output:
# Organization: anthropics
# Active repositories: 45
# Repos with AI context artifacts: 12 (26.7%)Both methods require GitHub CLI (gh) to be installed and authenticated. The discovery script also requires jq.
# Assess AI proficiency across your organization
measure-ai-proficiency --org /path/to/all-repos --format csv --output proficiency.csv- name: Check AI Proficiency
run: |
pip install git+https://github.com/pskoett/measuring-ai-proficiency.git
measure-ai-proficiency --min-level 1# Show new team members what context engineering looks like
measure-ai-proficiency- File Detection: Scan for patterns at each level (1-8)
- Substantiveness Check: Files must have >100 bytes to count
- Coverage Calculation: Percentage of patterns matched per level
- Level Achievement:
- Level 2: At least one AI context file
- Level 3: Level 2 + ≥15% coverage
- Level 4: Level 3 + ≥12% coverage
- Level 5+: Progressive thresholds (≥10%, ≥8%, ≥6%, ≥5%)
- Minimum Score Guarantees: Each level achieved guarantees a minimum score:
- Level 2: 15, Level 3: 30, Level 4: 45, Level 5: 55
- Level 6: 70, Level 7: 85, Level 8: 95
- Bonus: Up to +10 points from cross-references and quality
- Validation Penalties: Up to -10 points for issues (see below)
The tool validates your context files beyond just counting them:
| Validation | What It Checks | Penalty |
|---|---|---|
| Freshness | Files updated within 90 days of code changes | -2 pts per stale file (max -6) |
| Alignment | Referenced files actually exist | -1 pt per missing ref (max -4) |
| Templates | Detects copy-pasted boilerplate markers | -2 pts if majority are templates |
Warnings displayed:
STALE: CLAUDE.md last updated 120 days ago- Your context is outdatedMISSING REF: CLAUDE.md references 'src/old.ts' (deleted)- Broken referencesTEMPLATE: CLAUDE.md contains template markers- Uncustomized boilerplate
Higher levels now have concrete behavioral requirements beyond file patterns:
| Level | Indicator | What It Checks |
|---|---|---|
| L6 | CI/CD Integration | GitHub Actions workflows that invoke agents |
| L7 | Agent Handoffs | 2+ agents with documented handoff protocols |
| L8 | Measured Outcomes | Metrics files, logs, or success tracking |
These indicators are displayed in the output but don't affect scoring (yet).
Low score but lots of files? This is normal! The tool includes hundreds of patterns. Your team likely uses different file names - customize the patterns for accurate scoring.
Got validation warnings? These identify issues that reduce AI effectiveness:
- Stale context files mislead AI with outdated information
- Broken references confuse AI about your codebase structure
- Template content provides generic rather than project-specific guidance
Contributions welcome! Areas of interest:
- Additional file patterns for new tools
- ✅
Integration with GitHub API for remote scanning(implemented via GitHub CLI) - Historical tracking and trend analysis
- IDE extensions
See FEATURE_BACKLOG.md for advanced metrics and architectural patterns we're exploring—including semantic drift detection, progressive disclosure scoring, and security governance indicators.
MIT
- Context Engineering Article - The thinking behind this tool
- Steve Yegge's Gas Town - Behavioral maturity model inspiration
- Claude Code Skills | GitHub Copilot Skills | Agent Skills Standard