Skip to content

pskoett/measuring-ai-proficiency

Repository files navigation

Measure AI Proficiency

A CLI tool for measuring AI coding proficiency based on context engineering artifacts.

The Problem

Adoption ≠ proficiency. Some developers save 10+ hours a week with AI. Others get slower with the same tools.

How do you measure actual AI proficiency?

The Solution

Measure context engineering. Look at whether teams are creating files like CLAUDE.md, .cursorrules, .github/copilot-instructions.md, and AGENTS.md. These artifacts indicate that someone has moved beyond treating AI as fancy autocomplete and started deliberately shaping how AI understands their work.

This tool scans repositories for context engineering artifacts and calculates a maturity score based on an 8-level model aligned with Steve Yegge's stages.

Since v0.6.0 it also detects 2026 context-engineering signals — verification, hooks, eval loops, telemetry, anti-drift maintenance hygiene, dynamic-workflow orchestration, plugins, and harness engineering — and uses them to gate Levels 6-8 (file coverage alone is no longer enough). It also flags bloated always-on files (a permanent context tax). See docs/SIGNALS.md and docs/AGENTS_FILE_GUIDANCE.md (how to write concise, behavioral CLAUDE.md/AGENTS.md). All detection is grounded in official documentation.

Since v0.7.0 it can also prove what your context actually does, not just what exists: measure-ai-proficiency --prove runs a report-only efficacy pass (do documented commands resolve? are hooks wired? what's the always-on token budget?) and reports a separate Efficacy Score with reproducible green/red evidence — resolve-only by default; --prove-exec runs commands/hooks in a hardened sandbox (local repos only). See docs/EFFICACY.md.

Quick Start

# Install from PyPI
pip install measure-ai-proficiency

# Run on any repository
cd /path/to/your-project
measure-ai-proficiency

That's it! The tool scans for files like CLAUDE.md, .cursorrules, .github/copilot-instructions.md, and calculates a maturity score.

MCP Server - AI Context Awareness 🚀

NEW: Make your AI assistant aware of its own proficiency! The MCP server brings real-time AI context analysis directly into Claude Code, Cursor, and other MCP-compatible tools.

Quick Setup

# 1. Install (if not already installed)
pip install measure-ai-proficiency

# 2. Add to your .mcp.json
{
  "mcpServers": {
    "measure-ai-proficiency": {
      "command": "measure-ai-proficiency-mcp"
    }
  }
}

# 3. Restart Claude Code

What You Can Do

Ask your AI assistant:

  • "What's my AI proficiency level?" → Real-time analysis
  • "Are my cross-references valid?" → Validate CLAUDE.md links
  • "How can I reach Level 6?" → Get specific recommendations
  • "Scan all repos in my org" → Organization-wide assessment

Available MCP Tools

  • scan_current_repo - Analyze current repository
  • get_recommendations - Get improvement suggestions
  • check_cross_references - Validate AI context file links
  • get_level_requirements - Show next level requirements
  • scan_github_repo - Analyze remote GitHub repo
  • scan_github_org - Scan entire GitHub organization
  • validate_file_quality - Check specific file quality
  • check_harness_orchestration_quality - 2026 harness/orchestration maturity + L6-L8 gates
  • scan_for_maintenance_hygiene - Anti-drift maintenance hygiene
  • get_dynamic_workflow_recommendations - Adopt Dynamic Workflows + verification
  • curricula_alignment - Official learning on-ramp references
  • cheapest_primitive_decision_tree_report - Primitive decision discipline
  • prove_efficacy - Prove the repo's artifacts work (Efficacy Score; resolve-only or sandboxed)

📖 Full MCP Documentation - Setup, examples, troubleshooting

Why use MCP? Creates a meta-improvement loop where the tool that measures AI proficiency becomes AI-accessible, enabling real-time feedback and guided improvements.

What You'll See

============================================================
 AI Proficiency Report: my-project
============================================================

  Overall Level: Level 5: Multi-Agent Ready
  Overall Score: 59.8/100
  AI Tools: Claude Code, GitHub Copilot

  Level Breakdown:
    ✓ Level 1: Zero AI            [████████████████████] 100.0%
    ✓ Level 2: Basic Instructions [█████████████░░░░░░░] 66.7%
    ✓ Level 3: Comprehensive      [███|░░░░░░░░░░░░░░░░] 19.0%/15% ✓
    ✓ Level 4: Skills & Automation[██|░░░░░░░░░░░░░░░░░] 14.5%/12% ✓
    ✓ Level 5: Multi-Agent Ready  [██|░░░░░░░░░░░░░░░░░] 13.9%/10% ✓
    ○ Level 6: Fleet Infrastructure [░░░░░░░░░░░░░░░░░░] 0.0%/8% needs +8%

  Validation Warnings:
    📋 TEMPLATE: CLAUDE.md contains template markers (your-org-name)
    🪶 BLOAT: CLAUDE.md is 2100 words (> 1500) — large always-on files are a
       permanent context tax; move detail into scoped files/skills

    Penalty: -2.7 points

  Context Engineering Signals (2026):

    Structural quality: 8.0/10  (skills:17 hooks:yes subagents:6 workflows:no plugins:no)

    ✓ Verification Discipline — verify, adversarial, asserts
    ✓ Deterministic Hooks — PreToolUse, PostToolUse
    ✓ Eval / Regression Loops — .evals/
    ✓ Anti-Drift Maintenance Hygiene — sentinel, audit
    ○ Plugin Distribution
    ○ Measured Outcomes

    ✓ L6 signal gate satisfied
    ✓ L7 signal gate satisfied
    ○ L8 signal gate: missing plugins, measured_outcomes

    Signal Bonus: +10.0

  Recommendations:
    → ✅ For Level 8: add plugin distribution (.claude-plugin manifest). See https://docs.claude.com/en/docs/claude-code/plugins
    → 🪶 Trim your always-on files (CLAUDE.md): every always-loaded line must change
         behavior, or cut it. See docs/AGENTS_FILE_GUIDANCE.md
    → 🔄 Add workflows/ for multi-step process definitions

Note: since v0.6.0, Levels 6–8 require the matching context-engineering signals (not just file coverage) — so a repo can have strong signals (L6/L7 gates satisfied) yet still sit at Level 5 until its higher-level file coverage catches up, as above.

Installation

pip install measure-ai-proficiency

Or install from source:

git clone https://github.com/pskoett/measuring-ai-proficiency.git
cd measuring-ai-proficiency
pip install -e .

Usage

The tool supports two scanning modes - choose the one that fits your workflow:

Mode 1: Local Scanning (Default)

Scan repositories on your local disk. Works offline, no authentication needed.

# Scan current directory
measure-ai-proficiency

# Scan specific repository
measure-ai-proficiency /path/to/repo

# Scan multiple repositories
measure-ai-proficiency /path/to/repo1 /path/to/repo2

# Scan all repos in a directory (cloned GitHub org)
measure-ai-proficiency --org /path/to/cloned-org

Best for:

  • Local development
  • Offline scanning
  • Repos already on disk
  • No GitHub authentication needed

Mode 2: GitHub CLI Scanning (Optional)

Scan GitHub repositories directly without cloning them using the GitHub CLI:

# Scan a single GitHub repository
measure-ai-proficiency --github-repo owner/repo

# Scan an entire GitHub organization
measure-ai-proficiency --github-org anthropic

# Limit the number of repos scanned
measure-ai-proficiency --github-org anthropic --limit 50

# Combine with output formats
measure-ai-proficiency --github-org your-org --format json --output report.json

Best for:

  • Scanning organizations without cloning
  • Large-scale analysis
  • Private repos (with authentication)
  • Quick assessments of remote repos

Requirements:

How it works:

  1. Uses GitHub API to fetch repository file tree
  2. Downloads only relevant AI proficiency files (no full clone!)
  3. Scans the files locally in a temporary directory
  4. Cleans up automatically after scanning

Advantages over cloning:

  • ✅ Much faster (only downloads relevant files)
  • ✅ No disk space needed for full repositories
  • ✅ Works with hundreds of repos efficiently
  • ✅ Automatic cleanup after scanning

Choosing Between Modes

Feature Local Scanning GitHub CLI Scanning
Setup None Requires gh CLI + auth
Network Works offline Requires internet
Speed Fast for local repos Fast for remote repos
Disk Space Uses existing repos Minimal (temp files only)
Authentication Not needed GitHub auth required
Private Repos Must clone first Works if authenticated
Best Use Local development Organization-wide analysis

Note: Both scanning modes support all features:

  • All output formats (terminal, JSON, markdown, CSV)
  • All CLI flags (--format, --output, -q, --min-level)
  • Cross-reference detection and quality scoring
  • Tool auto-detection
  • Custom configuration (.ai-proficiency.yaml)

Output Formats

Both scanning modes work with all output formats:

# Terminal output (default, with colors)
measure-ai-proficiency

# JSON output
measure-ai-proficiency --format json

# Markdown report
measure-ai-proficiency --format markdown

# CSV (for spreadsheets)
measure-ai-proficiency --format csv

# Save to file
measure-ai-proficiency --format markdown --output report.md

Other Options

# Quiet mode (summary only)
measure-ai-proficiency -q

# Filter by minimum level
measure-ai-proficiency --org /path/to/org --min-level 2

Maturity Levels

The tool measures maturity on an 8-level scale aligned with Steve Yegge's stages:

Level Name Description
1 Zero AI No AI-specific files (baseline)
2 Basic Instructions Basic context files (CLAUDE.md, .cursorrules, etc.)
3 Comprehensive Context Architecture, conventions, patterns documented
4 Skills & Automation Hooks, commands, memory files, workflows
5 Multi-Agent Ready Multiple agents, MCP configs, handoffs
6 Fleet Infrastructure Beads, shared context, workflow pipelines
7 Agent Fleet Governance, scheduling, 10+ agents
8 Custom Orchestration Gas Town, meta-automation, frontier

Supported Tools

The tool auto-detects and supports all major AI coding tools:

  • Claude Code: CLAUDE.md, AGENTS.md, .claude/
  • GitHub Copilot: .github/copilot-instructions.md, .github/agents/, .github/skills/
  • Cursor: .cursorrules, .cursor/rules/, .cursor/skills/
  • OpenAI Codex CLI: CODEX.md, .codex/, AGENTS.md
  • VSCode AI: .vscode/*.md

Smart Scanning: Automatically excludes node_modules/, venv/, dist/, build/, and other dependency folders.

Example Output

Terminal

============================================================
 AI Proficiency Report: measuring-ai-proficiency
============================================================

  Overall Level: Level 5: Multi-Agent Ready
  Overall Score: 60.1/100
  AI Tools: Claude Code, GitHub Copilot, OpenAI Codex

  Level Breakdown:

    ✓ Level 1: Zero AI
      [████████████████████] 100.0%  (1 files)
        ● README.md

    ✓ Level 2: Basic Instructions
      [███████████████░░░░░] 75.0%  (3 files)
        ● CLAUDE.md
        ● AGENTS.md
        ● .github/copilot-instructions.md

    ✓ Level 3: Comprehensive Context
      [███|░░░░░░░░░░░░░░░░] 18.8%/15% ✓ (62 files)
        ● .github/PULL_REQUEST_TEMPLATE.md
        ● .github/copilot-instructions.md
        ... and 60 more

    ✓ Level 4: Skills & Automation
      [██|░░░░░░░░░░░░░░░░░] 13.7%/12% ✓ (20 files)
        ● .claude/skills/measure-ai-proficiency/SKILL.md
        ● .claude/skills/agentic-workflow/SKILL.md
        ... and 18 more

    ✓ Level 5: Multi-Agent Ready
      [██|░░░░░░░░░░░░░░░░░] 13.9%/10% ✓ (8 files)
        ● .github/agents/debug-agentic-workflow.agent.md
        ● .github/agents/improve-ai-context.agent.md
        ... and 6 more

    ○ Level 6: Fleet Infrastructure
      [░|░░░░░░░░░░░░░░░░░░] 0.0%/8% needs +8.0% (0 files)

  Cross-References & Quality:

    References: 126 found in 11 files
    Unique targets: 34
    Resolved: 82/126 (65%)

    Content Quality:
      CLAUDE.md: 10.0/10 (797 words) [§ ⌘ $ ! ↻12]
      AGENTS.md: 10.0/10 (513 words) [§ ⌘ $ !]
      .github/copilot-instructions.md: 10.0/10 (755 words) [§ ⌘ $ ↻9]
      .claude/skills/measure-ai-proficiency/SKILL.md: 10.0/10 (1099 words) [§ ⌘ $ ! ↻11]

    Quality indicators:
      §=sections  ⌘=paths  $=commands  !=constraints  ↻N=commits

    Bonus: +9.1 points

  Recommendations:

    → 🔍 Detected AI tools: Claude Code, GitHub Copilot, OpenAI Codex.
    → 🎯 FLEET READY: You have multi-agent setup. Now add fleet infrastructure.
    → 🧠 Set up Beads: Create .beads/ for persistent memory across agent sessions.

============================================================

JSON

{
  "repo_name": "measuring-ai-proficiency",
  "overall_level": 5,
  "overall_score": 60.09,
  "detected_tools": ["claude-code", "github-copilot", "openai-codex"],
  "level_scores": {
    "1": {"name": "Level 1: Zero AI", "coverage_percent": 100.0, "file_count": 1},
    "2": {"name": "Level 2: Basic Instructions", "coverage_percent": 75.0, "file_count": 3},
    "3": {"name": "Level 3: Comprehensive Context", "coverage_percent": 18.75, "file_count": 62},
    "4": {"name": "Level 4: Skills & Automation", "coverage_percent": 13.73, "file_count": 20},
    "5": {"name": "Level 5: Multi-Agent Ready", "coverage_percent": 13.89, "file_count": 8}
  },
  "cross_references": {
    "total_count": 126,
    "resolved_count": 82,
    "resolution_rate": 65.08,
    "bonus_points": 9.08
  },
  "recommendations": ["..."]
}

Markdown

The markdown format produces a detailed report with tables:

Level Coverage Files Status
Level 1: Zero AI 100.0% 1
Level 2: Basic Instructions 75.0% 3
Level 3: Comprehensive Context 18.8% 62
Level 4: Skills & Automation 13.7% 20
Level 5: Multi-Agent Ready 13.9% 8
Level 6: Fleet Infrastructure 0.0% 0

Content Quality:

File Score Words Commits
CLAUDE.md 10.0/10 797 12
AGENTS.md 10.0/10 513 -
.github/copilot-instructions.md 10.0/10 755 9
.claude/skills/measure-ai-proficiency/SKILL.md 10.0/10 1099 11

Files Detected by Level

Level 2: Basic Instructions

Tool Files
Claude Code CLAUDE.md, AGENTS.md
GitHub Copilot .github/copilot-instructions.md, .github/AGENTS.md
Cursor .cursorrules, .cursor/*.md
OpenAI Codex CODEX.md, .codex/*.md

Level 3: Comprehensive Context

Category Files
Architecture ARCHITECTURE.md, docs/ARCHITECTURE.md, DESIGN.md
API & Data API.md, docs/API.md, DATA_MODEL.md
Standards CONVENTIONS.md, STYLE.md, CONTRIBUTING.md, PATTERNS.md, PR_REVIEW.md
Development DEVELOPMENT.md, TESTING.md, DEBUGGING.md, DEPLOYMENT.md

Level 4: Skills & Automation

Category Files
Skills .claude/skills/*/SKILL.md, .github/skills/*/SKILL.md, .cursor/skills/*/SKILL.md
Workflows .claude/commands/, WORKFLOWS.md, scripts/
Memory MEMORY.md, LEARNINGS.md, DECISIONS.md
Hooks .claude/hooks/, .claude/settings.json

Level 5+: Multi-Agent & Fleet

Category Files
Agents .github/agents/*.agent.md, agents/HANDOFFS.md
MCP .mcp.json, .mcp/*.json
Orchestration orchestration.yaml, workflows/*.yaml

Cross-Reference Detection & Quality

The tool analyzes the content of your AI instruction files, not just their existence.

Quality Scoring (0-10)

Indicator What We Look For Points
Sections Markdown headers (##) - 5+ headers = full points 0-2
Paths Concrete file paths (/src/, ~/config/) 0-2
Commands CLI commands in backticks (`npm test`) 0-2
Constraints "never", "avoid", "don't", "must not", "always" 0-2
Substance Word count (200+ = 2pts, 50-200 = 1pt) 0-2
Commits Git history (5+ = 2pts, 3-4 = 1pt) 0-2

Quality score is capped at 10.

Bonus Points (up to +10)

  • Cross-reference bonus (up to 5 pts): References between docs, resolution rate
  • Quality bonus (up to 5 pts): Half of average quality score

Customization

For Your Team

Different teams use different file names. The tool works best when customized:

  1. Run the tool to see what it detects
  2. Review patterns in measure_ai_proficiency/config.py
  3. Add your team's specific file names
  4. Adjust thresholds if needed

📖 Read the full customization guide

Configuration File

Create .ai-proficiency.yaml in your repository:

# Specify which AI tools your team uses
tools:
  - claude-code
  - github-copilot

# Adjust level thresholds (lower = easier to advance)
thresholds:
  level_3: 10   # Default: 15
  level_4: 8    # Default: 12

# Skip certain recommendations
skip_recommendations:
  - hooks
  - gastown

Add Custom Patterns

# In config.py, add your team's files:
file_patterns=[
    "SYSTEM_DESIGN.md",      # Instead of ARCHITECTURE.md
    "documentation/*.md",    # Instead of docs/
    "CODING_STANDARDS.md",   # Instead of CONVENTIONS.md
]

GitHub Action

Automatically assess AI proficiency on every PR:

# Quick setup with GitHub Agentic Workflows
gh extension install githubnext/gh-aw
gh aw add pskoett/measuring-ai-proficiency/.github/workflows/ai-proficiency-pr-review --create-pull-request

📖 Full GitHub Action documentation


Agent Skills

Want AI to help improve your context engineering? Add skills to your repository:

Available Skills

Skill Description
measure-ai-proficiency Assess repository AI maturity
customize-measurement Generate a customized .ai-proficiency.yaml
plan-interview Structured requirements gathering
agentic-workflow Create natural language GitHub Actions

Install Skills

For Claude Code:

mkdir -p .claude/skills/measure-ai-proficiency
curl -o .claude/skills/measure-ai-proficiency/SKILL.md \
  https://raw.githubusercontent.com/pskoett/measuring-ai-proficiency/main/skill-template/measure-ai-proficiency/SKILL.md

For GitHub Copilot:

mkdir -p .github/skills/measure-ai-proficiency
curl -o .github/skills/measure-ai-proficiency/SKILL.md \
  https://raw.githubusercontent.com/pskoett/measuring-ai-proficiency/main/skill-template/measure-ai-proficiency/SKILL.md

Then ask your AI: "Assess my AI proficiency" or "What context files should I add?"

AI Agents

For systematic improvement workflows, use the AI Context Improvement Agent:

Available in this repo:

  • .github/agents/improve-ai-context.agent.md - For GitHub Copilot
  • .claude/agents/improve-ai-context.agent.md - For Claude Code

Integrated workflow using three skills:

  1. plan-interview - Gathers requirements about your team's goals and constraints
  2. customize-measurement - Creates tailored .ai-proficiency.yaml configuration
  3. measure-ai-proficiency - Assesses and systematically improves your context

What it does:

  • Interviews you about team goals, tools, and target maturity level
  • Creates custom configuration matching your needs
  • Runs AI proficiency assessment with your config
  • Analyzes gaps and quality scores
  • Creates missing context files systematically
  • Improves existing files with low quality scores
  • Re-scans to verify improvements
  • Works with single repos or entire organizations

How to use:

Ask your AI assistant:

  • "Improve my AI proficiency" (full workflow with interview and config)
  • "Help me understand what context I need" (starts with plan-interview)
  • "Configure measurement for my team" (uses customize-measurement)
  • "Create missing context files" (quick assessment and improvement)
  • "Fix my low quality score" (focused on quality improvements)
  • "Advance to Level 4" (goal-oriented improvement)

The agent will guide you through requirements gathering, create a custom configuration, scan your repo, identify gaps, and systematically create or improve files to advance your maturity level.


Discover & Scan GitHub Organizations

You can now scan entire GitHub organizations without cloning any repositories:

# Scan all repos in an organization
measure-ai-proficiency --github-org your-org-name --format json --output report.json

Alternatively, use the discovery script to first see which repos have AI artifacts:

./scripts/find-org-repos.sh your-org-name

# Example output:
# Organization: anthropics
# Active repositories: 45
# Repos with AI context artifacts: 12 (26.7%)

Both methods require GitHub CLI (gh) to be installed and authenticated. The discovery script also requires jq.


Use Cases

Engineering Leadership

# Assess AI proficiency across your organization
measure-ai-proficiency --org /path/to/all-repos --format csv --output proficiency.csv

CI/CD Integration

- name: Check AI Proficiency
  run: |
    pip install git+https://github.com/pskoett/measuring-ai-proficiency.git
    measure-ai-proficiency --min-level 1

Team Onboarding

# Show new team members what context engineering looks like
measure-ai-proficiency

Scoring Algorithm

  1. File Detection: Scan for patterns at each level (1-8)
  2. Substantiveness Check: Files must have >100 bytes to count
  3. Coverage Calculation: Percentage of patterns matched per level
  4. Level Achievement:
    • Level 2: At least one AI context file
    • Level 3: Level 2 + ≥15% coverage
    • Level 4: Level 3 + ≥12% coverage
    • Level 5+: Progressive thresholds (≥10%, ≥8%, ≥6%, ≥5%)
  5. Minimum Score Guarantees: Each level achieved guarantees a minimum score:
    • Level 2: 15, Level 3: 30, Level 4: 45, Level 5: 55
    • Level 6: 70, Level 7: 85, Level 8: 95
  6. Bonus: Up to +10 points from cross-references and quality
  7. Validation Penalties: Up to -10 points for issues (see below)

Content Validation

The tool validates your context files beyond just counting them:

Validation What It Checks Penalty
Freshness Files updated within 90 days of code changes -2 pts per stale file (max -6)
Alignment Referenced files actually exist -1 pt per missing ref (max -4)
Templates Detects copy-pasted boilerplate markers -2 pts if majority are templates

Warnings displayed:

  • STALE: CLAUDE.md last updated 120 days ago - Your context is outdated
  • MISSING REF: CLAUDE.md references 'src/old.ts' (deleted) - Broken references
  • TEMPLATE: CLAUDE.md contains template markers - Uncustomized boilerplate

Behavioral Indicators (Levels 6-8)

Higher levels now have concrete behavioral requirements beyond file patterns:

Level Indicator What It Checks
L6 CI/CD Integration GitHub Actions workflows that invoke agents
L7 Agent Handoffs 2+ agents with documented handoff protocols
L8 Measured Outcomes Metrics files, logs, or success tracking

These indicators are displayed in the output but don't affect scoring (yet).

Understanding Your Score

Low score but lots of files? This is normal! The tool includes hundreds of patterns. Your team likely uses different file names - customize the patterns for accurate scoring.

Got validation warnings? These identify issues that reduce AI effectiveness:

  • Stale context files mislead AI with outdated information
  • Broken references confuse AI about your codebase structure
  • Template content provides generic rather than project-specific guidance

Contributing

Contributions welcome! Areas of interest:

  • Additional file patterns for new tools
  • Integration with GitHub API for remote scanning (implemented via GitHub CLI)
  • Historical tracking and trend analysis
  • IDE extensions

See FEATURE_BACKLOG.md for advanced metrics and architectural patterns we're exploring—including semantic drift detection, progressive disclosure scoring, and security governance indicators.

License

MIT

Related

About

Measuring AI Proficiency: Context Engineering as a Leading Indicator

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Packages

 
 
 

Contributors