Ariel-Rodriguez · Feb 7, 2026
diff --git a/.github/workflows/benchmark-dashboard.yml b/.github/workflows/benchmark-dashboard.yml
@@ -21,88 +21,40 @@ on:
         type: string
 
 jobs:
-  run-benchmark:
+  run-benchmark-and-publish:
     runs-on: ubuntu-latest
 
     steps:
       - uses: actions/checkout@v4
         with:
           ref: main
-          path: workspace
+          path: repo
 
       - uses: astral-sh/setup-uv@v5
         with:
           enable-cache: true
 
       - name: Run benchmark
-        working-directory: workspace
+        working-directory: repo
         env:
           OLLAMA_API_KEY: ${{ secrets.OLLAMA_API_KEY }}
         run: |
-          # Install dependencies
           uv sync --project tests
-          # Run benchmark
           uv run --project tests tests/evaluator.py \
             --provider ${{ inputs.provider }} \
             --model ${{ inputs.model }} \
             --judge \
             --verbose \
-            --report
-          # Rename artifact for clarity
-          if [ -d tests/results ]; then
-            ARTIFACT_NAME="benchmark-${{ inputs.provider }}-${{ inputs.model }}-$(date +%Y%m%d-%H%M%S)"
-            mv tests/results "tests/${ARTIFACT_NAME}"
-          fi
-          # Create docs/benchmarks if it doesn't exist for publish_benchmarks.py
-          mkdir -p docs/benchmarks
+            --report \
+            --all
 
-      - name: Generate dashboard
-        working-directory: workspace
+      - name: Publish to benchmark-history branch
+        working-directory: repo
         run: |
+          # Run publish_benchmarks.py with correct paths
           uv run --project tests python3 ci/publish_benchmarks.py \
             --provider ${{ inputs.provider }} \
             --model ${{ inputs.model }} \
             --branch benchmark-history \
-            --no-benchmark
-
-      - name: Generate dashboard
-        working-directory: workspace
-        run: |
-          uv run --project tests python3 ci/publish_benchmarks.py \
-            --provider ${{ inputs.provider }} \
-            --model ${{ inputs.model }} \
-            --branch benchmark-history
-
-  deploy-pages:
-    needs: run-benchmark
-    runs-on: ubuntu-latest
-
-    steps:
-      - name: Checkout workspace
-        uses: actions/checkout@v4
-        with:
-          ref: main
-          path: workspace
-
-      - name: Checkout benchmark data
-        uses: actions/checkout@v4
-        with:
-          ref: benchmark-history
-          path: benchmark-data
-
-      - name: Copy results to docs
-        run: |
-          mkdir -p workspace/docs/benchmarks
-          cp benchmark-data/docs/benchmarks.json workspace/docs/benchmarks.json 2>/dev/null || true
-          cp benchmark-data/docs/index.html workspace/docs/index.html 2>/dev/null || true
-          # Also copy individual benchmark results if they exist
-          cp -r benchmark-data/docs/benchmarks/*.json workspace/docs/benchmarks/ 2>/dev/null || true
-
-      - name: Commit and push updates
-        working-directory: workspace
-        run: |
-          git config user.name "GitHub Actions"
-          git config user.email "actions@github.com"
-          git add docs/
-          git commit -m "Update benchmark data" || echo "No changes to commit"
-          git push origin HEAD:benchmark-history
+            --no-benchmark \
+            --output-dir "site/benchmarks"
diff --git a/.github/workflows/skill-validation.yml b/.github/workflows/skill-validation.yml
@@ -107,11 +107,6 @@ jobs:
           # Construct arguments array for safety
           ARGS=(--provider "${{ matrix.provider }}" --model "${{ matrix.model }}" --judge --verbose --report --threshold 50)
 
-          if [ -n "${{ matrix.extra_args }}" ]; then
-            # Split extra_args safely if needed, but for now assuming simple flags
-            ARGS+=(${{ matrix.extra_args }})
-          fi
-
           if [ -n "${{ matrix.skill }}" ]; then
             ARGS+=(--skill "${{ matrix.skill }}")
           else
@@ -128,7 +123,7 @@ jobs:
         uses: actions/upload-artifact@v4
         with:
           name: ${{ steps.artifact.outputs.name }}
-          path: pr-code/tests/results/
+          path: pr-code/tests/data-history/
           retention-days: 1
 
   consolidate:
@@ -145,10 +140,10 @@ jobs:
 
       - uses: actions/download-artifact@v4
         with:
-          path: pr-code/tests/results/
+          path: pr-code/tests/data-history/
 
       - name: Consolidate results
-        run: python3 trusted-scripts/ci/consolidate_results.py --results-dir pr-code/tests/results --output-file pr-code/comment.md
+        run: python3 trusted-scripts/ci/consolidate_results.py --results-dir pr-code/tests/data-history --output-file pr-code/comment.md
 
       - name: Post to PR
         uses: marocchino/sticky-pull-request-comment@v2

diff --git a/.gitignore b/.gitignore
@@ -46,4 +46,6 @@ scratch-*
 
 # CI/Validation Artifacts
 comment.md
-results/
+
+# Benchmark site output
+site/
diff --git a/README.md b/README.md
@@ -2,6 +2,10 @@
 
 Language-agnostic AI agent skills that enforce fundamental programming principles. This repository provides specific, granular instructions that enable AI coding assistants to produce significantly higher-quality code that adheres to robust engineering standards.
 
+| Dashboard Explorer | Code Comparison | Judge Reasoning |
+| :---: | :---: | :---: |
+| ![Dashboard](docs/img/dashboard.png) | ![Code Comparison](docs/img/compare-code.png) | ![Judge Results](docs/img/compare-judge-results.png) |
+
 Adopting these skills measurably changes the output of AI models, shifting them from generating merely functional code to producing architecturally sound solutions.
 
 ## Table of Contents
@@ -15,66 +19,50 @@ Adopting these skills measurably changes the output of AI models, shifting them
 
 ## Installation
 
-Select your platform for specific setup instructions:
+See:
 
-- [Cursor](docs/install/cursor.md)
-- [Antigravity](docs/install/antigravity.md)
-- [GitHub Copilot](docs/install/copilot.md)
-- [Claude](docs/install/claude.md)
+- [Install Instructions](docs/install-instructions.md)
 
 ## How it Works
 
 The core of this repository is the `skills/` directory. Each skill is encapsulated in its own subdirectory following the `ps-<name>` convention (e.g., `ps-composition-over-coordination`).
 
 We use this granular structure because:
+
 1.  **Focus**: It allows the AI to load only the relevant context for a specific task, avoiding context window pollution.
 2.  **Modularity**: Skills can be improved, versioned, and tested independently.
 3.  **Composability**: Users can select the specific combination of principles they want to enforce for their project.
 
+## Skill Integration
+
+Skills should live under the `skills/` directory as `SKILL.md` files. For a full integration guide and documentation index:
+
+```
+https://agentskills.io/integrate-skills
+https://agentskills.io/llms.txt
+```
+
 ## Validation & Testing
 
 Every skill is validated against a rigorous testing suite found in the `tests/` directory.
 
 - **Automated Judging**: We use an LLM-as-a-Judge approach. The system compares the output of a "Baseline" model (without the skill) against a "Skill" model (with the skill loaded).
-- **Semantics over Syntax**: The test does not just look for passing unit tests; it analyzes the *logic* and *structure* of the code.
+- **Semantics over Syntax**: The test does not just look for passing unit tests; it analyzes the _logic_ and _structure_ of the code.
 - **Evidence-Based**: The judge identifies the specific lines of code that demonstrate adherence to or violation of the principle.
 
 [Read our Case Study on Judge Fairness](docs/judge-fairness-case-study.md) to see how the system fairly evaluates architectural quality, even when it means failing the Skill model.
 
 ## Evaluation Results
 
-Processed 24 evaluation(s).
-
-| Test Name | Model | Baseline | With Skill | Cases Pass | Winner |
-|-----------|-------|----------|------------|------------|--------|
-| [results-ollama-devstral-small-2--24b-cloud-ps-composition-over-coordination](https://github.com/Ariel-Rodriguez/programming-skills/actions/runs/21547621647/artifacts/5329534502) | devstral-small-2:24b-cloud | good | good | ✅ 2/2 | N/A |
-| [results-ollama-devstral-small-2--24b-cloud-ps-error-handling-design](https://github.com/Ariel-Rodriguez/programming-skills/actions/runs/21547621647/artifacts/5329534340) | devstral-small-2:24b-cloud | regular | outstanding | ✅ 2/2 | With Skill |
-| [results-ollama-devstral-small-2--24b-cloud-ps-explicit-boundaries-adapters](https://github.com/Ariel-Rodriguez/programming-skills/actions/runs/21547621647/artifacts/5329535854) | devstral-small-2:24b-cloud | good | outstanding | ✅ 2/2 | With Skill |
-| [results-ollama-devstral-small-2--24b-cloud-ps-explicit-ownership-lifecycle](https://github.com/Ariel-Rodriguez/programming-skills/actions/runs/21547621647/artifacts/5329535894) | devstral-small-2:24b-cloud | good | good | ✅ 2/2 | With Skill |
-| [results-ollama-devstral-small-2--24b-cloud-ps-explicit-state-invariants](https://github.com/Ariel-Rodriguez/programming-skills/actions/runs/21547621647/artifacts/5329537422) | devstral-small-2:24b-cloud | good | outstanding | ✅ 2/2 | With Skill |
-| [results-ollama-devstral-small-2--24b-cloud-ps-functional-core-imperative-shell](https://github.com/Ariel-Rodriguez/programming-skills/actions/runs/21547621647/artifacts/5329537046) | devstral-small-2:24b-cloud | regular | good | ✅ 2/2 | With Skill |
-| [results-ollama-devstral-small-2--24b-cloud-ps-illegal-states-unrepresentable](https://github.com/Ariel-Rodriguez/programming-skills/actions/runs/21547621647/artifacts/5329538523) | devstral-small-2:24b-cloud | good | outstanding | ✅ 2/2 | With Skill |
-| [results-ollama-devstral-small-2--24b-cloud-ps-local-reasoning](https://github.com/Ariel-Rodriguez/programming-skills/actions/runs/21547621647/artifacts/5329538780) | devstral-small-2:24b-cloud | good | outstanding | ✅ 2/2 | With Skill |
-| [results-ollama-devstral-small-2--24b-cloud-ps-minimize-mutation](https://github.com/Ariel-Rodriguez/programming-skills/actions/runs/21547621647/artifacts/5329540068) | devstral-small-2:24b-cloud | good | good | ✅ 2/2 | N/A |
-| [results-ollama-devstral-small-2--24b-cloud-ps-naming-as-design](https://github.com/Ariel-Rodriguez/programming-skills/actions/runs/21547621647/artifacts/5329540040) | devstral-small-2:24b-cloud | regular | good | ✅ 2/2 | With Skill |
-| [results-ollama-devstral-small-2--24b-cloud-ps-policy-mechanism-separation](https://github.com/Ariel-Rodriguez/programming-skills/actions/runs/21547621647/artifacts/5329541792) | devstral-small-2:24b-cloud | good | outstanding | ✅ 2/2 | With Skill |
-| [results-ollama-devstral-small-2--24b-cloud-ps-single-direction-data-flow](https://github.com/Ariel-Rodriguez/programming-skills/actions/runs/21547621647/artifacts/5329541535) | devstral-small-2:24b-cloud | regular | good | ✅ 2/2 | With Skill |
-| [results-ollama-rnj-1--8b-cloud-ps-composition-over-coordination](https://github.com/Ariel-Rodriguez/programming-skills/actions/runs/21547621647/artifacts/5329524580) | rnj-1:8b-cloud | outstanding | good | ❌ 2/2 | Baseline |
-| [results-ollama-rnj-1--8b-cloud-ps-error-handling-design](https://github.com/Ariel-Rodriguez/programming-skills/actions/runs/21547621647/artifacts/5329524126) | rnj-1:8b-cloud | vague | outstanding | ✅ 2/2 | With Skill |
-| [results-ollama-rnj-1--8b-cloud-ps-explicit-boundaries-adapters](https://github.com/Ariel-Rodriguez/programming-skills/actions/runs/21547621647/artifacts/5329526125) | rnj-1:8b-cloud | regular | outstanding | ✅ 2/2 | With Skill |
-| [results-ollama-rnj-1--8b-cloud-ps-explicit-ownership-lifecycle](https://github.com/Ariel-Rodriguez/programming-skills/actions/runs/21547621647/artifacts/5329526263) | rnj-1:8b-cloud | good | outstanding | ✅ 2/2 | With Skill |
-| [results-ollama-rnj-1--8b-cloud-ps-explicit-state-invariants](https://github.com/Ariel-Rodriguez/programming-skills/actions/runs/21547621647/artifacts/5329528479) | rnj-1:8b-cloud | regular | outstanding | ✅ 2/2 | With Skill |
-| [results-ollama-rnj-1--8b-cloud-ps-functional-core-imperative-shell](https://github.com/Ariel-Rodriguez/programming-skills/actions/runs/21547621647/artifacts/5329527817) | rnj-1:8b-cloud | regular | outstanding | ✅ 2/2 | With Skill |
-| [results-ollama-rnj-1--8b-cloud-ps-illegal-states-unrepresentable](https://github.com/Ariel-Rodriguez/programming-skills/actions/runs/21547621647/artifacts/5329529527) | rnj-1:8b-cloud | outstanding | outstanding | ✅ 2/2 | N/A |
-| [results-ollama-rnj-1--8b-cloud-ps-local-reasoning](https://github.com/Ariel-Rodriguez/programming-skills/actions/runs/21547621647/artifacts/5329529241) | rnj-1:8b-cloud | vague | outstanding | ✅ 2/2 | With Skill |
-| [results-ollama-rnj-1--8b-cloud-ps-minimize-mutation](https://github.com/Ariel-Rodriguez/programming-skills/actions/runs/21547621647/artifacts/5329531124) | rnj-1:8b-cloud | regular | outstanding | ✅ 2/2 | With Skill |
-| [results-ollama-rnj-1--8b-cloud-ps-naming-as-design](https://github.com/Ariel-Rodriguez/programming-skills/actions/runs/21547621647/artifacts/5329531393) | rnj-1:8b-cloud | vague | good | ✅ 2/2 | With Skill |
-| [results-ollama-rnj-1--8b-cloud-ps-policy-mechanism-separation](https://github.com/Ariel-Rodriguez/programming-skills/actions/runs/21547621647/artifacts/5329532599) | rnj-1:8b-cloud | regular | outstanding | ✅ 2/2 | With Skill |
-| [results-ollama-rnj-1--8b-cloud-ps-single-direction-data-flow](https://github.com/Ariel-Rodriguez/programming-skills/actions/runs/21547621647/artifacts/5329532551) | rnj-1:8b-cloud | vague | good | ✅ 2/2 | With Skill |
+Dashboard:
+
+```
+https://ariel-rodriguez.github.io/programming-skills/
+```
 
 ## Documentation
 
-- [Architecture](docs/architecture.md) - Repository design & structure
+- [Architecture](docs/specs/architecture.md) - Repository design & structure
 - [Contributing](docs/contributing.md) - How to add/modify skills & benchmarks
 - [AI Prompt Wrapper](docs/ai-prompt-wrapper.md) - Configure your AI assistant
 - [Changelog](CHANGELOG.md) - Version history & skill changes

diff --git a/ci/consolidate_results.py b/ci/consolidate_results.py
@@ -113,7 +113,7 @@ def main():
     parser = argparse.ArgumentParser(description="Consolidate evaluation results")
     parser.add_argument("--mode", choices=["pr-comment", "benchmark"], default="pr-comment",
                        help="Output mode: pr-comment or benchmark")
-    parser.add_argument("--results-dir", type=Path, default="tests/results",
+    parser.add_argument("--results-dir", type=Path, default="tests/data-history",
                        help="Directory containing evaluation results")
     parser.add_argument("--output-dir", type=Path, default=None,
                        help="Output directory for benchmark mode")
@@ -126,8 +126,8 @@ def main():
 
     print(f"==> Consolidating results (mode: {args.mode})")
 
-    # Find all summary.json files
-    summary_files = sorted(args.results_dir.glob("*/summary.json"))
+    # Find all summary files
+    summary_files = sorted(args.results_dir.glob("**/summary-*.json"))
 
     if not summary_files:
         print(f"No results found in {args.results_dir}")
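
Note on the glob change in the hunk above: the following is a minimal sketch, not part of the PR, of how the old pattern `*/summary.json` and the new pattern `**/summary-*.json` behave under `pathlib.Path.glob`. The directory layout and file names are hypothetical, since the diff does not show the actual structure of `tests/data-history`; `find_summaries` and `build_example_tree` are illustrative helpers only.

```python
import tempfile
from pathlib import Path


def find_summaries(results_dir: Path, pattern: str) -> list[str]:
    """Return matching paths relative to results_dir, sorted, with POSIX separators."""
    return sorted(
        p.relative_to(results_dir).as_posix() for p in results_dir.glob(pattern)
    )


def build_example_tree(root: Path) -> None:
    # Hypothetical layout; the real directory structure is not shown in this diff.
    for rel in (
        "run-a/summary.json",                    # old-style fixed name, one level deep
        "run-b/summary-20260207.json",           # suffixed name, one level deep
        "provider/model/summary-20260207.json",  # suffixed name, nested two levels
    ):
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("{}")


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    build_example_tree(root)
    # Old pattern: exactly one directory level, exact file name.
    old_matches = find_summaries(root, "*/summary.json")
    # New pattern: any depth (including zero), file name must start with "summary-".
    new_matches = find_summaries(root, "**/summary-*.json")
    print("old:", old_matches)
    print("new:", new_matches)
```

One consequence worth checking in review: a file named exactly `summary.json` does not match `summary-*.json`, so any results still written under the old name would be skipped by the new pattern.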