From b4a53056bc8e395ce6dfe57ebd42dcd9ffdf4e70 Mon Sep 17 00:00:00 2001 From: Claude Code Date: Wed, 11 Mar 2026 18:04:34 +0900 Subject: [PATCH] Add ai-md: CLAUDE.md optimizer by sstklen MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Convert human-written CLAUDE.md to AI-native structured format. Battle-tested: 4 models, 5 rounds, compliance 6/8 → 8/8. Source: https://github.com/sstklen/ai-md --- skills/sstklen/ai-md/SKILL.md | 524 ++++++++++++++++++++++++++++++++++ 1 file changed, 524 insertions(+) create mode 100644 skills/sstklen/ai-md/SKILL.md diff --git a/skills/sstklen/ai-md/SKILL.md b/skills/sstklen/ai-md/SKILL.md new file mode 100644 index 00000000000..3b567bcd53e --- /dev/null +++ b/skills/sstklen/ai-md/SKILL.md @@ -0,0 +1,524 @@ +--- +name: ai-md +version: 4.0.0 +description: > + AI.MD — Convert any human-written CLAUDE.md into AI-native structured format. + Your CLAUDE.md is read by AI every single turn, not by you — so write it in AI's language. + Battle-tested: 5 rounds, 4 models (GPT-5.3, Gemini 2.5 Pro, Grok-4, Claude Opus 4.6). + Structured-label format raised Codex compliance from 6/8 → 8/8 on identical content. + Same rules, fewer tokens, higher precision. + This skill contains the complete methodology: how to take natural language instructions + and convert them into a format that LLMs actually follow better. + Run with: "distill my CLAUDE.md" or "rewrite my MD for AI" or "蒸餾" or "AI.MD" +triggers: + - user says "CLAUDE.md 太長" "太肥" "額度掉很快" "token 太多" + - user says "蒸餾" "distill" "壓縮 MD" "rewrite MD for AI" "ai-md" "AI.MD" + - user says "optimize my instructions" "make my MD better" + - user notices quota draining fast + - CLAUDE.md + rules total > 10KB +tags: [optimization, token-saving, system-prompt, ai-native, meta, battle-tested, methodology] +--- + +# AI.MD v4 — The Complete AI-Native Conversion System + +## What Is AI.MD? 
+ +AI.MD is a methodology for converting human-written `CLAUDE.md` (or any LLM system instructions) +into a structured-label format that AI models follow more reliably, using fewer tokens. + +**The paradox we proved:** Adding more rules in natural language DECREASES compliance. +Converting the same rules to structured format RESTORES and EXCEEDS it. + +``` +Human prose (6 rules, 1 line) → AI follows 4 of them +Structured labels (6 rules, 6 lines) → AI follows all 6 +Same content. Different format. Different results. +``` + +--- + +## Why It Works: How LLMs Actually Process Instructions + +LLMs don't "read" — they **attend**. Understanding this changes everything. + +### Mechanism 1: Attention Splitting + +When multiple rules share one line, the model's attention distributes across all tokens equally. +Each rule gets a fraction of the attention weight. Some rules get lost. + +When each rule has its own line, the model processes it as a distinct unit. +Full attention weight on each rule. + +``` +# ONE LINE = attention splits 5 ways (some rules drop to near-zero weight) +EVIDENCE: no-fabricate no-guess | 禁用詞:應該是/可能是 → 先拿數據 | Read/Grep→行號 curl→數據 | "好像"/"覺得"→自己先跑test | guess=shame-wall + +# FIVE LINES = each rule gets full attention +EVIDENCE: + core: no-fabricate | no-guess | unsure=say-so + banned: 應該是/可能是/感覺是/推測 → 先拿數據 + proof: all-claims-need(data/line#/source) | Read/Grep→行號 | curl→數據 + hear-doubt: "好像"/"覺得" → self-test(curl/benchmark) → 禁反問user + violation: guess → shame-wall +``` + +### Mechanism 2: Zero-Inference Labels + +Natural language forces the model to INFER meaning from context. +Labels DECLARE meaning explicitly. No inference needed = no misinterpretation. + +``` +# AI must infer: what does (防搞混) modify? what does 例外 apply to? +GATE-1: 收到任務→先用一句話複述(防搞混)(長對話中每個新任務都重新觸發) | 例外: signals命中「處理一下」=直接執行 + +# AI reads labels directly: trigger→action→exception. Zero ambiguity. 
+GATE-1 複述: + trigger: new-task + action: first-sentence="你要我做的是___" + persist: 長對話中每個新任務都重新觸發 + exception: signal=處理一下 → skip + yields-to: GATE-3 +``` + +Key insight: Labels like `trigger:` `action:` `exception:` work across ALL languages. +The model doesn't need to parse Chinese/Japanese/English grammar to understand structure. +**Labels are the universal language between humans and AI.** + +### Mechanism 3: Semantic Anchoring + +Labeled sub-items create **matchable tags**. When a user's input contains a keyword, +the model matches it directly to the corresponding label — like a hash table lookup +instead of a full-text search. + +``` +# BURIED: AI scans the whole sentence, might miss the connection +加新功能→第一句問schema | 新增API/endpoint=必確認health-check.py覆蓋 + +# ANCHORED: label "new-api:" directly matches user saying "加個 API" +MOAT: + new-feature: 第一句問schema/契約/關聯 + new-api: 必確認health-check.py覆蓋(GATE-5) +``` + +**Real proof:** This specific technique fixed a test case that failed 5 consecutive times +across all models. The label `new-api:` raised Codex T5 from ❌→✅ on first try. + +--- + +## The Conversion Process: What Happens When You Give Me a CLAUDE.md + +Here's the exact mental model I use when converting natural language instructions to AI.MD format. + +### Phase 1: UNDERSTAND — Read Like a Compiler, Not a Human + +I read the CLAUDE.md **as if I'm building a state machine**, not reading a document. + +For each sentence, I ask: +1. **Is this a TRIGGER?** (What input activates this behavior?) +2. **Is this an ACTION?** (What should the AI do?) +3. **Is this a CONSTRAINT?** (What should the AI NOT do?) +4. **Is this METADATA?** (Priority, timing, persistence, exceptions?) +5. 
**Is this a HUMAN EXPLANATION?** (Why the rule exists — delete this) + +Example analysis: + +``` +Input: "收到任務→先用一句話複述(防搞混)(長對話中每個新任務都重新觸發) | 例外: signals命中「處理一下」=直接執行" + +Decomposition: + ├─ TRIGGER: "收到任務" → new-task + ├─ ACTION: "先用一句話複述" → first-sentence="你要我做的是___" + ├─ DELETE: "(防搞混)" → human motivation, AI doesn't need this + ├─ METADATA: "(長對話中每個新任務都重新觸發)" → persist: every-new-task + └─ EXCEPTION: "例外: signals命中「處理一下」=直接執行" → exception: signal=處理一下 → skip +``` + +### Phase 2: DECOMPOSE — Break Every `|` and `()` Into Atomic Rules + +The #1 source of compliance failure is **compound rules**. +A single line with 3 rules separated by `|` looks like 1 instruction to AI. +It needs to be 3 separate instructions. + +**The splitter test:** If you can put "AND" between two parts of a sentence, +they are separate rules and MUST be on separate lines. + +``` +# Input: one sentence hiding 4 rules +禁用詞:應該是/可能是→先拿數據 | "好像"/"覺得"→自己先跑test(不是問user)→有數據才能決定 + +# Analysis: I find 4 hidden rules +Rule 1: certain words are banned → use data instead +Rule 2: hearing doubt words → run self-test +Rule 3: don't ask the user for data → look it up yourself +Rule 4: preference claims → require A/B comparison before accepting + +# Output: 4 atomic rules +banned: 應該是/可能是/感覺是/推測 → 先拿數據 +hear-doubt: "好像"/"覺得" → self-test(curl/benchmark) +self-serve: 禁反問user(自己查) +compare: "覺得A比B好" → A/B實測先行 +``` + +### Phase 3: LABEL — Assign Function Labels + +Every atomic rule gets a label that declares its function. 
+I use a standard vocabulary of ~12 label types:
+
+| Label | What It Declares | When to Use |
+|-------|-----------------|-------------|
+| `trigger:` | What input activates this | Every gate/rule needs one |
+| `action:` | What the AI must do | The core behavior |
+| `exception:` | When NOT to do it | Override cases |
+| `not-triggered:` | Explicit negative examples | Prevent over-triggering |
+| `format:` | Output format constraint | Position, structure requirements |
+| `priority:` | Override relationship | When rules conflict |
+| `yields-to:` | Which gate takes precedence | Inter-gate priority |
+| `persist:` | Durability across turns | Rules that survive conversation flow |
+| `timing:` | When in the workflow | Before/after/during constraints |
+| `violation:` | Consequence of breaking | Accountability mechanism |
+| `banned:` | Forbidden words/actions | Hard no-go list |
+| `policy:` | Decision heuristic | When judgment is needed |
+
+**The label selection technique:** I pick the label that would help a DIFFERENT AI model
+(not the one being instructed) understand this rule's function if it saw ONLY the label.
+If `trigger:` clearly tells you "this is what activates the rule" without reading anything else,
+it's the right label.
+
+### Phase 4: STRUCTURE — Build the Architecture
+
+I organize rules into a hierarchy:
+
+```
+<gates>     = Hard stops (MUST check before any action)
+<rules>     = Behavioral guidelines (HOW to act)
+<workflow>  = Workflow patterns (WHEN to do what)
+<facts>     = Connection strings (FACTS — never compress)
+<skills>    = On-demand references (don't load until needed)
+<evolution> = Evolution rules (how the system improves)
+```
+
+**Why this order matters:**
+Gates come first because they MUST be checked before anything else.
+The model processes instructions top-to-bottom. Priority = position.
+
+**Grouping technique:** Rules that share a DOMAIN become sub-items under one heading.
+
+```
+# FLAT (bad): 7 unrelated rules, model treats equally
+1. no guessing
+2. backup before editing
+3. 
use tables for output +4. check health after deploy +5. don't say "應該是" +6. test before reporting +7. all claims need proof + +# GROUPED (good): 3 domains, model understands hierarchy +EVIDENCE: ← domain: truthfulness + core: no-guess + banned: 應該是 + proof: all-claims-need-data + +SCOPE: ← domain: safety + pre-change: backup + pre-run: check-health + +OUTPUT: ← domain: format + format: tables+numbers +``` + +### Phase 5: RESOLVE — Handle Conflicts and Edge Cases + +This is the most critical and least obvious phase. Natural language instructions +often contain **hidden conflicts** that humans resolve with intuition but AI cannot. + +**Technique: Conflict Detection Matrix** + +I check every pair of gates/rules for conflicts: + +``` +GATE-1 (複述: repeat task) vs GATE-3 (保護檔: backup first) +→ CONFLICT: If user says "edit .env", should AI repeat the task first, or backup first? +→ RESOLUTION: priority: GATE-3 > GATE-1 (safety before courtesy) + yields-to: GATE-3 (explicit in GATE-1) + +GATE-4 (報結論: cite evidence) vs bug-close (記錄根因: write root cause) +→ CONFLICT: bug-close requires stating root cause, but GATE-4 bans definitive claims +→ RESOLUTION: timing: GATE-4 is pre-conclusion brake; bug-close is post-verification record + GATE-4 not-triggered when bug already verified + +EVIDENCE (no-guess) vs user says "處理一下" (just do it) +→ CONFLICT: should AI verify assumptions or execute immediately? +→ RESOLUTION: signal "處理一下" = user has decided, skip confirmation +``` + +**Technique: Not-Triggered Lists** + +For any rule that could over-trigger, I add explicit negative examples: + +``` +GATE-4 報結論: + trigger: 最終歸因/根因判定/不可逆建議 + not-triggered: 中間進度數字 | 純指標查詢 | 工具原始輸出 | 已知事實 | 轉述文件 +``` + +This was discovered because Gemini 2.5 Pro kept triggering GATE-4 on simple number queries +like "成功率怎麼樣?". Adding `not-triggered: 純指標查詢` fixed it immediately. 
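The pairwise conflict check in Phase 5 can be made mechanical. Below is a minimal sketch in Python, assuming a hypothetical in-memory gate table; the gate names and `yields_to` sets are illustrative, not taken from any real CLAUDE.md:

```python
from itertools import combinations

# Hypothetical gate table: name -> declared trigger and yields-to set.
# In a real conversion this would be parsed from the distilled CLAUDE.md.
GATES = {
    "GATE-1": {"trigger": "new-task",       "yields_to": {"GATE-3"}},
    "GATE-3": {"trigger": "protected-file", "yields_to": set()},
    "GATE-4": {"trigger": "final-claim",    "yields_to": set()},
}

def unresolved_conflicts(gates):
    """Return every gate pair with no explicit priority in either direction."""
    return [
        (a, b)
        for a, b in combinations(sorted(gates), 2)
        if b not in gates[a]["yields_to"] and a not in gates[b]["yields_to"]
    ]

# Each reported pair needs a priority: or yields-to: line added.
print(unresolved_conflicts(GATES))
# → [('GATE-1', 'GATE-4'), ('GATE-3', 'GATE-4')]
```

This flags all unordered pairs; in a full pipeline you would first filter to pairs whose triggers can actually co-fire, then require an explicit `priority:` or `yields-to:` line for each pair the check reports.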
+ +### Phase 6: TEST — Multi-Model Validation (Non-Negotiable) + +**This is not optional.** Every conversion MUST be validated by 2+ different LLM models. + +Why? Because a format that works perfectly for Claude might confuse GPT, and vice versa. +The whole point of AI.MD is that it works ACROSS models. + +**The exam protocol we developed:** + +1. Write 8 test inputs that simulate REAL user behavior (not textbook examples) +2. Include "trap" questions where two rules conflict +3. Include "negative" tests where a rule should NOT trigger +4. DO NOT hint which rules are being tested (the AI shouldn't know) +5. Run each model independently +6. Score each answer: ✅ full compliance, ⚠️ partial, ❌ miss +7. If ANY model's score drops after conversion → revert that specific change + +**The 8-question template we used:** + +``` +T1: Simple task (does GATE-1 trigger?) +T2: Database write attempt (does GATE-2 catch it?) +T3: Protected file edit (does GATE-3 fire FIRST, before GATE-1?) +T4: Root cause analysis (does GATE-4 require all 4 questions?) +T5: Business API addition (does AI mention health-check.py?) +T6: User says "好像X比Y好" (does AI run comparison or just accept it?) +T7: User says "處理一下" (does AI skip GATE-1 confirmation?) +T8: Simple metric query (does GATE-4 NOT trigger?) +``` + +--- + +## Special Techniques Discovered During Battle-Testing + +### Technique 1: Bilingual Label Strategy + +Labels in English, output strings in the user's language. +English labels are shorter AND more universally understood by all models. +But the actual text the AI produces must stay in the user's language. + +``` +action: first-sentence="你要我做的是___" ← AI outputs Chinese +format: must-be-line-1 ← structural constraint in English +banned: 應該是/可能是 ← forbidden words stay in original language +``` + +**Why this works:** English label vocabulary (`trigger`, `action`, `exception`) maps directly +to concepts in every model's training data. 
Chinese grammar labels (觸發條件, 執行動作, 例外情況)
+are less standardized across models.
+
+### Technique 2: State Machine Gates
+
+Instead of treating rules as a flat list, model them as a **state machine**:
+- Each gate has a `trigger` (input state)
+- Each gate has an `action` (transition)
+- Gates have `priority` (which fires first when multiple match)
+- Gates have `yields-to` (explicit conflict resolution)
+
+This gives AI a clear execution model:
+```
+Input arrives → Check GATE-3 first (highest priority) → Check GATE-1 → Check GATE-2 → ...
+```
+
+Instead of:
+```
+Input arrives → Read all rules → Try to figure out which one applies → Maybe miss one
+```
+
+### Technique 3: XML Section Tags for Semantic Boundaries
+
+Using `<gates>`, `<rules>`, `<workflow>`, `<facts>` as section delimiters
+creates hard boundaries that prevent rule-bleed (where the model confuses
+which section a rule belongs to).
+
+```xml
+<gates label="hard-gates | priority-order | miss=stop">
+...gates here...
+</gates>
+
+<rules>
+...rules here...
+</rules>
+```
+
+The `label` attribute on the opening tag serves as a section-level instruction:
+"these are hard gates, this is their priority, missing = stop"
+
+### Technique 4: Cross-Reference Instead of Duplicate
+
+When the same concept appears in multiple rules, DON'T repeat it.
+Use a cross-reference label.
+
+```
+# BAD: health-check mentioned in 3 places
+GATE-5: ...check health-check.py...
+MOAT: ...must check health-check.py...
+SCOPE: ...verify health-check.py exists...
+
+# GOOD: single source of truth + cross-reference
+GATE-5 驗收:
+  checks:
+    新增API → 確認health-check.py覆蓋
+
+MOAT:
+  new-api: 必確認health-check.py覆蓋(GATE-5) ← cross-ref, not duplicate
+```
+
+### Technique 5: The "What Not Why" Principle
+
+Delete ALL text that exists to explain WHY a rule exists.
+AI needs WHAT to do, not WHY. 
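A deliberately naive sketch of this deletion pass in Python (the helper name `strip_why` and the regex are assumptions, not part of the skill): it treats every parenthetical, ASCII or full-width, as a "why" aside. Real conversions must keep functional parentheses such as `self-test(curl/benchmark)`, so use it as a candidate finder, not an auto-deleter:

```python
import re

# Match ASCII and full-width parenthetical asides, which in this
# methodology usually carry motivation ("why"), not instructions ("what").
PAREN = re.compile(r"[(（][^()（）]*[)）]")

def strip_why(line: str) -> str:
    """Remove parenthetical asides and collapse leftover whitespace."""
    return re.sub(r"\s{2,}", " ", PAREN.sub("", line)).strip()

print(strip_why("收到任務→先用一句話複述(防搞混)"))
# → 收到任務→先用一句話複述
```

In an interactive distill, surface each match and let the Phase 1 questions decide whether the aside is metadata worth keeping (`persist:`, `exception:`) or motivation to delete.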
+ +``` +# DELETE these human explanations: +(防搞混) → motivation +(不是大爆破,是每次順手一點) → metaphor +(想清楚100倍後才做現在的) → backstory +(因為用戶是非工程師) → justification + +# KEEP only the actionable instruction: +action: first-sentence="你要我做的是___" +refactor: 同區塊連續第3次修改 → extract +``` + +Every deleted explanation saves tokens AND removes noise that could confuse the model +about what it should actually DO. + +--- + +## Two-Stage Workflow + +### Stage 1: PREVIEW — Measure, Don't Touch + +```bash +echo "=== Current Token Burn ===" +claude_md=$(wc -c < ~/.claude/CLAUDE.md 2>/dev/null || echo 0) +rules=$(cat ~/.claude/rules/*.md 2>/dev/null | wc -c || echo 0) +total=$((claude_md + rules)) +tokens=$((total / 4)) +echo "CLAUDE.md: $claude_md bytes" +echo "rules/*.md: $rules bytes" +echo "Total: $total bytes ≈ $tokens tokens/turn" +echo "50-turn session: ≈ $((tokens * 50)) tokens on instructions alone" +``` + +Then: Read all auto-loaded files. Identify redundancy, prose overhead, and duplicate rules. + +**Ask user before proceeding: "Want to distill?"** + +### Stage 2: DISTILL — Convert with Safety Net + +1. **Backup**: `cp ~/.claude/CLAUDE.md ~/.claude/CLAUDE.md.bak-pre-distill` +2. **Phase 1-5**: Run the full conversion process above +3. **Phase 6**: Run multi-model test (minimum 2 models, 8 questions) +4. **Report**: Show before/after scores + +``` +=== AI.MD Conversion Complete === + +Before: {old} bytes ({old_score} compliance) +After: {new} bytes ({new_score} compliance) +Saved: {percent}% bytes, +{delta} compliance points + +Backup: ~/.claude/CLAUDE.md.bak-pre-distill +Restore: cp ~/.claude/CLAUDE.md.bak-pre-distill ~/.claude/CLAUDE.md +``` + +--- + +## AI-Native Template + +```xml +# PROJECT-NAME | lang:xx | for-AI-parsing | optimize=results-over-format + + +identity, tone, signals, decision-style (key: value pairs) + + + + +GATE-1 name: + trigger: ... + action: ... + exception: ... + yields-to: ... + +GATE-2 name: + trigger: ... + action: ... + policy: ... 
+ + + + + +RULE-NAME: + core: ... + banned: ... + hear-X: ... → action + violation: ... + + + + +workflow patterns as key: value pairs + + + +connection strings (keep exact — NEVER compress facts/credentials/URLs) + + + +file-path → purpose + + + +how system evolves over time + +``` + +--- + +## Anti-Patterns + +| Don't | Do Instead | Why | +|-------|------------|-----| +| Human prose in CLAUDE.md | Structured labels | Prose requires inference; labels are direct | +| Multiple rules on one line | One concept per line | Attention splits across dense lines | +| Parenthetical explanations | Remove them | AI needs "what" not "why" | +| Same rule in 3 places | Single source + cross-ref | Duplicates can diverge and confuse | +| 20+ flat rules | 5-7 domains with sub-items | Hierarchy helps model organize behavior | +| Compress without testing | Validate with 2+ models | What works for Claude might fail for GPT | +| Assume format doesn't matter | Test it — it does | Same content, different format = different compliance | +| Chinese-only labels | English labels + native output | English labels are more universal across models | +| Flat rule list | State machine with priorities | Clear execution order prevents missed rules | + +--- + +## Real-World Results + +Tested 2026-03, washinmura.jp CLAUDE.md, 5 rounds, 4 models: + +| Round | Change | Codex (GPT-5.3) | Gemini 2.5 Pro | Claude Opus 4.6 | +|-------|--------|-----------------|----------------|-----------------| +| R1 (baseline prose) | — | 8/8 | 7/8 | 8/8 | +| R2 (added rules) | +gates +examples | 7/8 | 6/8 | — | +| R3 (refined prose) | +exceptions +non-triggers | 6/8 | 6.5/8 | — | +| R4 (AI-native convert) | structured labels | **8/8** | **7/8** | **8/8** | + +Key findings: +1. **More prose rules = worse compliance** (R1→R3: scores dropped as rules grew) +2. **Structured format = restored + exceeded** (R4: back to max despite more rules) +3. 
**Cross-model consistency**: Format that works for one model works for all (except Grok) +4. **Semantic anchoring**: The `new-api:` label fix was the single most impactful change + +**The uncomfortable truth: Your beautiful, carefully-written CLAUDE.md +might be HURTING your AI's performance. Structure > Prose. Always.**