Merged
17 changes: 16 additions & 1 deletion agents/implementer.md
@@ -17,10 +17,25 @@ Complete the task following spec exactly. Operate with fresh context to avoid drift.

## Constraints

**TDD Awareness**: Receive TDD mode: **strict** (must have test, stop if missing), **soft** (warn, proceed), **off** (no checks). When enabled, follow Red-Green-Refactor.
**TDD Awareness**: Receive TDD mode: **strict** (must have test, stop if missing), **soft** (warn, proceed), **off** (no checks). When enabled, follow the vertical TDD workflow below.

Why strict mode stops: Catching spec drift early is cheaper than debugging later.

**TDD Workflow** (when mode is strict or soft):

1. **Tracer Bullet**: Write ONE test for the highest-priority behavior. Verify it FAILS (RED). Write minimal code to pass (GREEN). Run per-cycle checklist. **STOP after tracer bullet passes** — verify the approach is correct before proceeding.
2. **Incremental Loop**: For each remaining behavior: write ONE test (RED), write minimal code to pass (GREEN), run per-cycle checklist. **Do NOT generate all tests first. Write ONE test, make it pass, verify checklist, then write the NEXT test.**
3. **Refactor**: Only when ALL tests are GREEN. One structural change at a time. Run tests after each change.

**Per-Cycle Checklist** (verify after every RED-GREEN pair):
- [ ] **Behavior, not implementation**: Test describes WHAT the system does, not HOW
- [ ] **Public interface only**: Test uses the same API as production callers
- [ ] **Survives refactor**: Would this test break if internals changed but behavior stayed the same?
- [ ] **Minimal code**: Implementation is the simplest thing that passes — no speculative features
- [ ] **No horizontal drift**: Did you write only ONE test before implementing?

**Pre-existing RED state**: If existing tests are already failing when you start, note them and proceed with new behavior only. Do not attempt to fix pre-existing failures unless that is the task.

**Focus**: Implement ONLY what the task requires. No extra features, improvements, or unrelated refactoring.

Why scope discipline matters: Scope drift is the #1 failure mode in subagent workflows. Every unspecified addition introduces untested surface area.
6 changes: 5 additions & 1 deletion commands/implement.md
@@ -55,7 +55,11 @@ The default implementation approach uses sequential subagent-driven development
### Subagent Workflow
You orchestrate; implementer agents execute. Per task:

1. **Dispatch implementer** (Task tool, `subagent_type: "implementer"`): task spec, TDD mode, context, files to reference
1. **Dispatch implementer** (Task tool, `subagent_type: "implementer"`): task spec, TDD mode, context, files to reference. **When TDD mode is strict or soft**: Before dispatching, read TDD reference files via `Glob("**/tdd/references/*.md", path: "~/.claude/plugins")` and inject key excerpts into the implementer's Task prompt:
- From `mocking.md`: boundary-only rule, system boundary definition, legacy compatibility clause
- From `test-quality.md`: behavioral test criteria (WHAT not HOW, public interface, survives refactor)
- From `interface-design.md`: DI principle (accept dependencies, don't create them)
- Inject as literal text under a `TDD GUIDANCE:` section in the prompt. Do NOT include full files — extract the rules and one example each.
2. **Handle questions**: Answer from plan/context first, else AskUserQuestion
3. **Dispatch spec-reviewer** (`subagent_type: "spec-reviewer"`): original spec + implementation → verify nothing missing/extra, behavior matches
4. **Dispatch code-quality-reviewer** (`subagent_type: "code-quality-reviewer"`): changed files → check bugs/smells/security/anti-patterns
99 changes: 88 additions & 11 deletions skills/tdd/SKILL.md
@@ -11,7 +11,7 @@ Correctness > Test Coverage > Implementation Speed

## Goal

Enforce Test-Driven Development based on project configuration. Agents reliably implement happy-path code that breaks on edge cases — TDD makes the test the durable specification across fresh-context dispatches.
Enforce Test-Driven Development as a **process**, not just a presence check. Agents left unconstrained write all tests first, then all code (horizontal slicing). This produces tests coupled to implementation that break on refactor. The 4-phase workflow below enforces vertical TDD cycles where each test drives one slice of implementation.

## Configuration

@@ -25,29 +25,106 @@ Check both `CLAUDE.md` and `.claude/CLAUDE.md`. Default: `off`.
## Modes

### Strict (`tdd: strict`)
Hard enforcement. Before implementation, verify test exists. If missing, STOP and present AskUserQuestion with options: (1) Write test first, (2) Prototype escape with justification. Log escapes for review.
Full 4-phase workflow with blocking gates. Planning phase requires user confirmation via AskUserQuestion before coding. Each phase has explicit entry/exit criteria.

**Escape when**: Markdown-only changes, config changes, or when mocking exceeds the change's complexity.
**Escape when**: Markdown-only changes, config changes, or when mocking exceeds the change's complexity. Present AskUserQuestion with: (1) Write test first, (2) Prototype escape with justification. Log escapes.

### Soft (`tdd: soft`)
Warning without blocking. Check for test, warn if missing, continue. Summarize untested items after completion.
Full 4-phase workflow guidance but **no blocking gates**. Planning phase generates and presents the plan but proceeds immediately. Warns on deviations (no test, horizontal slicing) but does not stop. Summarize untested items after completion.

### Off (`tdd: off`)
No TDD checks. Standard implementation flow.

## Test Discovery
## Anti-Pattern: Horizontal Slicing

Search patterns: `__tests__/[filename].test.ts`, `[filename].test.ts`, `[filename].spec.ts`, `test/[filename].test.ts`, `tests/[filename].test.ts`.
```
WRONG (horizontal):           RIGHT (vertical):
┌──────────────────────┐      ┌──────────────────────┐
│ Write ALL tests      │      │ Test 1 → Impl 1      │
│ test1, test2, test3  │      │ ✓ GREEN              │
├──────────────────────┤      ├──────────────────────┤
│ Write ALL code       │      │ Test 2 → Impl 2      │
│ impl1, impl2, impl3  │      │ ✓ GREEN              │
├──────────────────────┤      ├──────────────────────┤
│ Hope they match      │      │ Test 3 → Impl 3      │
│ ✗ Tests are brittle  │      │ ✓ GREEN              │
└──────────────────────┘      └──────────────────────┘
```

Horizontal slicing fails because tests written without implementation are based on imagined APIs. They couple to guessed method signatures, mock internal modules, and break on any structural change.

## 4-Phase Workflow

### Phase 1: Planning

**Entry**: Task received with TDD mode strict or soft.

1. Identify the public interface changes needed
2. List behaviors to test, ordered by priority (core path first, edge cases later)
3. **Strict mode**: Present interface and behavior list via AskUserQuestion for user confirmation before proceeding
4. **Soft mode**: Present the plan, proceed immediately

Load reference: `Glob("**/tdd/references/interface-design.md", path: "~/.claude/plugins")`

**Exit**: Confirmed behavior list. Proceed to Tracer Bullet.

**Non-interactive**: If no user response within the current turn, proceed with best judgment and log skipped confirmation.

### Phase 2: Tracer Bullet

Prove one end-to-end path through the system.

1. Write ONE test for the highest-priority behavior
2. Verify it FAILS (RED) — a passing test before implementation is a false positive
3. Write minimal implementation to make it pass (GREEN)
4. Run per-cycle checklist (below)
5. **Strict mode**: Pause after tracer bullet passes. Present results via AskUserQuestion to confirm before incremental loop.

## Red-Green-Refactor Cycle
Load references: `Glob("**/tdd/references/mocking.md", path: "~/.claude/plugins")`, `Glob("**/tdd/references/test-quality.md", path: "~/.claude/plugins")`

**Red**: Write test describing expected behavior first and verify it fails — a passing test before implementation is a false positive.
**Exit**: One test passing, end-to-end path proven.

**Green**: Write minimal implementation to pass test — the test defines "done", satisfy it, nothing more.
### Phase 3: Incremental Loop

**Refactor**: Clean up with test safety net — the passing test ensures refactoring doesn't break behavior.
For each remaining behavior from the plan:

**Commit**: Test and implementation together as atomic unit.
1. Write ONE test (RED)
2. Write minimal code to pass (GREEN)
3. Run per-cycle checklist
4. Repeat

**Do NOT write the next test until the current cycle is GREEN and the checklist passes.**

### Phase 4: Refactor

Only enter when ALL tests are GREEN.

1. Look for refactoring candidates (duplication, long methods, shallow modules)
2. Make ONE structural change at a time
3. Run tests after each change — must stay GREEN
4. If tests break, revert the refactor

Load reference: `Glob("**/tdd/references/refactoring.md", path: "~/.claude/plugins")`

**Exit**: Clean code, all tests GREEN. Commit test and implementation together.

## Per-Cycle Checklist

After every RED-GREEN pair, verify:

- [ ] **Behavior, not implementation**: Test describes WHAT the system does, not HOW
- [ ] **Public interface only**: Test uses the same API as production callers
- [ ] **Survives refactor**: Would this test break if internals changed but behavior stayed the same?
- [ ] **Minimal code**: Implementation is the simplest thing that passes — no speculative features
- [ ] **No horizontal drift**: Did you write only ONE test before implementing? If you wrote multiple, STOP and revert to one.

## Pre-Existing RED State

If existing tests are already failing when you start: note them and proceed with new behavior only. Do not attempt to fix pre-existing failures unless that is the task.

## Test Discovery

Search patterns: `__tests__/[filename].test.ts`, `[filename].test.ts`, `[filename].spec.ts`, `test/[filename].test.ts`, `tests/[filename].test.ts`.

## Arguments

134 changes: 134 additions & 0 deletions skills/tdd/references/interface-design.md
@@ -0,0 +1,134 @@
# Interface Design for Testability

## Three Principles

### 1. Accept Dependencies, Don't Create Them

Functions and classes should receive what they need, not reach out for it.

```typescript
// HARD TO TEST: creates its own dependencies
async function processOrder(orderId: string) {
const db = new Database(process.env.DATABASE_URL) // Hidden dependency
const order = await db.orders.findById(orderId)
const result = await fetch('https://api.payment.com/charge', { ... }) // Hidden dependency
return result
}

// TESTABLE: accepts what it needs
async function processOrder(
orderId: string,
orders: OrderRepository,
payments: PaymentGateway
) {
const order = await orders.findById(orderId)
const result = await payments.charge(order.total)
return result
}
```

> **Framework note**: React components accept dependencies via props and Context. Server frameworks use constructor injection (NestJS, Angular) or middleware patterns (Hono, Express). The principle is universal.

### 2. Return Results, Don't Produce Side Effects

Prefer returning values over mutating state or triggering side effects.

```typescript
// SIDE EFFECTS: hard to verify, order-dependent
function validateUser(user: User) {
if (!user.email) {
logger.error('Missing email') // Side effect
metrics.increment('validation.fail') // Side effect
throw new ValidationError('Email required')
}
user.validated = true // Mutation
}

// PURE RETURN: easy to test, composable
function validateUser(user: User): ValidationResult {
if (!user.email) {
return { valid: false, errors: ['Email required'] }
}
return { valid: true, data: { ...user, validated: true } }
}

// Caller handles side effects at the boundary
const result = validateUser(input)
if (!result.valid) {
logger.error('Validation failed', result.errors)
metrics.increment('validation.fail')
}
```

### 3. Small Surface Area

Expose the minimum interface needed. Hide implementation details.

```typescript
// TOO MUCH SURFACE: tests couple to internal structure
class UserCache {
public cache: Map<string, User> = new Map() // Exposed internal
public ttl: number = 300 // Exposed config
public lastCleanup: Date = new Date() // Exposed state

get(id: string): User | undefined { ... }
set(id: string, user: User): void { ... }
cleanup(): void { ... } // Exposed internal operation
}

// MINIMAL SURFACE: tests use the same interface as production code
class UserCache {
constructor(private options: { ttlSeconds: number }) {}

get(id: string): User | undefined { ... }
set(id: string, user: User): void { ... }
// cleanup is internal — triggered automatically, not part of the API
}
```

## Deep Modules

A deep module has a **simple interface** that hides **complex implementation**. This is the key to both good design and testability.

```
┌─────────────────────────────────────────┐
│ Shallow Module (avoid) │
│ │
│ ┌───────────────────────────────────┐ │
│ │ Large, complex interface │ │
│ │ get() set() delete() cleanup() │ │
│ │ configure() validate() reset() │ │
│ └───────────────────────────────────┘ │
│ ┌───────────────────────────────────┐ │
│ │ Thin implementation │ │
│ └───────────────────────────────────┘ │
└─────────────────────────────────────────┘

┌─────────────────────────────────────────┐
│ Deep Module (prefer) │
│ │
│ ┌───────────────────┐ │
│ │ Small interface │ │
│ │ get() set() │ │
│ └───────────────────┘ │
│ ┌───────────────────────────────────┐ │
│ │ │ │
│ │ Rich implementation handles │ │
│ │ caching, validation, cleanup, │ │
│ │ concurrency, error recovery │ │
│ │ │ │
│ └───────────────────────────────────┘ │
└─────────────────────────────────────────┘
```

Deep modules are testable because:
- **Small interface** = few test scenarios to cover
- **Hidden complexity** = implementation can change without breaking tests
- **Clear contract** = tests verify WHAT, not HOW

> Concept from "A Philosophy of Software Design" by John Ousterhout.

## See Also

- `references/mocking.md` — boundary-only mocking patterns using DI
- `references/test-quality.md` — behavioral test criteria