Merged
17 changes: 16 additions & 1 deletion agents/implementer.md
@@ -17,10 +17,25 @@ Complete the task following spec exactly. Operate with fresh context to avoid drift.

## Constraints

**TDD Awareness**: Receive TDD mode: **strict** (must have test, stop if missing), **soft** (warn, proceed), **off** (no checks). When enabled, follow Red-Green-Refactor.
**TDD Awareness**: Receive TDD mode: **strict** (must have test, stop if missing), **soft** (warn, proceed), **off** (no checks). When enabled, follow the vertical TDD workflow below.

Why strict mode stops: Catching spec drift early is cheaper than debugging later.

**TDD Workflow** (when mode is strict or soft):

1. **Tracer Bullet**: Write ONE test for the highest-priority behavior. Verify it FAILS (RED). Write minimal code to pass (GREEN). Run per-cycle checklist. **STOP after tracer bullet passes** — verify the approach is correct before proceeding.
2. **Incremental Loop**: For each remaining behavior: write ONE test (RED), write minimal code to pass (GREEN), run per-cycle checklist. **Do NOT generate all tests first. Write ONE test, make it pass, verify checklist, then write the NEXT test.**
3. **Refactor**: Only when ALL tests are GREEN. One structural change at a time. Run tests after each change.

**Per-Cycle Checklist** (verify after every RED-GREEN pair):
- [ ] **Behavior, not implementation**: Test describes WHAT the system does, not HOW
- [ ] **Public interface only**: Test uses the same API as production callers
- [ ] **Survives refactor**: Would this test break if internals changed but behavior stayed the same?
- [ ] **Minimal code**: Implementation is the simplest thing that passes — no speculative features
- [ ] **No horizontal drift**: Did you write only ONE test before implementing?

**Pre-existing RED state**: If existing tests are already failing when you start, note them and proceed with new behavior only. Do not attempt to fix pre-existing failures unless that is the task.

**Focus**: Implement ONLY what the task requires. No extra features, improvements, or unrelated refactoring.

Why scope discipline matters: Scope drift is the #1 failure mode in subagent workflows. Every unspecified addition introduces untested surface area.
6 changes: 5 additions & 1 deletion commands/implement.md
@@ -55,7 +55,11 @@ The default implementation approach uses sequential subagent-driven development
### Subagent Workflow
You orchestrate; implementer agents execute. Per task:

1. **Dispatch implementer** (Task tool, `subagent_type: "implementer"`): task spec, TDD mode, context, files to reference
1. **Dispatch implementer** (Task tool, `subagent_type: "implementer"`): task spec, TDD mode, context, files to reference. **When TDD mode is strict or soft**: Before dispatching, read TDD reference files via `Glob("**/tdd/references/*.md", path: "~/.claude/plugins")` and inject key excerpts into the implementer's Task prompt:
- From `mocking.md`: boundary-only rule, system boundary definition, legacy compatibility clause
- From `test-quality.md`: behavioral test criteria (WHAT not HOW, public interface, survives refactor)
- From `interface-design.md`: DI principle (accept dependencies, don't create them)
- Inject as literal text under a `TDD GUIDANCE:` section in the prompt. Do NOT include full files — extract the rules and one example each.
2. **Handle questions**: Answer from plan/context first, else AskUserQuestion
3. **Dispatch spec-reviewer** (`subagent_type: "spec-reviewer"`): original spec + implementation → verify nothing missing/extra, behavior matches
4. **Dispatch code-quality-reviewer** (`subagent_type: "code-quality-reviewer"`): changed files → check bugs/smells/security/anti-patterns
99 changes: 88 additions & 11 deletions skills/tdd/SKILL.md
@@ -11,7 +11,7 @@ Correctness > Test Coverage > Implementation Speed

## Goal

Enforce Test-Driven Development based on project configuration. Agents reliably implement happy-path code that breaks on edge cases — TDD makes the test the durable specification across fresh-context dispatches.
Enforce Test-Driven Development as a **process**, not just a presence check. Agents left unconstrained write all tests first, then all code (horizontal slicing). This produces tests coupled to implementation that break on refactor. The 4-phase workflow below enforces vertical TDD cycles where each test drives one slice of implementation.

## Configuration

@@ -25,29 +25,106 @@ Check both `CLAUDE.md` and `.claude/CLAUDE.md`. Default: `off`.
## Modes

### Strict (`tdd: strict`)
Hard enforcement. Before implementation, verify test exists. If missing, STOP and present AskUserQuestion with options: (1) Write test first, (2) Prototype escape with justification. Log escapes for review.
Full 4-phase workflow with blocking gates. Planning phase requires user confirmation via AskUserQuestion before coding. Each phase has explicit entry/exit criteria.

**Escape when**: Markdown-only changes, config changes, or when mocking exceeds the change's complexity.
**Escape when**: Markdown-only changes, config changes, or when mocking exceeds the change's complexity. Present AskUserQuestion with: (1) Write test first, (2) Prototype escape with justification. Log escapes.

### Soft (`tdd: soft`)
Warning without blocking. Check for test, warn if missing, continue. Summarize untested items after completion.
Full 4-phase workflow guidance but **no blocking gates**. Planning phase generates and presents the plan but proceeds immediately. Warns on deviations (no test, horizontal slicing) but does not stop. Summarize untested items after completion.

### Off (`tdd: off`)
No TDD checks. Standard implementation flow.

## Test Discovery
## Anti-Pattern: Horizontal Slicing

Search patterns: `__tests__/[filename].test.ts`, `[filename].test.ts`, `[filename].spec.ts`, `test/[filename].test.ts`, `tests/[filename].test.ts`.
```
WRONG (horizontal):           RIGHT (vertical):
┌──────────────────────┐      ┌──────────────────────┐
│ Write ALL tests      │      │ Test 1 → Impl 1      │
│ test1, test2, test3  │      │ ✓ GREEN              │
├──────────────────────┤      ├──────────────────────┤
│ Write ALL code       │      │ Test 2 → Impl 2      │
│ impl1, impl2, impl3  │      │ ✓ GREEN              │
├──────────────────────┤      ├──────────────────────┤
│ Hope they match      │      │ Test 3 → Impl 3      │
│ ✗ Tests are brittle  │      │ ✓ GREEN              │
└──────────────────────┘      └──────────────────────┘
```

Horizontal slicing fails because tests written without implementation are based on imagined APIs. They couple to guessed method signatures, mock internal modules, and break on any structural change.

## 4-Phase Workflow

### Phase 1: Planning

**Entry**: Task received with TDD mode strict or soft.

1. Identify the public interface changes needed
2. List behaviors to test, ordered by priority (core path first, edge cases later)
3. **Strict mode**: Present interface and behavior list via AskUserQuestion for user confirmation before proceeding
4. **Soft mode**: Present the plan, proceed immediately

Load reference: `Glob("**/tdd/references/interface-design.md", path: "~/.claude/plugins")`

**Exit**: Confirmed behavior list. Proceed to Tracer Bullet.

**Non-interactive**: If no user response within the current turn, proceed with best judgment and log skipped confirmation.

### Phase 2: Tracer Bullet

Prove one end-to-end path through the system.

1. Write ONE test for the highest-priority behavior
2. Verify it FAILS (RED) — a passing test before implementation is a false positive
3. Write minimal implementation to make it pass (GREEN)
4. Run per-cycle checklist (below)
5. **Strict mode**: Pause after tracer bullet passes. Present results via AskUserQuestion to confirm before incremental loop.

## Red-Green-Refactor Cycle
Load references: `Glob("**/tdd/references/mocking.md", path: "~/.claude/plugins")`, `Glob("**/tdd/references/test-quality.md", path: "~/.claude/plugins")`

**Red**: Write test describing expected behavior first and verify it fails — a passing test before implementation is a false positive.
**Exit**: One test passing, end-to-end path proven.

**Green**: Write minimal implementation to pass test — the test defines "done", satisfy it, nothing more.
### Phase 3: Incremental Loop

**Refactor**: Clean up with test safety net — the passing test ensures refactoring doesn't break behavior.
For each remaining behavior from the plan:

**Commit**: Test and implementation together as atomic unit.
1. Write ONE test (RED)
2. Write minimal code to pass (GREEN)
3. Run per-cycle checklist
4. Repeat

**Do NOT write the next test until the current cycle is GREEN and the checklist passes.**

### Phase 4: Refactor

Only enter when ALL tests are GREEN.

1. Look for refactoring candidates (duplication, long methods, shallow modules)
2. Make ONE structural change at a time
3. Run tests after each change — must stay GREEN
4. If tests break, revert the refactor

Load reference: `Glob("**/tdd/references/refactoring.md", path: "~/.claude/plugins")`

**Exit**: Clean code, all tests GREEN. Commit test and implementation together.

## Per-Cycle Checklist

After every RED-GREEN pair, verify:

- [ ] **Behavior, not implementation**: Test describes WHAT the system does, not HOW
- [ ] **Public interface only**: Test uses the same API as production callers
- [ ] **Survives refactor**: Would this test break if internals changed but behavior stayed the same?
- [ ] **Minimal code**: Implementation is the simplest thing that passes — no speculative features
- [ ] **No horizontal drift**: Did you write only ONE test before implementing? If you wrote multiple, STOP and revert to one.

## Pre-Existing RED State

If existing tests are already failing when you start: note them and proceed with new behavior only. Do not attempt to fix pre-existing failures unless that is the task.

## Test Discovery

Search patterns: `__tests__/[filename].test.ts`, `[filename].test.ts`, `[filename].spec.ts`, `test/[filename].test.ts`, `tests/[filename].test.ts`.

## Arguments

134 changes: 134 additions & 0 deletions skills/tdd/references/interface-design.md
@@ -0,0 +1,134 @@
# Interface Design for Testability

## Three Principles

### 1. Accept Dependencies, Don't Create Them

Functions and classes should receive what they need, not reach out for it.

```typescript
// HARD TO TEST: creates its own dependencies
async function processOrder(orderId: string) {
const db = new Database(process.env.DATABASE_URL) // Hidden dependency
const order = await db.orders.findById(orderId)
const result = await fetch('https://api.payment.com/charge', { ... }) // Hidden dependency
return result
}

// TESTABLE: accepts what it needs
async function processOrder(
orderId: string,
orders: OrderRepository,
payments: PaymentGateway
) {
const order = await orders.findById(orderId)
const result = await payments.charge(order.total)
return result
}
```

> **Framework note**: React components accept dependencies via props and Context. Server frameworks use constructor injection (NestJS, Angular) or middleware patterns (Hono, Express). The principle is universal.

### 2. Return Results, Don't Produce Side Effects

Prefer returning values over mutating state or triggering side effects.

```typescript
// SIDE EFFECTS: hard to verify, order-dependent
function validateUser(user: User) {
if (!user.email) {
logger.error('Missing email') // Side effect
metrics.increment('validation.fail') // Side effect
throw new ValidationError('Email required')
}
user.validated = true // Mutation
}

// PURE RETURN: easy to test, composable
function validateUser(user: User): ValidationResult {
if (!user.email) {
return { valid: false, errors: ['Email required'] }
}
return { valid: true, data: { ...user, validated: true } }
}

// Caller handles side effects at the boundary
const result = validateUser(input)
if (!result.valid) {
logger.error('Validation failed', result.errors)
metrics.increment('validation.fail')
}
```

### 3. Small Surface Area

Expose the minimum interface needed. Hide implementation details.

```typescript
// TOO MUCH SURFACE: tests couple to internal structure
class UserCache {
public cache: Map<string, User> = new Map() // Exposed internal
public ttl: number = 300 // Exposed config
public lastCleanup: Date = new Date() // Exposed state

get(id: string): User | undefined { ... }
set(id: string, user: User): void { ... }
cleanup(): void { ... } // Exposed internal operation
}

// MINIMAL SURFACE: tests use the same interface as production code
class UserCache {
constructor(private options: { ttlSeconds: number }) {}

get(id: string): User | undefined { ... }
set(id: string, user: User): void { ... }
// cleanup is internal — triggered automatically, not part of the API
}
```

## Deep Modules

A deep module has a **simple interface** that hides **complex implementation**. This is the key to both good design and testability.

```
┌─────────────────────────────────────────┐
│ Shallow Module (avoid) │
│ │
│ ┌───────────────────────────────────┐ │
│ │ Large, complex interface │ │
│ │ get() set() delete() cleanup() │ │
│ │ configure() validate() reset() │ │
│ └───────────────────────────────────┘ │
│ ┌───────────────────────────────────┐ │
│ │ Thin implementation │ │
│ └───────────────────────────────────┘ │
└─────────────────────────────────────────┘

┌─────────────────────────────────────────┐
│ Deep Module (prefer) │
│ │
│ ┌───────────────────┐ │
│ │ Small interface │ │
│ │ get() set() │ │
│ └───────────────────┘ │
│ ┌───────────────────────────────────┐ │
│ │ │ │
│ │ Rich implementation handles │ │
│ │ caching, validation, cleanup, │ │
│ │ concurrency, error recovery │ │
│ │ │ │
│ └───────────────────────────────────┘ │
└─────────────────────────────────────────┘
```

Deep modules are testable because:
- **Small interface** = few test scenarios to cover
- **Hidden complexity** = implementation can change without breaking tests
- **Clear contract** = tests verify WHAT, not HOW

> Concept from "A Philosophy of Software Design" by John Ousterhout.

## See Also

- `references/mocking.md` — boundary-only mocking patterns using DI
- `references/test-quality.md` — behavioral test criteria