# amplihack-rs: Rust Core Runtime for Deterministic Infrastructure

---

## ⚡ START HERE (Agent: Read This First)

**What this is:** A complete execution specification for rewriting amplihack's deterministic infrastructure layer from Python to Rust. Read this page section-by-section as you work — don't try to load everything at once.

**Reading order:**
1. This summary (now) — understand scope, rules, and anti-patterns
2. Architecture + Rust Patterns — when starting Phase 0
3. Phase 0 details — when you start Phase 0
4. Phase N details — when you start Phase N (not before)
5. Recipes — execution specs to follow for each component
6. Function Inventory — reference during implementation
7. Appendix — Claude Code hooks protocol reference

### Non-Negotiable Rules

1. **DEFAULT WORKFLOW IS MANDATORY — VIA THE RECIPE RUNNER.** Every code change — no matter how small — MUST be executed through the dev-orchestrator skill: `Skill(skill="dev-orchestrator")`. Do NOT just read the workflow file — invoke the skill, which classifies the task, decomposes into workstreams, and executes via the recipe runner. Work done outside the recipe runner's default workflow WILL BE REJECTED. This is not optional, not "for big changes only," not skippable for "just a quick fix." If you're editing code, you're going through the dev-orchestrator.

2. **NO FALLBACKS. EVER.** No "try Rust, fall back to Python." No auto-mode. No dual-engine runtime. No silent degradation. If Rust is the implementation, it's the ONLY implementation. If it breaks, fix it. If it produces wrong output, that's a bug. Fallbacks create silent failure and garbage code.

3. **CORRECTNESS OVER PERFORMANCE.** Speed is a side effect of Rust's guarantees, not a goal. Every decision should be made for correctness, type safety, and eliminating entire categories of bugs. Don't optimize for benchmarks — optimize for "does this code produce correct output in all cases?"

4. **WORK AUTONOMOUSLY UNTIL CERTAIN.** Work in a continuous loop. Don't stop to ask permission between steps. Only stop when truly blocked on missing information that can't be reasoned about. When you think you're done, triple-check: run all tests, verify all golden files match, dogfood the result.

5. **COMPARISON IS A DEVELOPMENT TOOL, NOT A RUNTIME MODE.** Use `amplihack hooks compare` to compare Python and Rust behavior during development. This is an offline tool you run explicitly. It is NOT a production mode where both engines run simultaneously.

### Anti-Patterns That Will Cause Rejection

- ❌ `AMPLIHACK_HOOK_ENGINE=auto` — NO AUTO MODE
- ❌ `try { rust() } catch { python() }` — NO FALLBACK
- ❌ "If Rust binary not found, use Python" — NO SILENT DEGRADATION
- ❌ Skipping the default workflow for "small" changes
- ❌ Treating performance as the primary goal instead of correctness
- ❌ Swallowing errors with bare `catch` / `except Exception` / `unwrap_or_default()`

### Bootstrap: Setting Up the Dev Environment

```bash
# 1. Clone the repos
git clone https://github.com/rysweet/amplihack  # Python codebase
git clone https://github.com/rysweet/amplihack-rs  # Rust workspace (create this)

# 2. Install Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup default stable
rustup component add clippy rustfmt

# 3. Install cross-compilation targets
rustup target add x86_64-unknown-linux-gnu
rustup target add aarch64-unknown-linux-gnu
rustup target add x86_64-apple-darwin
rustup target add aarch64-apple-darwin

# 4. Set up Python environment (for running existing tests)
cd amplihack
uv sync  # or: pip install -e ".[dev]"

# 5. Run existing Python tests (baseline — these must keep passing)
pytest tests/ -x --tb=short

# 6. Build Rust workspace
cd ../amplihack-rs  # sibling checkout from step 1
cargo build
cargo clippy -- -D warnings
cargo test
```

---

## Overview

Split amplihack into two layers: a Rust deterministic infrastructure layer (CLI, hooks) and a Python LLM orchestration layer (proxy, agents, eval, bundles). The Rust layer handles security-critical code where type safety, correctness, and compile-time guarantees matter most. The Python layer handles non-deterministic LLM interactions where flexibility matters most.

> **Note:** XPIA defender is being extracted into its own repo (`amplihack-xpia-defender`) — see issue #2969. It is NOT part of this issue's scope.

> **Note:** Hooks are NOT Claude-Code-specific. They must work with Claude Code, Amplifier, and Copilot. The hook protocol layer must be agent-host-agnostic.

**Boundary rule:** Security-critical + correctness-critical = Rust. Everything else = Python. New subsystems default to Python unless they meet BOTH criteria.

**Scope:** ~18K LOC moves to Rust (hooks + CLI). ~127K LOC stays permanently in Python. XPIA (~1.2K LOC) tracked separately.

## Architecture

```
┌─────────────────────────────────────────────────────────┐
│  PYTHON LAYER (~127K LOC — stays permanently)           │
│  Proxy, agents, eval, bundle gen, knowledge builder     │
│  SDK bridge scripts (thin, embedded at compile time)    │
├─────────────────────────────────────────────────────────┤
│  Subprocess JSON IPC (sole interop mechanism)           │
├─────────────────────────────────────────────────────────┤
│  RUST LAYER (~18K LOC — new)                            │
│  CLI + launcher, hooks (Claude Code/Amplifier/Copilot)  │
│  Recipe runner, log parser (already exist)              │
└─────────────────────────────────────────────────────────┘

┌─────────────────────────────────────────────────────────┐
│  SEPARATE REPO: amplihack-xpia-defender                 │
│  XPIA patterns, defender, health (~1.2K LOC)            │
└─────────────────────────────────────────────────────────┘
```

### Interop: Two Patterns Only

| Pattern | When Used | Overhead |
|---------|-----------|----------|
| **Standalone binary** | CLI, hooks (Claude Code / Amplifier / Copilot invoke directly) | ~1-2ms startup |
| **Subprocess JSON IPC** | SDK bridge (Rust calls Python for Claude SDK) | ~50ms per call |

No PyO3. No gRPC. No FFI. Subprocess JSON is simple, debuggable, and already proven by amplihack-recipe-runner.

## Crate Structure (4 library crates + 2 binaries)

> **Note:** Hooks are host-agnostic. They process JSON stdin/stdout and work with Claude Code, Amplifier, and Copilot. The protocol layer does NOT assume any specific host.

```
amplihack-rs/
├── Cargo.toml                          # Workspace
├── crates/
│   ├── amplihack-types/                # THIN: IPC boundary types only (~200 LOC)
│   │   ├── src/hook_io.rs              # HookInput, HookOutput (serde) — host-agnostic
│   │   ├── src/tool_decision.rs        # Allow/Deny/Ask enum
│   │   └── src/settings.rs             # Settings struct
│   │
│   ├── amplihack-state/                # File ops, locking, env, Python bridge
│   │   ├── src/atomic_json.rs          # AtomicJsonFile<T> (temp+rename)
│   │   ├── src/file_lock.rs            # Timeout-based locking (F_SETLK)
│   │   ├── src/semaphore.rs            # TOCTOU-safe flags (O_CREAT|O_EXCL)
│   │   ├── src/counter.rs              # Atomic counter files
│   │   ├── src/env_config.rs           # EnvVar<T> typed parsing
│   │   └── src/python_bridge.rs        # spawn_python() with timeout
│   │
│   ├── amplihack-hooks/                # Protocol + all hook implementations
│   │   ├── src/protocol.rs             # run_hook(), panic handler, SIGPIPE
│   │   ├── src/pre_tool_use/           # CWD, branch, command, path safety
│   │   ├── src/stop/                   # Lock mode, power steering
│   │   ├── src/session_start.rs
│   │   ├── src/session_stop.rs
│   │   ├── src/user_prompt.rs
│   │   ├── src/post_tool_use.rs
│   │   └── src/pre_compact.rs
│   │
│   └── amplihack-cli/                  # CLI + launcher (merged)
│       ├── src/commands/               # clap derive, exhaustive enum
│       ├── src/launcher.rs             # ManagedChild (fixed Drop)
│       ├── src/signals.rs              # SIGINT+SIGTERM+SIGHUP, setpgid
│       ├── src/binary_finder.rs
│       ├── src/env_builder.rs          # Type-safe PATH construction
│       └── src/settings_manager.rs     # Atomic settings.json CRUD
│
├── bins/
│   ├── amplihack/main.rs               # CLI binary
│   └── amplihack-hooks/main.rs         # Multicall hook binary
│       (dispatches: pre-tool-use, stop, session-start, etc.)
│
├── bridge/                             # Embedded Python bridge scripts
│   ├── claude_reflect.py               # Compiled into binary via include_str!
│   └── memory_store.py
│
└── tests/
    ├── golden/                         # Captured stdin/stdout pairs
    ├── parity/                         # Python vs Rust comparison
    ├── proptest/                       # Property-based + fuzz
    └── integration/
```

### Key Design Decisions

**Multicall hook binary:** Single `amplihack-hooks` binary with subcommand dispatch instead of 7 separate binaries. Saves ~40% binary size (shared std lib, serde). Registered as:
```json
{"command": "/path/to/amplihack-hooks pre-tool-use", "timeout": 10}
```
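
A minimal sketch of the subcommand dispatch, using only the standard library. `HookKind` and `parse_subcommand` are hypothetical names. Only `pre-tool-use`, `stop`, and `session-start` are spelled out in this plan; the other subcommand spellings are assumed to follow the same kebab-case convention.

```rust
/// Hypothetical enum: one variant per hook the multicall binary dispatches.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HookKind {
    PreToolUse,
    PostToolUse,
    Stop,
    SessionStart,
    SessionStop,
    UserPromptSubmit,
    PreCompact,
}

/// Map the first CLI argument to a hook. An unknown or missing name is an
/// error rather than a silent no-op, so a typo in settings.json is visible.
pub fn parse_subcommand(arg: Option<&str>) -> Result<HookKind, String> {
    match arg {
        Some("pre-tool-use") => Ok(HookKind::PreToolUse),
        Some("stop") => Ok(HookKind::Stop),
        Some("session-start") => Ok(HookKind::SessionStart),
        // The spellings below are assumptions, not taken from the plan.
        Some("post-tool-use") => Ok(HookKind::PostToolUse),
        Some("session-stop") => Ok(HookKind::SessionStop),
        Some("user-prompt-submit") => Ok(HookKind::UserPromptSubmit),
        Some("pre-compact") => Ok(HookKind::PreCompact),
        Some(other) => Err(format!("unknown hook subcommand: {other}")),
        None => Err("missing hook subcommand".to_string()),
    }
}
```

In `main`, this could be driven by `parse_subcommand(std::env::args().nth(1).as_deref())`, with an exhaustive `match` on the result so that adding a hook forces every dispatch site to be updated.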

**Thin types crate:** Only IPC boundary types live here. Domain types (power steering state) live in their domain crates. This keeps the types crate from becoming a "god types" coupling magnet.

**Embedded bridge scripts:** Python SDK bridge scripts are compiled into the binary via `include_str!()` and written to a temp file with 0o600 permissions at runtime. This prevents PATH injection and script replacement.
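
A minimal, Unix-only sketch of the "write with 0o600" step, assuming the script body is already available as a `&str`. `write_private_script` and `EMBEDDED_SCRIPT` are hypothetical names, and the constant stands in for an `include_str!()` of a real bridge script.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::os::unix::fs::OpenOptionsExt;
use std::path::{Path, PathBuf};

/// Stand-in for `include_str!(...)`; the real script body is not shown here.
pub const EMBEDDED_SCRIPT: &str = "import sys\nsys.stdout.write('{}')\n";

/// Write `body` into a brand-new file under `dir`, readable and writable only
/// by the current user. `create_new(true)` maps to O_CREAT|O_EXCL, so the call
/// fails if anything (including a planted symlink) already exists at the path.
pub fn write_private_script(dir: &Path, file_name: &str, body: &str) -> io::Result<PathBuf> {
    let path = dir.join(file_name);
    let mut file = OpenOptions::new()
        .write(true)
        .create_new(true)
        .mode(0o600)
        .open(&path)?;
    file.write_all(body.as_bytes())?;
    file.sync_all()?;
    Ok(path)
}
```

Failing when the path already exists is a deliberate choice in this sketch: reusing a file someone else created would defeat the point of embedding the script.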

**Configurable fail policy:** Per-hook failure policy:
```rust
enum FailurePolicy { Open, Closed }
// pre_tool_use, session_start, etc. → Open (don't break the user)
// Security-critical hooks → Closed (reject on error)
```
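
One way the policy could be applied at the protocol boundary is sketched below. `Decision` and `apply_policy` are hypothetical names (the real decision type would be the `ToolDecision` enum in `amplihack-types`), and returning the error next to the decision is an assumption made here so the failure can still be logged.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailurePolicy {
    Open,
    Closed,
}

/// Hypothetical stand-in for the ToolDecision type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Decision {
    Allow,
    Deny { reason: String },
}

/// Resolve a hook result into a final decision. The internal error is never
/// dropped: it is returned with the decision so the caller can report it.
pub fn apply_policy(
    policy: FailurePolicy,
    result: Result<Decision, String>,
) -> (Decision, Option<String>) {
    match (result, policy) {
        (Ok(decision), _) => (decision, None),
        (Err(e), FailurePolicy::Open) => (Decision::Allow, Some(e)),
        (Err(e), FailurePolicy::Closed) => {
            let reason = format!("hook failed: {e}");
            (Decision::Deny { reason }, Some(e))
        }
    }
}
```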

## Prior Art: amplihack-recipe-runner

The `amplihack-recipe-runner` repo (PR #2951) was the first Rust component. It proved that **subprocess JSON IPC** works as an interop mechanism — reuse this pattern.

**What NOT to copy from amplihack-recipe-runner:**
- Any pattern where failure is silently swallowed — errors must be visible

amplihack-recipe-runner was a first attempt and should NOT be treated as a style guide. The Rust patterns section below (reviewed by 5 independent reviewers including a senior Rust engineer) supersedes anything in that codebase. Follow idiomatic Rust and the patterns documented here.

## Rust Patterns

### Panic Handler
```rust
// Cargo.toml: panic = "unwind" (NOT abort; catch_unwind needs unwinding)
use std::io::{self, Write};
use std::panic::{self, AssertUnwindSafe};

pub fn run_hook<H: Hook>(hook: H) {
    panic::set_hook(Box::new(|_| {})); // suppress the default panic message on stderr
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        let input = read_stdin()?;
        hook.process(input).and_then(write_stdout)
    }));
    match result {
        Ok(Ok(())) => {}
        // Err(_) is a caught panic, Ok(Err(_)) is a hook error
        _ => { let _ = io::stdout().write_all(b"{}"); } // pre-baked, no alloc
    }
}
```

### AtomicJsonFile (panic-safe temp cleanup)
```rust
pub fn update<F>(&self, f: F) -> Result<T>
where F: FnOnce(&mut T) -> Result<()>, T: Serialize + DeserializeOwned + Clone
{
    let _lock = FileLock::exclusive(&self.lock_path, Duration::from_secs(5))?;
    let mut data = self.read()?;
    f(&mut data)?;  // if panics: lock drops, no temp file exists yet
    // Temp file lives next to the target: rename is only atomic within one filesystem
    let temp = NamedTempFile::new_in(self.path.parent().unwrap_or(Path::new(".")))?;
    serde_json::to_writer_pretty(temp.as_file(), &data)?;
    temp.as_file().sync_all()?;  // flush to disk before the rename publishes the file
    temp.persist(&self.path)?;  // atomic rename; on an earlier error, Drop removes the temp file
    Ok(data)
}
```

### ManagedChild Drop (bounded, never blocks forever)
```rust
impl Drop for ManagedChild {
    fn drop(&mut self) {
        if matches!(self.child.try_wait(), Ok(Some(_))) { return; }
        #[cfg(unix)]
        unsafe { libc::kill(self.child.id() as i32, libc::SIGTERM); }
        let deadline = Instant::now() + Duration::from_secs(3);
        while Instant::now() < deadline {
            if matches!(self.child.try_wait(), Ok(Some(_))) { return; }
            thread::sleep(Duration::from_millis(50));
        }
        let _ = self.child.kill();
        let _ = self.child.wait();
    }
}
```

### Signal Handling (all signals, child in own pgroup)
```rust
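use std::os::unix::process::CommandExt; // provides pre_exec

for sig in [SIGINT, SIGTERM, SIGHUP] {
    signal_hook::flag::register(sig, Arc::clone(&shutdown))?;
}
let mut cmd = Command::new(&bin);
#[cfg(unix)]
unsafe {
    // pre_exec is an unsafe fn: the closure runs in the forked child before
    // exec and may only make async-signal-safe calls such as setpgid.
    cmd.pre_exec(|| {
        if libc::setpgid(0, 0) != 0 {
            return Err(std::io::Error::last_os_error());
        }
        Ok(())
    });
}
let child = cmd.spawn()?;
// Main loop polls shutdown flag + child.try_wait()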
```

### Hook Input (forward-compatible deserialization)
```rust
#[derive(Deserialize)]
#[serde(tag = "hook_event_name")]
enum HookInput {
    #[serde(rename = "PreToolUse")]
    PreToolUse { tool_name: String, tool_input: Value },
    #[serde(rename = "Stop")]
    Stop { transcript_path: Option<PathBuf> },
    // ... known events ...
    #[serde(other)]
    Unknown,  // Future events from any host (Claude Code, Amplifier, Copilot) → graceful no-op
}
```

### Error Strategy
- **Library crates** (`types`, `state`): `thiserror` custom errors
- **Binary crates** (`cli`, `hooks`): `anyhow` with `.context()`
- **Never** `Box<dyn Error>` — worst of both worlds

### Shell Command Parsing
```rust
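// shell-words for tokenization + manual command separator splitting.
// A tokenization failure (e.g. an unclosed quote) is returned to the caller:
// unwrap_or_default() would turn an unparseable command into "no commands".
// Note: separators are only recognized as standalone tokens; `a&&b` stays one token.
fn split_commands(input: &str) -> Result<Vec<Vec<String>>, shell_words::ParseError> {
    let tokens = shell_words::split(input)?;
    Ok(tokens
        .split(|t| ["&&", "||", ";", "|"].contains(&t.as_str()))
        .filter(|cmd| !cmd.is_empty())
        .map(|cmd| cmd.to_vec())
        .collect())
}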
```

### File Locks
`F_SETLK` (non-blocking) + retry with configurable timeout. Never `F_SETLKW` (blocks indefinitely). Check holder PID liveness, break if dead.
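
The retry-until-deadline part can be sketched independently of the locking syscall. In the sketch below, `acquire_with_timeout` is a hypothetical helper and the `try_acquire` closure stands in for a single non-blocking `F_SETLK` attempt; the PID liveness check is not shown.

```rust
use std::io;
use std::thread;
use std::time::{Duration, Instant};

/// Retry a non-blocking acquisition until it succeeds or `timeout` elapses.
/// `try_acquire` returns Ok(Some(guard)) on success, Ok(None) when the lock is
/// held by someone else, and Err for real I/O failures (which abort the retry).
pub fn acquire_with_timeout<G>(
    mut try_acquire: impl FnMut() -> io::Result<Option<G>>,
    timeout: Duration,
    retry_interval: Duration,
) -> io::Result<G> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Some(guard) = try_acquire()? {
            return Ok(guard);
        }
        if Instant::now() >= deadline {
            return Err(io::Error::new(
                io::ErrorKind::TimedOut,
                "lock acquisition timed out",
            ));
        }
        thread::sleep(retry_interval);
    }
}
```

Because the wait is bounded by `timeout`, a stuck lock holder produces a visible `TimedOut` error instead of a hung hook.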

## Implementation Phases

### Phase 0: Foundation + Validation (3 weeks)

**0a. Python experiment (2 days)** — DECISION GATE for Phase 2
- Run `mypy --strict` on hooks + CLI — how many type errors? How many are real bugs vs annotation gaps?
- Add lazy imports to cli.py entry point
- Assess: does mypy catch the class of bugs we're worried about? If strict typing + lazy imports addresses the CLI correctness gaps, defer Phase 2 indefinitely

**0b. Golden file capture (1 week)**
- **Do NOT rely on "running real sessions" for golden files.** Instead, generate synthetic inputs:
  - Read each hook's Python source to enumerate all code paths
  - For each code path, craft a minimal JSON input that exercises it
  - Include: happy path, error cases, edge cases (empty fields, missing fields, unicode, very long strings)
  - For pre_tool_use: all tool types (Bash, Read, Write, Edit, etc.), all protection checks (CWD, branch, command safety)
  - For stop: lock mode on/off, power steering states, reflection needed/not needed
- Capture ≥100 pairs per hook type (pre_tool_use needs ≥200)
- Run each synthetic input through Python hooks, capture stdout as expected output
- Flag golden files where Python behavior is wrong — fix Python bugs FIRST, then re-capture
- Build semantic JSON comparator (not string diff — handle float precision, whitespace, key ordering, null vs absent)
- Store in `tests/golden/{hook_type}/{test_name}.input.json` + `.expected.json`
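
The semantic comparator described above could look roughly like the following. This is a sketch under stated assumptions: the small `Json` enum is a stand-in so the example needs no external crates (a real comparator would more likely walk `serde_json::Value`), `EPSILON` is an arbitrary tolerance, and treating an absent key as equal to an explicit `null` is one possible policy, not a confirmed requirement.

```rust
use std::collections::BTreeMap;

/// Minimal JSON value used only for illustration.
#[derive(Debug, Clone, PartialEq)]
pub enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// Assumed float tolerance; tune it to what the hooks actually emit.
pub const EPSILON: f64 = 1e-9;

/// Semantic equality: key order is irrelevant, numbers compare within
/// EPSILON, and a missing object key is treated like an explicit null.
pub fn semantically_equal(a: &Json, b: &Json) -> bool {
    match (a, b) {
        (Json::Null, Json::Null) => true,
        (Json::Bool(x), Json::Bool(y)) => x == y,
        (Json::Num(x), Json::Num(y)) => (x - y).abs() <= EPSILON,
        (Json::Str(x), Json::Str(y)) => x == y,
        (Json::Arr(x), Json::Arr(y)) => {
            x.len() == y.len() && x.iter().zip(y).all(|(l, r)| semantically_equal(l, r))
        }
        (Json::Obj(x), Json::Obj(y)) => {
            // Walk the union of keys so that absent and null compare equal.
            x.keys().chain(y.keys()).all(|k| {
                let l = x.get(k).unwrap_or(&Json::Null);
                let r = y.get(k).unwrap_or(&Json::Null);
                semantically_equal(l, r)
            })
        }
        _ => false,
    }
}
```

Whitespace differences disappear at parse time, so they need no handling here. Type mismatches such as `"1"` versus `1` are deliberately reported as unequal.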

**0c. Python test coverage uplift (1 week)**
- Target: 60% on hooks (realistic baseline)
- Focus on security-critical paths in pre_tool_use and stop
- Focus on error paths (the `except Exception` blocks)

**0d. Workspace scaffolding (3 days)**
- Cargo workspace with 6 crates (the 4 library crates plus the 2 binaries shown in Crate Structure)
- CI: `cargo clippy -- -D warnings`, `cargo fmt --check`, `cargo test`
- Cross-compilation targets (4 platforms)
- IPC version protocol: every JSON message gets `{"version": 1, ...}`
- `CONTRIBUTING_RUST.md` with patterns and build instructions

**EXIT CRITERIA:** Golden files captured, Python coverage ≥60% on hooks, workspace builds, at least 2 team members have completed Rust basics.

### Phase 1: pre_tool_use Hook (3 weeks) — HIGHEST VALUE

**1a. amplihack-types + amplihack-state (1 week)**
- Thin boundary types (HookInput, ToolDecision, Settings)
- AtomicJsonFile with panic-safe temp cleanup
- FileLock with timeout (F_SETLK + retry)
- AtomicFlag (O_CREAT|O_EXCL)
- EnvVar<T> typed parsing
- Python bridge module (spawn_python, timeout, embedded scripts)
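
The `O_CREAT|O_EXCL` flag from the list above can be expressed with the standard library alone. This is a minimal sketch; `try_set_flag` is a hypothetical name, not the planned `AtomicFlag` API.

```rust
use std::fs::OpenOptions;
use std::io;
use std::path::Path;

/// Try to set a flag file. Returns Ok(true) if this call created it and
/// Ok(false) if it already existed. `create_new(true)` maps to
/// O_CREAT|O_EXCL, so the existence check and the creation are a single
/// atomic step and two racing processes cannot both win.
pub fn try_set_flag(path: &Path) -> io::Result<bool> {
    match OpenOptions::new().write(true).create_new(true).open(path) {
        Ok(_) => Ok(true),
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Ok(false),
        Err(e) => Err(e),
    }
}
```

Compared with an "exists? then create" sequence, there is no window between the check and the write, which is the TOCTOU gap this primitive is meant to close.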

**1b. amplihack-hooks: protocol + pre_tool_use engine (1.5 weeks)**
- `run_hook()` with `catch_unwind`, pre-baked `b"{}"`
- Forward-compatible serde deserialization
- Configurable FailurePolicy (Open for pre_tool_use)
- CWD protection, main branch protection
- Command analysis: shell-words + separator splitting
- Path safety: `Path::canonicalize()`, no string concat
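
For the path-safety item, a containment check built on `Path::canonicalize()` might look like this. It is a sketch, not the hook's actual logic, and `is_contained` is a hypothetical name. Note that `canonicalize()` requires the path to exist, so a missing path is reported as an error rather than guessed at.

```rust
use std::io;
use std::path::Path;

/// Returns Ok(true) when `candidate` resolves to a location inside `root`.
/// Both paths are canonicalized first, so `..` segments and symlinks cannot
/// escape the root, and `starts_with` compares whole path components rather
/// than raw string prefixes.
pub fn is_contained(root: &Path, candidate: &Path) -> io::Result<bool> {
    let root = root.canonicalize()?;
    let candidate = candidate.canonicalize()?;
    Ok(candidate.starts_with(&root))
}
```

A string-prefix check would accept `/work/proj-evil` as being inside `/work/proj`; the component-wise comparison does not.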

**1c. Deploy + validate (3 days)**
- Multicall binary: `amplihack-hooks pre-tool-use`
- Hook registration: `AMPLIHACK_HOOK_ENGINE=rust` enables Rust hooks. No auto mode. No fallback. If Rust hook fails, it fails — fix it.
- Parity tests: ≥200 golden files, semantic JSON comparison
- Telemetry: stderr JSON line per invocation (hook, duration_us, result)
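
The telemetry line could be produced as sketched below. The three field names come from the item above; `telemetry_line` and `escape` are hypothetical helpers, and the hand-rolled escaping only exists to keep the sketch free of external crates (a real implementation would more likely serialize a struct with serde).

```rust
use std::time::Duration;

/// Format one telemetry record as a single JSON line (no trailing newline).
pub fn telemetry_line(hook: &str, duration: Duration, result: &str) -> String {
    format!(
        "{{\"hook\":\"{}\",\"duration_us\":{},\"result\":\"{}\"}}",
        escape(hook),
        duration.as_micros(),
        escape(result)
    )
}

/// Minimal JSON string escaping: quotes, backslashes, and control characters.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}
```

A hook would emit it with `eprintln!("{}", telemetry_line(...))`, which keeps stdout reserved for the hook's JSON response.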

**EXIT CRITERIA:** ≥95% golden file parity, zero false denies on production traffic for 1 week.

**DECISION GATE:** Measure real-world impact. If correctness gain is negligible and developer velocity dropped, STOP HERE. The project has still delivered value (one fast, type-safe security hook).

### Phase 2: CLI + Launcher (3 weeks) — CONDITIONAL

> **XPIA Defender** is tracked separately in its own issue/repo (`amplihack-xpia-defender`). It is NOT part of this work.

**GATE:** Phase 0a must show Python startup >500ms even with lazy imports. If startup is acceptable in Python, skip this phase.

**2a. CLI command parsing (1 week)**
- clap derive with exhaustive Command enum
- Nested subcommands for plugin/recipe/memory
- `amplihack memory` delegates to Python subprocess

**2b. Launcher + process management (1.5 weeks)**
- ManagedChild with corrected Drop (bounded, never blocks)
- Signal handling: SIGINT + SIGTERM + SIGHUP via AtomicBool
- Child in own process group (setpgid) to prevent double-SIGINT
- Binary finder with version verification
- Atomic settings.json via AtomicJsonFile
- Type-safe env builder (set-based PATH, not substring matching)
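
The PATH item can be illustrated with `std::env::split_paths` and `std::env::join_paths`, which treat PATH as a list of whole entries instead of a string. `prepend_path_entry` is a hypothetical helper, and the membership test here is a linear scan rather than an actual set.

```rust
use std::env;
use std::ffi::{OsStr, OsString};
use std::path::{Path, PathBuf};

/// Prepend `dir` to a PATH-style value unless an identical entry is already
/// present. Entries are compared as whole paths, so `/opt/bin` is not
/// considered present just because `/opt/bin-extra` is.
pub fn prepend_path_entry(
    current: &OsStr,
    dir: &Path,
) -> Result<OsString, env::JoinPathsError> {
    let entries: Vec<PathBuf> = env::split_paths(current).collect();
    if entries.iter().any(|entry| entry.as_path() == dir) {
        return Ok(current.to_os_string());
    }
    env::join_paths(std::iter::once(dir.to_path_buf()).chain(entries))
}
```

`join_paths` returns an error if an entry contains the platform's path separator, so a malformed directory name is reported instead of silently corrupting PATH.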

**2c. Nesting detection (3 days)**
- Primary: environment variable marker (`AMPLIHACK_SESSION_ID`)
- Secondary: session file with PID + timestamp
- Stale session detection: if PID dead AND session >1h old, ignore

**EXIT CRITERIA:** Drop-in replacement. No behavior change vs Python on golden test suite. All type-safety bugs from the Python analysis (TOCTOU races, string dispatch, missing validation) proven eliminated by Rust's type system.

> **Note:** Phase "Remaining Hooks" from previous versions of this plan has been merged into Phase 1. All hooks (Tier 1 + Tier 2 + Tier 3) are implemented in Phase 1. See "Work Decomposition" section for the tier breakdown.

## Security Controls (Integrated Per Phase)

| Control | Phase | Implementation |
|---------|-------|---------------|
| Configurable fail-open/closed | 1 | FailurePolicy enum per hook |
| Bridge script embedding | 1 | include_str!() at compile time |
| Settings.json integrity check | 2 | Verify hook paths at launch |
| Binary signing | 2 | cosign in CI for releases |
| Telemetry (every decision logged) | 1 | Stderr JSON, one line per invocation |
| Stale lock detection | 1 | Check holder PID liveness, break if dead |

**Deferred to post-v1:** seccomp sandboxing, audit logging with tamper detection, rate limiting, hook chain integrity verification.

## Python Function Inventory (What Gets Replaced)

Every function below must have a Rust equivalent that produces identical output for identical input. This is the exhaustive checklist.

### Hook System (~3,337 LOC total)

#### `hook_processor.py` (407 LOC) → `amplihack-hooks/src/protocol.rs`
| Function | Lines | LLM Calls | Notes |
|----------|-------|-----------|-------|
| `HookProcessor.__init__(hook_name)` | 45-101 | No | Project root detection, log setup |
| `HookProcessor.validate_path_containment(path)` | 103-121 | No | Path stays within project |
| `HookProcessor.log(message, level)` | 123-144 | No | Log rotation at 10MB |
| `HookProcessor.read_input()` | 146-172 | No | JSON from stdin |
| `HookProcessor.write_output(output)` | 174-205 | No | JSON to stdout, SIGPIPE handling |
| `HookProcessor.save_metric(name, value, metadata)` | 207-231 | No | JSONL metrics file |
| `HookProcessor.run()` | 274-368 | No | Full lifecycle with error handling |
| `HookProcessor.get_session_id()` | 370-377 | No | Timestamp-based session ID |
| `HookProcessor.save_session_data(filename, data)` | 379-406 | No | Session file with path validation |

#### `pre_tool_use.py` (492 LOC) → `amplihack-hooks/src/pre_tool_use/`
| Function | Lines | LLM Calls | Notes |
|----------|-------|-----------|-------|
| `PreToolUseHook.process(input_data)` | 95-213 | No | Main entry — validates bash commands |
| `PreToolUseHook._check_cwd_deletion(command)` | 215-269 | No | Detects CWD deletion |
| `PreToolUseHook._check_cwd_rename(command)` | 271-359 | No | Detects CWD rename/move, glob patterns |
| `PreToolUseHook._extract_rm_paths(segment)` | 361-391 | No | shlex.split() path extraction |
| `PreToolUseHook._extract_mv_source_paths(segment)` | 393-463 | No | mv -t/--target-directory parsing |
| `PreToolUseHook._select_strategy()` | 465-481 | No | Launcher detection (Claude vs Copilot) |

#### `stop.py` (902 LOC) → `amplihack-hooks/src/stop/`
| Function | Lines | LLM Calls | Notes |
|----------|-------|-----------|-------|
| `StopHook.process(input_data)` | 67-358 | No | Lock mode, safety valve (Issue #2874) |
| `StopHook.read_continuation_prompt()` | 387-430 | No | Custom prompt from file or default |
| `StopHook._increment_power_steering_counter(session_id)` | 432-498 | No | Atomic counter file |
| `StopHook._increment_lock_counter(session_id)` | 499-555 | No | Safety valve counter |
| `StopHook._should_run_power_steering()` | 557-593 | No | Config check |
| `StopHook._should_run_reflection()` | 595-640 | No | Config check |
| `StopHook._get_current_session_id()` | 642-673 | No | Env var or filesystem |
| `StopHook._run_reflection_sync(transcript_path)` | 675-782 | **YES** | SDK bridge needed (subprocess to Python) |
| `StopHook._announce_reflection_start()` | 784-796 | No | Stderr output |
| `StopHook._generate_reflection_filename(template)` | 798-823 | No | Filename from content |
| `StopHook._block_with_findings(template, path)` | 825-868 | No | Block decision JSON |

#### `session_start.py` (601 LOC) → `amplihack-hooks/src/session_start.rs`
| Function | Lines | LLM Calls | Notes |
|----------|-------|-----------|-------|
| `SessionStartHook.process(input_data)` | 41-591 | **YES** | Version check, memory, staging. SDK bridge for memory calls. |
| `SessionStartHook._check_version_mismatch()` | 386-539 | No | Version comparison, auto-update trigger |
| `SessionStartHook._migrate_global_hooks()` | 541-592 | No | Duplicate hook migration |

#### `session_stop.py` (86 LOC) → `amplihack-hooks/src/session_stop.rs`
| Function | Lines | LLM Calls | Notes |
|----------|-------|-----------|-------|
| `main()` | 29-82 | **YES** | MemoryCoordinator.store() — SDK bridge |

#### `post_tool_use.py` (203 LOC) → `amplihack-hooks/src/post_tool_use.rs`
| Function | Lines | LLM Calls | Notes |
|----------|-------|-----------|-------|
| `PostToolUseHook.process(input_data)` | 82-173 | No | Tool metrics, validation |
| `PostToolUseHook._setup_tool_hooks()` | 34-67 | No | Hook registry setup |
| `PostToolUseHook.save_tool_metric(tool_name, duration_ms)` | 69-80 | No | JSONL metric |

#### `user_prompt_submit.py` (451 LOC) → `amplihack-hooks/src/user_prompt.rs`
| Function | Lines | LLM Calls | Notes |
|----------|-------|-----------|-------|
| `UserPromptSubmitHook.process(input_data)` | 257-400 | **YES** | Memory injection — SDK bridge for inject_memory_for_agents_sync() |
| `UserPromptSubmitHook.find_user_preferences()` | 36-56 | No | File discovery |
| `UserPromptSubmitHook.extract_preferences(content)` | 58-99 | No | Markdown parsing |
| `UserPromptSubmitHook.build_preference_context(prefs)` | 101-156 | No | Context string builder |
| `UserPromptSubmitHook.get_cached_preferences(path)` | 158-184 | No | mtime-based cache |
| `UserPromptSubmitHook._inject_amplihack_if_different()` | 186-255 | No | File diff with mtime cache |

#### `pre_compact.py` (195 LOC) → `amplihack-hooks/src/pre_compact.rs`
| Function | Lines | LLM Calls | Notes |
|----------|-------|-----------|-------|
| `PreCompactHook.process(input_data)` | 33-137 | No | Transcript export |
| `PreCompactHook.restore_conversation_from_latest()` | 139-184 | No | Transcript restore |

### ~~XPIA Security System~~ → Tracked separately in `amplihack-xpia-defender`

> See the XPIA issue for full function inventory, patterns, and implementation plan.

### Settings System (341 LOC) → `amplihack-state/` or `amplihack-cli/`

| Function | Lines | LLM Calls | Notes |
|----------|-------|-----------|-------|
| `validate_hook_paths(hook_system, hooks, dir)` | 113-135 | No | File existence check |
| `update_hook_paths(settings, system, hooks, dir)` | 138-225 | No | JSON path rewriting |
| `ensure_settings_json()` | 228-337 | No | Atomic settings creation |

### Summary

| Category | Total LOC | Functions w/ LLM Calls | Pure Rust | SDK Bridge |
|----------|-----------|----------------------|-----------|------------|
| Hook processor | 407 | 0 | 9 | 0 |
| pre_tool_use | 492 | 0 | 6 | 0 |
| stop | 902 | 1 | 14 | 1 (reflection) |
| session_start | 601 | 1 | 3 | 1 (memory) |
| session_stop | 86 | 1 | 0 | 1 (memory) |
| post_tool_use | 203 | 0 | 3 | 0 |
| user_prompt | 451 | 1 | 6 | 1 (memory) |
| pre_compact | 195 | 0 | 2 | 0 |
| Settings | 341 | 0 | 3 | 0 |
| **Total** | **~3,678** | **4** | **46** | **4** |

> **Note:** XPIA (~1,200 LOC) is tracked separately in `amplihack-xpia-defender`.

Only 4 functions require SDK bridge calls. Everything else is pure data processing → pure Rust.

### SDK Bridge Contracts (4 Functions)

Each SDK bridge function runs an embedded Python script via subprocess. The Rust side writes JSON to stdin, reads JSON from stdout, with a hard timeout.

#### 1. `_run_reflection_sync()` (stop hook)
```
Input JSON (stdin):
{
  "action": "reflect",
  "transcript_path": "/path/to/transcript.jsonl",
  "session_id": "abc-123"
}

Output JSON (stdout):
{
  "should_block": true|false,
  "findings": ["finding 1", "finding 2"],
  "severity": "low"|"medium"|"high"
}

Timeout: 30s (reflection can be slow — Claude API call)
On timeout: treat as should_block=false (fail-open for UX, log warning)
On invalid JSON: ERROR — do not swallow
```

#### 2. `SessionStartHook` memory calls
```
Input JSON:
{
  "action": "get_context",
  "session_id": "abc-123",
  "project_path": "/path/to/project"
}

Output JSON:
{
  "context": "string of context to inject",
  "memories": [{"key": "...", "value": "..."}]
}

Timeout: 10s
On timeout: ERROR (memory is required for correct operation)
```

#### 3. `session_stop` memory store
```
Input JSON:
{
  "action": "store",
  "session_id": "abc-123",
  "transcript_path": "/path/to/transcript.jsonl"
}

Output JSON:
{
  "stored": true,
  "memories_count": 5
}

Timeout: 15s
On timeout: log ERROR (data loss) but don't block session exit
```

#### 4. `inject_memory_for_agents_sync()` (user_prompt_submit)
```
Input JSON:
{
  "action": "inject_memory",
  "session_id": "abc-123",
  "prompt": "user prompt text"
}

Output JSON:
{
  "injected_context": "string to prepend to prompt",
  "memory_keys_used": ["key1", "key2"]
}

Timeout: 5s
On timeout: ERROR — do not silently drop memory injection
```
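
The four contracts differ only in action name, timeout, and timeout behavior, so they can be captured as data. The sketch below restates the values listed above; `OnTimeout`, `BridgeContract`, `BRIDGE_CONTRACTS`, and `contract_for` are hypothetical names.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OnTimeout {
    /// Treat as should_block=false and log a warning (reflection).
    FailOpen,
    /// Surface the timeout as an error (memory context, memory injection).
    Error,
    /// Log an error but do not block session exit (memory store).
    LogAndContinue,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BridgeContract {
    pub action: &'static str,
    pub timeout: Duration,
    pub on_timeout: OnTimeout,
}

pub static BRIDGE_CONTRACTS: [BridgeContract; 4] = [
    BridgeContract { action: "reflect", timeout: Duration::from_secs(30), on_timeout: OnTimeout::FailOpen },
    BridgeContract { action: "get_context", timeout: Duration::from_secs(10), on_timeout: OnTimeout::Error },
    BridgeContract { action: "store", timeout: Duration::from_secs(15), on_timeout: OnTimeout::LogAndContinue },
    BridgeContract { action: "inject_memory", timeout: Duration::from_secs(5), on_timeout: OnTimeout::Error },
];

/// Look up the contract for an action; an unknown action yields None so the
/// caller has to handle it explicitly.
pub fn contract_for(action: &str) -> Option<&'static BridgeContract> {
    BRIDGE_CONTRACTS.iter().find(|c| c.action == action)
}
```

Keeping the table in one place makes each timeout decision reviewable next to the others instead of being scattered across four call sites.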

**Implementation pattern** (same for all 4):
```rust
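use anyhow::{bail, Context, Result};
use serde_json::Value;
use std::{fs, thread, process::{Command, Stdio}, time::{Duration, Instant}};

fn call_bridge(script: &str, input: &Value, timeout: Duration) -> Result<Value> {
    let script_path = write_embedded_script(script)?;  // 0o600 permissions
    let mut child = Command::new("python3")
        .arg(&script_path)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;
    // Write input; dropping stdin afterwards signals EOF to the script
    serde_json::to_writer(child.stdin.take().context("no stdin pipe")?, input)?;
    // Enforce timeout by polling (a fuller version drains stdout/stderr on threads)
    let deadline = Instant::now() + timeout;
    while child.try_wait()?.is_none() && Instant::now() < deadline {
        thread::sleep(Duration::from_millis(20));
    }
    let timed_out = child.try_wait()?.is_none();
    if timed_out { child.kill()?; }
    let output = child.wait_with_output()?;
    fs::remove_file(&script_path)?;  // clean up script_path
    if timed_out { bail!("bridge timed out after {timeout:?}"); }
    // Parse JSON: error if invalid, NEVER silently default
    serde_json::from_slice(&output.stdout).context("bridge returned invalid JSON")
}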
```

## Behavioral Baselines

Before writing Rust, capture Python behavioral baselines. The primary purpose is **correctness verification**, not speed optimization. Performance improvement is a welcome side effect but NOT the goal.

Capture these during Phase 0b golden file collection. For each hook invocation, record:

```jsonc
{
  "hook": "pre_tool_use",
  "input": { ... },               // ← full JSON input
  "python_output": { ... },       // ← full JSON output
  "python_exit_code": 0,
  "python_stderr": "",
  "python_duration_us": 145000,   // ← informational, not a target
  "input_size_bytes": 342,
  "timestamp": "2026-03-09T..."
}
```

The agent MUST capture actual behavioral baselines in Phase 0b. These become the correctness oracle — Rust output must match Python output for every recorded invocation.

## Distribution, Migration & Release Pipeline

### Binary Sizes
```
amplihack          (~3MB stripped, LTO)
amplihack-hooks    (~2MB stripped, LTO)
Total: ~5MB
```

### Dual-Wheel Strategy

Publish TWO types of wheels:

1. **Pure-Python wheel** (`amplihack-X.Y.Z-py3-none-any.whl`) — no Rust binaries, hooks remain Python scripts. This is the pre-migration state, not a "fallback."
2. **Platform wheels** (`amplihack-X.Y.Z-py3-none-manylinux_2_28_x86_64.whl`, etc.) — contains Rust binaries in `amplihack/bin/`, hooks are Rust binaries.

pip and uv automatically prefer platform-specific wheels when available. Once a user has the platform wheel, hooks are Rust. There is no "auto" mode: the wheel either has the binaries or it doesn't.

### CI/CD Pipeline: `rust-build.yml`

```yaml
name: Build Rust Binaries + Platform Wheels

on:
  push:
    branches: [main]
    paths: ['amplihack-rs/**']
  release:
    types: [published]

jobs:
  build-rust:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        include:
          - target: x86_64-unknown-linux-gnu
            os: ubuntu-latest
            wheel_plat: manylinux_2_28_x86_64
          - target: aarch64-unknown-linux-gnu
            os: ubuntu-latest
            wheel_plat: manylinux_2_28_aarch64
          - target: x86_64-apple-darwin
            os: macos-13
            wheel_plat: macosx_13_0_x86_64
          - target: aarch64-apple-darwin
            os: macos-14
            wheel_plat: macosx_14_0_arm64
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
        with:
          targets: ${{ matrix.target }}
      - name: Build + strip Rust binaries
        run: |
          cd amplihack-rs
          cargo build --release --target ${{ matrix.target }}
          strip target/${{ matrix.target }}/release/amplihack
          strip target/${{ matrix.target }}/release/amplihack-hooks
      - uses: actions/upload-artifact@v4
        with:
          name: rust-${{ matrix.target }}
          path: |
            amplihack-rs/target/${{ matrix.target }}/release/amplihack
            amplihack-rs/target/${{ matrix.target }}/release/amplihack-hooks

  build-platform-wheels:
    needs: build-rust
    runs-on: ubuntu-latest  # assumed: packaging only, no compilation in this job
    strategy:
      matrix:
        include:
          - target: x86_64-unknown-linux-gnu
            wheel_plat: manylinux_2_28_x86_64
          - target: aarch64-unknown-linux-gnu
            wheel_plat: manylinux_2_28_aarch64
          - target: x86_64-apple-darwin
            wheel_plat: macosx_13_0_x86_64
          - target: aarch64-apple-darwin
            wheel_plat: macosx_14_0_arm64
    steps:
      - uses: actions/checkout@v4
      - uses: actions/download-artifact@v4
        with:
          name: rust-${{ matrix.target }}
          path: src/amplihack/bin/
      - run: python -m build --wheel

  build-pure-wheel:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: python -m build --wheel
```

### Changes to `build_hooks.py`

```python
import os
from pathlib import Path

RUST_BIN_DIR = Path("src/amplihack/bin")

def _copy_rust_binaries(self):
    """Mark pre-built Rust binaries executable so they ship in the wheel."""
    if not RUST_BIN_DIR.exists():
        return  # Pure-Python wheel
    for binary in RUST_BIN_DIR.iterdir():
        if binary.is_file():
            # CI downloads the binaries directly into RUST_BIN_DIR, which is also
            # the package destination, so no copy is needed (copying a file onto
            # itself raises shutil.SameFileError). Artifact downloads drop the
            # executable bit, so restore it here.
            os.chmod(binary, 0o755)
```

Add to `pyproject.toml`:
```toml
[tool.setuptools.package-data]
amplihack = ["bin/*"]
```

### Binary Discovery at Runtime

```python
# src/amplihack/binary_discovery.py
import os
import shutil
from pathlib import Path

def find_rust_binary(name: str) -> Path | None:
    """Discover Rust binary location, in priority order."""
    candidates = [
        os.environ.get(f"AMPLIHACK_{name.upper().replace('-', '_')}_BIN"),  # 1. Env var override
        Path.home() / ".amplihack" / ".claude" / "bin" / name,              # 2. Staged (UVX)
        Path(__file__).parent / "bin" / name,                               # 3. Package (wheel)
        _find_cargo_build(name),                                            # 4. Cargo build (dev)
        Path.home() / ".cargo" / "bin" / name,                              # 5. Cargo install
        shutil.which(name),                                                 # 6. System PATH
    ]
    for c in candidates:
        if c and Path(c).is_file() and os.access(c, os.X_OK):
            return Path(c)
    return None
```

### Hook Path Migration

**Current** `settings.json`:
```json
{"command": "/home/user/.amplihack/.claude/tools/amplihack/hooks/pre_tool_use.py"}
```

**After migration:**
```json
{"command": "/home/user/.amplihack/.claude/bin/amplihack-hooks pre-tool-use"}
```

Migration extends `settings.py` `update_hook_paths()` with a `HOOK_REGISTRY` mapping. When Rust binaries are present in the wheel, `amplihack install` registers them. When they're not (pure-Python wheel), Python hooks remain registered. No auto-detection, no fallback — the install step writes the correct paths based on what's in the package.
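A minimal sketch of how a `HOOK_REGISTRY` lookup could rewrite one hook command at install time. The registry contents, the function name `migrate_hook_command`, and every subcommand other than `pre-tool-use` are assumptions for illustration; the real `update_hook_paths()` may look different.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical mapping: Python hook script name -> amplihack-hooks subcommand.
# Only "pre-tool-use" appears in this issue; the others are assumed by analogy.
HOOK_REGISTRY = {
    "pre_tool_use.py": "pre-tool-use",
    "post_tool_use.py": "post-tool-use",
    "stop.py": "stop",
    "session_start.py": "session-start",
}


def migrate_hook_command(command: str, rust_binary: Optional[str]) -> str:
    """Return the command to write into settings.json for one hook.

    rust_binary is the staged amplihack-hooks path, or None for a
    pure-Python wheel. The choice is made once, at install time.
    """
    if rust_binary is None:
        return command  # pure-Python wheel: keep the Python hook registered
    subcommand = HOOK_REGISTRY.get(PurePosixPath(command).name)
    if subcommand is None:
        return command  # not a known amplihack hook script: leave it alone
    return f"{rust_binary} {subcommand}"
```

Because the function takes the binary path as an argument, the install step decides which engine is registered and nothing is probed while a hook runs.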

### UVX Staging

Add `"bin"` to `ESSENTIAL_DIRS` in `cli.py` so UVX staging copies Rust binaries to `~/.amplihack/.claude/bin/`.

### Version Synchronization

Single version number sourced from `pyproject.toml`, synced to `Cargo.toml` by CI (in `auto-version-on-merge.yml`).
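The sync step could be as small as the sketch below. It assumes the version lives under `[project]` in `pyproject.toml` and under `[package]` in `Cargo.toml`; the function names are hypothetical, and the actual contents of `auto-version-on-merge.yml` are not shown in this issue. A production version would more likely use a real TOML parser.

```python
import re


def _version_in_section(section):
    # Matches the first `version = "..."` line that follows the [section] header.
    return re.compile(
        r'(?ms)^(\[' + re.escape(section) + r'\]\s.*?^version\s*=\s*")([^"]*)(")'
    )


def sync_cargo_version(pyproject_text: str, cargo_text: str) -> str:
    """Return cargo_text with its [package] version set to the [project] version."""
    found = _version_in_section("project").search(pyproject_text)
    if found is None:
        raise ValueError("no [project] version found in pyproject.toml")
    version = found.group(2)
    updated, count = _version_in_section("package").subn(
        lambda m: m.group(1) + version + m.group(3), cargo_text, count=1
    )
    if count != 1:
        raise ValueError("no [package] version found in Cargo.toml")
    return updated
```

If the workspace declares its version under `[workspace.package]` instead, the section name passed to `_version_in_section` would change accordingly.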

### User Upgrade Scenarios

| Scenario | What Happens | User Action |
|----------|-------------|-------------|
| `uv tool upgrade amplihack` | UV downloads platform wheel with Rust binaries. Next session re-stages and rewrites hook paths. | None |
| `uvx` (zero-install) | Fresh temp venv from latest wheel. Staging copies binaries. | None |
| `git pull` (editable install) | Gets Cargo workspace. Must build Rust. Next `amplihack install` discovers binaries. | `cargo build --release` |
| Unsupported platform | Pure-Python wheel installed. Hooks remain Python scripts. User is on the pre-migration path. | None |
| Rust binary crashes | Hook fails. Claude Code sees exit code != 0, stderr has error. Fix the bug. | Report bug, fix in next release |

### macOS Code Signing

Start with ad-hoc signing (`codesign --force --sign -`) in Phase 2. Add full notarization ($99/yr Apple Developer account) when user complaints about Gatekeeper warrant the cost.

**Explicitly unsupported:** Native Windows. WSL works via linux-x64.

## Agent Execution Protocol

### ⚠️ MANDATORY: Default Workflow via Recipe Runner for ALL Code Changes

**Every code change MUST be executed through the dev-orchestrator skill:** `Skill(skill="dev-orchestrator")`. This invokes the recipe runner which handles the 25-step default workflow automatically — task classification, workstream decomposition, implementation, testing, and review.

**Do NOT just read `DEFAULT_WORKFLOW.md` and manually follow steps.** Invoke the dev-orchestrator skill. It runs the workflow as a recipe. Work done outside the recipe runner WILL BE REJECTED — even if the code is correct.

The dev-orchestrator applies to: Rust code, Python test changes, CI configuration, build scripts, documentation updates tied to code changes. The only exception is editing this issue itself.

### Autonomous Loop

The implementing agent MUST work in a continuous, autonomous loop until CERTAIN that all targeted functionality is replaced and equivalent. The loop:

```
while not CERTAIN_ALL_FUNCTIONALITY_REPLACED:
    1. Pick the next untouched Python function/module from the target list
    2. Read the Python implementation thoroughly — understand WHAT it does and WHY
    3. Write a behavioral spec: inputs, outputs, edge cases, error conditions
    4. Invoke Skill(skill="dev-orchestrator") to implement the Rust equivalent via recipe runner
    5. Write tests that mirror every Python test + cover edge cases from step 3
    6. Run `amplihack hooks compare` against golden files for the component
    7. If mismatch → fix → go to step 6
    8. If match → mark as done, commit, move to next
    9. After completing a phase, run the FULL triple-check (see below)
```

**Only stop and ask when:**
- A design decision would be irreversible AND multiple valid approaches exist with materially different tradeoffs
- An external dependency is missing and cannot be installed
- A Python behavior appears to be a bug — ask whether to replicate the bug or fix it

**Never stop to ask about:**
- Code style choices (follow idiomatic Rust and the patterns in this issue)
- Test strategy (always test more, not less)
- Whether to proceed to the next item (always proceed)

### Triple-Check Verification Protocol

Before declaring ANY phase complete, run all three checks. All three must pass.

#### Check 1: Functional Equivalence Matrix

For every public function/entry point being replaced, build an automated comparison matrix:

```
┌─────────────────────┬──────────────┬──────────────┬─────────┐
│ Function            │ Python Output│ Rust Output  │ Match?  │
├─────────────────────┼──────────────┼──────────────┼─────────┤
│ pre_tool_use(bash)  │ {"allow":..} │ {"allow":..} │ ✅      │
│ pre_tool_use(edit)  │ {"deny":...} │ {"deny":...} │ ✅      │
│ stop(reflection)    │ {"block":..} │ {"block":..} │ ✅      │
│ ...every function   │              │              │         │
└─────────────────────┴──────────────┴──────────────┴─────────┘
```

Generate test fixtures from real session transcripts. Run BOTH implementations against every fixture. Diff outputs. The matrix must be 100% green before proceeding.
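One way to build the matrix, sketched under stated assumptions: `run_python` and `run_rust` are hypothetical callables that take the hook's stdin text and return its stdout text (in practice they would wrap a subprocess call). This version compares parsed JSON; if byte-identical output is the gate, compare the raw strings instead.

```python
import json


def equivalence_matrix(fixtures, run_python, run_rust):
    """Build one row per fixture: (name, python_output, rust_output, match).

    fixtures maps a fixture name to the JSON text fed to the hook on stdin.
    """
    rows = []
    for name, stdin_text in sorted(fixtures.items()):
        py_out = run_python(stdin_text)
        rs_out = run_rust(stdin_text)
        # Parse before comparing so key order and whitespace are ignored.
        match = json.loads(py_out) == json.loads(rs_out)
        rows.append((name, py_out, rs_out, match))
    return rows


def all_green(rows):
    """True only when every row in the matrix matched."""
    return all(match for *_, match in rows)
```

A phase would proceed only when `all_green(rows)` is true; any red row names the fixture to investigate.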

#### Check 2: Fuzz / Property Testing

For each module, write property-based tests (`proptest` in Rust, `hypothesis` in Python):
- Random valid JSON inputs → both must produce identical outputs
- Random invalid JSON inputs → both must fail gracefully (not crash)
- Oversized inputs → both must handle within timeout
- Minimum 10,000 random inputs per function
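The project calls for `proptest` and `hypothesis`; the dependency-free sketch below only illustrates the shape of the divergence check with a seeded generator. `impl_a` and `impl_b` are hypothetical callables from input text to output text, and the generator is deliberately small.

```python
import json
import random


def random_json(rng, depth=0):
    """Build a small random JSON-serializable value."""
    kinds = ["null", "bool", "int", "str"]
    if depth < 3:
        kinds += ["list", "dict"]  # stop nesting after three levels
    kind = rng.choice(kinds)
    if kind == "null":
        return None
    if kind == "bool":
        return rng.random() < 0.5
    if kind == "int":
        return rng.randint(-1000, 1000)
    if kind == "str":
        return "".join(rng.choice("ab \"'\\") for _ in range(rng.randint(0, 8)))
    if kind == "list":
        return [random_json(rng, depth + 1) for _ in range(rng.randint(0, 3))]
    return {f"k{i}": random_json(rng, depth + 1) for i in range(rng.randint(0, 3))}


def find_divergence(impl_a, impl_b, runs=10_000, seed=0):
    """Return the first input on which the two implementations differ, else None."""
    rng = random.Random(seed)  # seeded so a failure can be reproduced
    for _ in range(runs):
        text = json.dumps(random_json(rng))
        if impl_a(text) != impl_b(text):
            return text
    return None
```

Any input returned by `find_divergence` is a natural candidate for a new golden file.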

#### Check 3: Integration Smoke Test

Run the Rust binary in a REAL amplihack session:
1. Start `amplihack claude` with `AMPLIHACK_HOOK_ENGINE=rust`
2. Perform at least 10 real operations (edit files, run bash, write code)
3. Capture all hook stdin/stdout pairs
4. Replay the same stdin against Python hooks
5. Diff all outputs — must match

### Side-by-Side Evaluation Framework

Build these tools so the owner can compare Rust and Python behavior **offline, during development** — NOT as a runtime mode.

> ⚠️ **NO FALLBACKS. NO DUAL-ENGINE RUNTIME MODE.** Comparison happens in tests, in CI, and via explicit CLI commands. Never at runtime. If Rust is the active engine, it's the ONLY engine. If it produces wrong output, that's a bug to fix, not a reason to fall back to Python.

#### 1. `amplihack hooks compare` CLI Command (Development Tool)

```bash
# Compare a single hook event (offline — runs both implementations and diffs)
amplihack hooks compare --event PreToolUse --input fixture.json

# Compare all hooks against golden files
amplihack hooks compare --all

# Compare with timing (informational, not a correctness gate)
amplihack hooks compare --event PreToolUse --input fixture.json --timing
# Output:
#   Python: 145ms  {"permissionDecision": "allow"}
#   Rust:     3ms  {"permissionDecision": "allow"}
#   Status: MATCH ✅
```

This is a **test/development tool**, not a production mode. You run it explicitly, look at the output, fix discrepancies.

#### 2. Golden File Test Suite

```
tests/golden/
  hooks/
    pre_tool_use/
      bash_npm_test.input.json
      bash_npm_test.expected.json
      bash_rm_rf.input.json
      bash_rm_rf.expected.json
      ...
    stop/
      normal_stop.input.json
      normal_stop.expected.json
      reflection_needed.input.json
      reflection_needed.expected.json
```

Both Python and Rust must produce byte-identical output for every golden file. CI runs both and fails on any diff. Every bug found adds a new golden file as a regression test.
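A sketch of a runner for this layout, assuming each `*.input.json` sits directly inside a directory named after its hook, as in the tree shown. `run_hook` is a hypothetical callable `(hook_name, stdin_bytes) -> stdout_bytes` that would wrap whichever engine is under test.

```python
from pathlib import Path


def golden_pairs(root):
    """Yield (input_path, expected_path) for every *.input.json under root."""
    for input_path in sorted(Path(root).rglob("*.input.json")):
        stem = input_path.name[: -len(".input.json")]
        yield input_path, input_path.with_name(stem + ".expected.json")


def golden_failures(root, run_hook):
    """Return the input files whose output is not byte-identical to the golden file."""
    failures = []
    for input_path, expected_path in golden_pairs(root):
        hook_name = input_path.parent.name  # e.g. "pre_tool_use"
        actual = run_hook(hook_name, input_path.read_bytes())
        # A missing expected file counts as a failure rather than a skip.
        if not expected_path.is_file() or actual != expected_path.read_bytes():
            failures.append(input_path)
    return failures
```

Running `golden_failures` once per engine and failing CI on a non-empty list gives the behavior described above.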

### Dogfooding Protocol

The implementing agent MUST use the Rust version for its own development work. Non-negotiable.

| Phase | Requirement |
|-------|------------|
| Phase 1 (Hooks) | After first 3 hooks compile: switch to Rust hooks for ALL subsequent development. Fix bugs as they surface. |
| Phase 2 (CLI) | After CLI compiles: use `amplihack-rs claude` as the launcher. No Python CLI alongside. |

**Dogfooding log:** Maintain a running table of issues discovered through dogfooding. Every dogfooding issue becomes a golden file test case.

**If Rust hooks break during dogfooding:** This is expected during development. The response is:
1. Capture the exact input that caused the failure
2. Add it as a golden file test case immediately
3. Fix the Rust bug
4. Verify the fix passes the new golden file
5. Continue dogfooding

Do NOT switch back to Python hooks. Fix forward. If the bug is severe enough that you literally cannot continue working (e.g., every command is blocked), fix the bug first — that's the highest priority task.

### CI Gates

```yaml
jobs:
  functional-equivalence:
    steps:
      - name: Golden file tests (Python)
        run: python -m pytest tests/golden/ --engine=python
      - name: Golden file tests (Rust)
        run: python -m pytest tests/golden/ --engine=rust
      - name: Diff outputs
        run: python tests/golden/diff_outputs.py

  property-tests:
    steps:
      - name: Rust property tests
        run: cd amplihack-rs && cargo test --test proptest -- --test-threads=1
      - name: Python property tests
        run: python -m pytest tests/property/ -x

  dogfooding-smoke:
    steps:
      - name: Real hook invocations with Rust
        run: |
          export AMPLIHACK_HOOK_ENGINE=rust
          python tests/smoke/run_real_hooks.py --count 10
      - name: Compare with Python baseline
        run: python tests/smoke/compare_outputs.py
```

**CI blocks merge if:** any golden file test fails, property tests find divergence, Rust binary panics, or Rust output differs from Python on any fixture.

## Work Decomposition & Delegation

### Dependency Graph

Work must proceed in dependency order. The graph below shows what blocks what:

```
amplihack-common (shared types, JSON protocol, error types, file lock utils)
    │
    ├── amplihack-hooks/pre_tool_use ──┐
    ├── amplihack-hooks/post_tool_use ─┤── Tier 1: Independent, parallelizable
    ├── amplihack-hooks/session_stop ──┘
    │
    ├── amplihack-hooks/user_prompt_submit ─┐── Tier 2: Need common + memory bridge
    ├── amplihack-hooks/pre_compact ────────┘
    │
    ├── amplihack-hooks/stop ───────────┐── Tier 3: Coordinate via shared state files
    ├── amplihack-hooks/session_start ──┘
    │
    └── amplihack-cli (depends on hooks being done)

Note: amplihack-xpia is a SEPARATE REPO/ISSUE — not part of this dependency graph.
```

### What Must Be Sequential

| Step | Why Sequential | Blocks |
|------|---------------|--------|
| `amplihack-common` crate | All hooks import shared types (HookInput, HookOutput, JsonProtocol, ErrorProtocol, FileLock, ShutdownContext) | Everything |
| SDK bridge module (`python_bridge.rs`) | 4 hooks need subprocess-to-Python calls; define the pattern once | Tier 2 + Tier 3 hooks |
| Golden file infrastructure | Test harness must exist before hook implementation can be verified | All hook implementation |

### What Can Run in Parallel

**Tier 1 — 3 hooks, zero dependencies on each other:**

| Hook | LOC | Shared State | SDK Bridge? | Notes |
|------|-----|-------------|-------------|-------|
| `pre_tool_use` | 492 | Reads config files (read-only) | No | Largest hook. Pure input→decision. Start here — it exercises the most common types. |
| `post_tool_use` | 203 | Reads config, writes metrics file | No | Smallest non-trivial hook. Good second target. |
| `session_stop` | 86 | None | Yes (MemoryCoordinator.store) | Smallest hook. SDK bridge is the only complexity. |

These 3 hooks share NO state and have NO cross-dependencies. After `amplihack-common` is built, they can be implemented simultaneously by parallel sub-agents.

**Tier 2 — 2 hooks, depend on Tier 1 patterns but not Tier 1 code:**

| Hook | LOC | Shared State | SDK Bridge? | Notes |
|------|-----|-------------|-------------|-------|
| `user_prompt_submit` | 451 | Reads preference files | Yes (inject_memory_for_agents_sync) | Memory injection. Can reuse bridge pattern from session_stop. |
| `pre_compact` | 195 | Reads transcript files | No | Context preservation. Independent of user_prompt_submit. |

These depend on the SDK bridge pattern being established (from Tier 1's session_stop) but not on Tier 1 hooks being complete. They CAN run in parallel with each other.

**Tier 3 — 2 hooks, MUST coordinate:**

| Hook | LOC | Shared State | SDK Bridge? | Notes |
|------|-----|-------------|-------------|-------|
| `stop` | 902 | Writes lock files, counter files, continuation prompt files | Yes (reflection) | Largest, most complex hook. Lock mode + power steering state machine. |
| `session_start` | 601 | Reads lock files written by stop, writes session state | Yes (memory context) | Coordinates with stop via shared file locks. |

These two share state through the filesystem (lock files, counter files). They MUST be designed together — the file format and locking protocol must be agreed before either is implemented. Implement stop first (it defines the state), then session_start (it reads that state).
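The locking protocol itself still has to be designed, but the two primitives it is likely to rest on can be sketched briefly. The example is in Python for brevity and the function names are hypothetical; in Rust the rough equivalents would be `OpenOptions::new().create_new(true)` and `std::fs::rename`.

```python
import os
import tempfile


def try_acquire_lock(lock_path):
    """Create the lock file atomically; True only if this call created it."""
    try:
        # O_CREAT | O_EXCL fails if the file exists, so there is no
        # check-then-create window for another process to slip into.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(str(os.getpid()))
    return True


def atomic_write(path, data: bytes):
    """Write via temp file + rename so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)  # atomic within a single filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise
```

With these in place, `stop` can publish counter or continuation state with `atomic_write`, and `session_start` either reads the previous complete file or the new complete file, never a mix.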

### Sub-Agent Delegation Guide

The implementing agent should use sub-agents strategically:

| Task Type | Agent Type | When to Use |
|-----------|-----------|-------------|
| **Understanding Python code** | `explore` | Before implementing each function — understand what the Python does, what edge cases it handles, what state it reads/writes |
| **Implementing a Rust function** | Do it yourself | The main agent should write Rust code directly — it has the full context of the issue spec, patterns, and prior decisions |
| **Writing golden file test inputs** | `task` | Generating synthetic JSON inputs for a hook — mechanical work that benefits from parallelism |
| **Running tests / building** | `task` | `cargo test`, `cargo clippy`, golden file comparison — fire and forget, read results |
| **Reviewing a completed hook** | `code-review` | After implementing a hook, have a code-review agent check for correctness issues, missing edge cases, silent error swallowing |
| **Investigating a Python bug** | `explore` | When a golden file reveals Python behavior that looks wrong — investigate before deciding whether to replicate or fix |

**Parallel sub-agent opportunities:**
- Launch 3 `explore` agents in parallel to analyze Tier 1 hooks before implementing
- Launch `task` agents to run `cargo test` while continuing to write code
- Launch `code-review` on hook N while implementing hook N+1

**Never delegate:**
- Architectural decisions (crate structure, type design, error strategy) — main agent only
- SDK bridge contract design — must be consistent across all 4 uses
- State file format decisions for stop/session_start coordination

### Recommended Execution Order

```
Phase 0:
  0a. Type analysis (explore Python hooks) ─────────────────────┐
  0b. Build amplihack-common crate ──────────────────────────────┤ Sequential
  0c. Build golden file infrastructure ──────────────────────────┘
  0d. Capture golden files for ALL hooks ─── (can parallel: 1 task agent per hook)

Phase 1 — Tier 1 (parallel):
  ┌── pre_tool_use implementation + tests ──┐
  ├── post_tool_use implementation + tests ─┤ Parallel (independent)
  └── session_stop implementation + tests ──┘
  Then: Triple-check Tier 1

Phase 1 — Tier 2 (parallel):
  ┌── user_prompt_submit implementation ────┐
  └── pre_compact implementation ───────────┘ Parallel (independent)
  Then: Triple-check Tier 2

Phase 1 — Tier 3 (sequential):
  stop implementation ──→ session_start implementation
  (stop defines shared state format; session_start reads it)
  Then: Triple-check Tier 3
  Then: Triple-check ALL hooks together

Phase 2 (after Phase 1 — CONDITIONAL):
  amplihack-cli ── Depends on hooks being done
```

### Checkpoint Strategy

After completing each tier, create a checkpoint:
1. All golden files pass for the tier
2. `cargo clippy -- -D warnings` clean
3. `amplihack hooks compare` shows 0 mismatches for the tier
4. Code-review agent finds no correctness issues
5. Commit, push, create PR

Do NOT move to the next tier until the current tier's PR is clean. Accumulating untested code across tiers creates integration nightmares.

## Branch & PR Strategy

**Rule:** Never push to main. Always feature branches + PRs.

| Scope | Branch Pattern | PR Size Target |
|-------|---------------|----------------|
| Phase 0 scaffolding | `amplihack-rs/phase-0-scaffold` | 1 PR |
| Phase 0b golden files | `amplihack-rs/golden-files` | 1 PR |
| Each hook | `amplihack-rs/hook-{name}` (e.g., `hook-pre-tool-use`) | 1 PR per hook |
| XPIA crate | Tracked separately in `amplihack-xpia-defender` repo | Separate issue |
| CLI (if Phase 2) | `amplihack-rs/cli-{component}` | 2-3 PRs |
| Infrastructure (CI, build) | `amplihack-rs/ci-{what}` | Small PRs |

**PR requirements:**
- Golden file parity tests pass
- `cargo clippy -- -D warnings` clean
- `cargo fmt --check` clean
- Property tests pass (if applicable to the component)
- `amplihack hooks compare` results included in PR description
- PR description lists specific correctness improvements (e.g., "eliminates TOCTOU race in counter update")
- dev-orchestrator invoked for all changes (evidence: structured commit messages, test coverage)

### Commit Cadence

- **Commit after every completed function.** Don't accumulate large uncommitted changesets.
- **Push to branch after every 3-5 commits.** Ensures work is saved remotely.
- **Create PR after completing a logical unit** (one hook, one crate, one component). Don't batch too many components into one PR.
- **If stuck for >2 hours on one function:** commit what works, write a `TODO(blocked)` comment with the issue, move to the next function. Come back after finishing the rest of the component — fresh eyes often solve the problem.

### Recovery Strategy

- **If a component is harder than expected:** Don't grind indefinitely. After implementing and testing the straightforward functions, if the remaining ones are blocked, document what's blocking them and move to the next component. Return with more context later.
- **If golden file tests reveal Python bugs:** Fix the Python bug first (in a separate PR), re-capture golden files, then continue with Rust. Don't try to replicate known-wrong behavior.
- **If a Rust pattern from this issue doesn't work as described:** Document why, implement the correct pattern, and update this issue with what you learned. The patterns here were reviewed but not all were compiled — real-world adjustments are expected.
- **If CI is red and you can't fix it quickly:** Revert your last change, ensure CI is green, then try again with a different approach. Never leave CI red for more than 1 commit.

## Parity Maintenance Policy

During the transition period (Phases 1-2), both Python and Rust implementations exist.

**Rule: Python is the source of truth until a hook is fully validated in Rust.**

| Scenario | Action |
|----------|--------|
| Bug found in Python hook | Fix in Python. Add golden file test. Port fix to Rust. Verify parity. |
| Bug found in Rust hook | Fix in Rust. Add golden file test. Check if Python has same bug — fix if so. |
| New feature added to Python hook | Implement in Python first. Add golden file test. Port to Rust before next release. |
| New Claude Code hook event added | `#[serde(other)] Unknown` absorbs it in Rust until support lands. Then implement in Python, then Rust. |
| Python hook diverges from Rust (CI catches) | **Block merge.** Fix whichever side is wrong. |

**CI enforcement:** The parity test harness runs on every PR. If Python and Rust produce different output for any golden file, the PR cannot merge.

## Acceptance Criteria (Definition of "CERTAIN")

A phase is CERTAIN complete when ALL of the following are true:

### Per-Function Criteria
- [ ] Rust function exists with equivalent signature
- [ ] Unit tests cover all paths the Python tests cover
- [ ] Golden file tests pass (100% — no exceptions)
- [ ] Property tests pass (10,000+ random inputs, zero divergence)
- [ ] Edge cases: empty input, malformed JSON, oversized input, unicode, null bytes

### Per-Hook Criteria (on top of per-function)
- [ ] Multicall binary dispatches correctly to this hook
- [ ] Exit code contract matches (0, 2, other)
- [ ] Stderr output matches Python format
- [ ] All type-safety bugs from Python analysis proven eliminated by Rust's type system (exhaustive enums, no stringly-typed dispatch, no TOCTOU)
- [ ] Fail-open behavior: panic → `b"{}"` on stdout, exit 0
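
The fail-open criterion can be illustrated with a small wrapper. This is a Python sketch of the behavior only; the Rust binary would use `catch_unwind`. The name `run_hook_fail_open` is hypothetical, exit code 2 is not modeled, and security-critical hooks may need a different policy.

```python
import json
import sys


def run_hook_fail_open(handler, stdin_text):
    """Return (stdout_text, exit_code) for one hook invocation.

    Any unexpected failure, including unparseable input, yields "{}"
    with exit code 0 so the session is not blocked by a hook bug.
    """
    try:
        result = handler(json.loads(stdin_text))
        return json.dumps(result), 0
    except Exception as exc:
        # Report the failure on stderr so it is visible, then fail open.
        sys.stderr.write(f"hook failed: {exc!r}\n")
        return "{}", 0
```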

### Per-Phase Criteria (on top of per-hook)
- [ ] All hooks in the phase pass all per-hook criteria
- [ ] Integration smoke test: 10+ real operations with `AMPLIHACK_HOOK_ENGINE=rust`
- [ ] Offline comparison (`amplihack hooks compare`) of ≥1 full captured session shows 0 mismatches
- [ ] Dogfooding: agent used Rust for ≥3 sessions of real development work
- [ ] No regressions in existing Python test suite
- [ ] PR reviewed and merged
- [ ] CI pipeline green on all 4 platforms

### Project-Complete Criteria (all phases)
- [ ] Every function in the inventory table above has status "done"
- [ ] Every golden file passes on both engines
- [ ] Offline comparison (`amplihack hooks compare`) shows 0 mismatches across 5 consecutive captured sessions
- [ ] Performance baselines table filled with actual measurements showing improvement
- [ ] `amplihack hooks compare --transcript` shows 100% match on 3 real transcripts
- [ ] Distribution: platform wheel builds succeed for all 4 targets
- [ ] Distribution: UVX install flow tested end-to-end
- [ ] README/docs updated with Rust binary information

## Risks

| Risk | Prob | Impact | Mitigation |
|------|------|--------|------------|
| Claude Code hook protocol changes | Med | High | `#[serde(other)] Unknown` + `#[serde(default)]` everywhere. Fuzz with extra fields. |
| shell-words inadequate for edge cases | Med | High | Property-test against /bin/sh. Custom post-processor for `&&`/`\|\|`. Accept: can't parse all bash. |
| Two build systems (Cargo + setuptools) | High | Med | Design CI FIRST (Phase 0d). Use cibuildwheel for wheels. |
| Contributor friction (Rust toolchain) | High | Med | CONTRIBUTING_RUST.md. Rust basics gate in Phase 0. |
| macOS code signing / Gatekeeper | Med | Med | Ad-hoc `codesign` in CI from Phase 2; add notarization once Gatekeeper complaints warrant it. |
| Rust hook crashes in production | Med | High | `catch_unwind` returns `b"{}"` (fail-open for non-security hooks). Fix the bug immediately. No silent fallback to Python. |
| Python hooks diverge during transition | Med | Med | Parity test harness in CI blocks merge on diff. |

## Timeline Summary

| Phase | Duration | Cumulative | Deliverable |
|-------|----------|------------|-------------|
| 0: Foundation | 3 weeks | 3 weeks | Golden files, workspace, Python experiment |
| 1: All Hooks | 5-6 weeks | 8-9 weeks | All hooks in Rust (3 tiers) |
| 2: CLI launcher | 3 weeks | 11-12 weeks | Rust CLI binary (conditional) |

> **XPIA** is tracked in a separate issue/repo. Not included in this timeline.

Each phase ships independently. Decision gates after Phase 0a (CLI go/no-go) and Phase 1 (continue or stop). The project delivers value even if only Phase 1 ships.

## Open Questions

1. **Phase 2 go/no-go**: Run the Python experiment first. Does `mypy --strict` catch the correctness issues we care about? If so, the CLI phase may not be worth it.
2. **Hook protocol documentation**: Should we ask Anthropic for a formal hook JSON schema, or accept the risk of undocumented changes?
3. **Nesting detection strategy**: Env var marker (simple) vs session file with PID (existing, fragile)?

---

## Execution Recipes

These recipes are the executable specification for the entire project. Each recipe defines the exact steps, inputs, outputs, and success criteria. The implementing agent should load and execute these recipes — they ARE the plan, not a summary of it.

### Per-Component Recipe: `rust-rewrite-component.yaml`

This is the inner loop. It runs once for each Python function/module being rewritten. Every function in the inventory table above gets processed through this recipe. The recipe does not complete until the Rust code is provably correct — not just compiling, but producing identical behavior to the Python on every input.

```yaml
name: "rust-rewrite-component"
description: "Rewrite one Python component to Rust with verified behavioral equivalence"
version: "1.0.0"
tags: ["amplihack-rs", "rewrite", "correctness"]

context:
  python_file: ""           # e.g. ".claude/tools/amplihack/hooks/pre_tool_use.py"
  python_function: ""       # e.g. "PreToolUseHook._check_cwd_deletion"
  rust_target_file: ""      # e.g. "amplihack-rs/crates/amplihack-hooks/src/pre_tool_use/cwd.rs"
  rust_crate: ""            # e.g. "amplihack-hooks"
  golden_file_dir: ""       # e.g. "tests/golden/hooks/pre_tool_use/cwd_deletion"
  has_sdk_calls: "false"    # "true" if function calls LLM/memory APIs
  analysis: ""
  golden_status: ""
  implementation: ""
  tests_written: ""
  build_result: ""
  parity_result: ""
  property_result: ""
  verification: ""

steps:
  # ── STEP 1: Deep analysis of the Python implementation ────────────
  # Goal: Understand EVERY code path, not just the happy path.
  # The analysis becomes the correctness specification for the Rust code.
  - id: "analyze-python"
    type: agent
    agent: "amplihack:builder"
    prompt: |
      Read `{{python_file}}`, focusing on `{{python_function}}`.

      Produce a COMPLETE behavioral specification as JSON:

      {
        "function_signature": "name(params) -> return_type",
        "lines": "start-end",
        "loc": <number>,

        "code_paths": [
          {
            "name": "happy path - safe bash command",
            "trigger": "command does not match any dangerous pattern",
            "returns": "{} (empty dict = allow)",
            "side_effects": []
          },
          {
            "name": "block - CWD deletion detected",
            "trigger": "command contains 'rm -rf' targeting CWD",
            "returns": "{'hookSpecificOutput': {'permissionDecision': 'deny', ...}}",
            "side_effects": ["log to hook log file", "save metric"]
          }
          // ... EVERY path, including error/exception paths
        ],

        "input_schema": {
          "required_fields": ["tool_name", "tool_input"],
          "tool_input_fields": {"command": "string"},
          "optional_fields": ["session_id", "cwd"]
        },

        "output_schema": {
          "on_allow": {},
          "on_deny": {"hookSpecificOutput": {"permissionDecision": "deny", "permissionDecisionReason": "..."}},
          "on_error": {}
        },

        "constants_and_regexes": [
          {"name": "_RM_RECURSIVE_RE", "value": "...", "purpose": "detect rm -rf/rm -r"}
        ],

        "side_effects": [
          "reads CWD via os.getcwd()",
          "resolves symlinks via Path.resolve()",
          "writes to log file via self.log()"
        ],

        "sdk_calls": [],  // or ["inject_memory_for_agents_sync()"] if boundary function

        "edge_cases": [
          "command with quoted paths containing spaces",
          "glob patterns that expand to CWD",
          "symlinks pointing to CWD",
          "empty command string",
          "command with unicode characters",
          "very long command (>1MB)"
        ],

        "known_bugs_in_python": [
          // List any bugs found during analysis — these should be FIXED in Rust
          // e.g. "shlex.split() crashes on unmatched quotes — Python catches with bare except"
        ]
      }

      Be exhaustive. Missing a code path here means the Rust version will have a gap.
    output: "analysis"
    parse_json: true

  # ── STEP 2: Ensure golden files exist ─────────────────────────────
  # Golden files are the behavioral oracle. Without them, we can't verify correctness.
  - id: "check-golden-files"
    type: bash
    command: |
      if [ -d "{{golden_file_dir}}" ]; then
        count=$(find "{{golden_file_dir}}" -maxdepth 1 -name '*.input.json' | wc -l | tr -d ' ')
        echo "GOLDEN_FILES_EXIST: $count files"
      else
        echo "GOLDEN_FILES_MISSING"
      fi
    output: "golden_status"

  # ── STEP 3: Generate golden files if missing ──────────────────────
  # Each golden file = one concrete input/output pair from the Python implementation.
  # These capture actual behavior including quirks and edge cases.
  - id: "capture-golden-files"
    type: agent
    condition: "'MISSING' in golden_status or 'EXIST: 0' in golden_status"
    agent: "amplihack:builder"
    prompt: |
      Generate golden file test cases for `{{python_function}}`.

      Based on the analysis:
      {{analysis}}

      For EVERY code path in the analysis, create at least one golden file.
      For security-critical paths, create multiple (normal case + edge cases).

      Each golden file is a pair:
        {name}.input.json   — the exact JSON that would arrive on stdin
        {name}.expected.json — the exact JSON the Python produces on stdout

      Generate the input, then ACTUALLY RUN the Python implementation to capture
      the output. Do NOT guess what the output should be — run it and record.

      ```bash
      echo '{"tool_name":"Bash","tool_input":{"command":"rm -rf /"}}' | python {{python_file}}
      ```

      Required golden files (minimum):
      1. One per code path from the analysis (happy + every error/block path)
      2. Empty input: `{}`
      3. Malformed input: `{"not_a_real_field": true}`
      4. Missing required fields: `{"tool_name": "Bash"}` (no tool_input)
      5. Oversized input: command string >10KB
      6. Unicode in command: `{"tool_input": {"command": "echo '日本語'"}}`
      7. Null bytes: `{"tool_input": {"command": "echo '\u0000'"}}`
      8. Nested quotes: `{"tool_input": {"command": "bash -c \"rm -rf '$(pwd)'\""}}` 

      Save all files to {{golden_file_dir}}/

      If you discover a Python bug while capturing (crash, incorrect output),
      document it in a file called {{golden_file_dir}}/BUGS.md and create the
      golden file with the CORRECTED expected output (what Rust should do).
    output: "golden_capture_result"

  # ── STEP 4: Write the Rust implementation ─────────────────────────
  # This is where correctness matters most. Every decision should be
  # justified by the analysis, not by intuition.
  - id: "write-rust"
    type: agent
    agent: "amplihack:builder"
    prompt: |
      Write the Rust implementation of `{{python_function}}` in `{{rust_target_file}}`
      within the `{{rust_crate}}` crate.

      Behavioral specification:
      {{analysis}}

      CORRECTNESS REQUIREMENTS (these are non-negotiable):
      1. Handle EVERY code path from the analysis. No path may be skipped.
      2. Match Python output exactly for every golden file in {{golden_file_dir}}/
      3. Use Rust's type system to make illegal states unrepresentable:
         - Exhaustive enums instead of stringly-typed dispatch
         - newtype wrappers for paths, session IDs, etc. where confusion is possible
         - Option<T> instead of sentinel values
      4. Eliminate the Python correctness gaps the analysis identified:
         - TOCTOU races → use atomic file ops (O_CREAT|O_EXCL, temp+rename)
         - Bare except → typed error handling with thiserror/anyhow
         - String-based dispatch → exhaustive match on enums
         - Missing validation → parse, don't validate (serde + newtype)
      5. Use #[serde(other)] Unknown for forward-compatible deserialization
      6. No unwrap() except in tests. No panic!() in library code.
      7. If the function makes SDK/LLM calls (has_sdk_calls={{has_sdk_calls}}),
         implement as a subprocess call to an embedded Python bridge script.

      STYLE:
      - Follow idiomatic Rust (`cargo clippy -- -D warnings` must pass)
      - Doc comments on public items explaining WHAT and WHY, not HOW
      - Inline comments only for non-obvious correctness reasoning
      - Prefer small functions with clear names over large ones with comments

      Write the complete implementation. No TODOs. No placeholders.
    output: "implementation"

  # ── STEP 5: Write comprehensive tests ─────────────────────────────
  # Tests are the proof that the Rust code is correct.
  # They should make it impossible to regress.
  - id: "write-tests"
    type: agent
    agent: "amplihack:builder"
    prompt: |
      Write tests for `{{rust_target_file}}` in the `{{rust_crate}}` crate.

      Behavioral specification: {{analysis}}
      Implementation: {{implementation}}

      THREE categories of tests are required:

      **Category 1: Golden file tests (MANDATORY)**
      Load every .input.json from {{golden_file_dir}}/
      Run it through the Rust implementation.
      Compare output to .expected.json using semantic JSON comparison
      (ignore key ordering and whitespace, but match values exactly).

      ```rust
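      use std::fs;

      #[test]
      fn golden_files() {
          for entry in fs::read_dir("{{golden_file_dir}}").unwrap() {
              let path = entry.unwrap().path();
              // Path::extension() yields only "json" for "x.input.json",
              // so match on the full file-name suffix instead.
              let name = path.file_name().unwrap().to_string_lossy().into_owned();
              let Some(stem) = name.strip_suffix(".input.json") else { continue };
              let input = fs::read_to_string(&path).unwrap();
              let expected_path = path.with_file_name(format!("{stem}.expected.json"));
              let expected = fs::read_to_string(&expected_path).unwrap();
              // Assumes run_hook_with_input returns the hook's stdout as a String.
              let actual = run_hook_with_input(&input);
              // serde_json::Value equality ignores key order and whitespace.
              let actual: serde_json::Value = serde_json::from_str(&actual).unwrap();
              let expected: serde_json::Value = serde_json::from_str(&expected).unwrap();
              assert_eq!(actual, expected, "Golden file mismatch: {}", path.display());
          }
      }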
      ```

      **Category 2: Unit tests for each code path**
      One test per code path from the analysis. Test name should describe
      the scenario, not the implementation detail.

      **Category 3: Property tests (proptest)**
      - Random valid JSON inputs → must not panic, must return valid JSON
      - Random invalid JSON → must not panic, must return {} or valid error
      - Random strings in command field → must not panic
      - Minimum: 10,000 cases per property, e.g. #![proptest_config(ProptestConfig::with_cases(10_000))]

      Also test:
      - Panic recovery: force a panic and verify catch_unwind returns b"{}"
      - SIGPIPE: verify graceful handling when stdout is closed
      - Concurrent access: if function touches files, run 10 threads simultaneously
    output: "tests_written"

  # ── STEP 6: Build and run all tests ───────────────────────────────
  - id: "build-and-test"
    type: bash
    command: |
      cd amplihack-rs && \
      cargo clippy -- -D warnings 2>&1 && \
      cargo fmt --check 2>&1 && \
      cargo test -p {{rust_crate}} -- --test-threads=1 2>&1
    output: "build_result"

  # ── STEP 7: Fix any build/test failures ───────────────────────────
  # Do not move on with broken code. Fix everything.
  - id: "fix-build-errors"
    type: agent
    condition: "'error' in build_result or 'FAILED' in build_result"
    agent: "amplihack:builder"
    prompt: |
      Build or tests failed. Fix ALL errors:

      {{build_result}}

      After fixing, re-run the build and tests. Do not proceed until clean.
      If a test failure reveals a real behavior difference from Python,
      fix the Rust implementation — do NOT weaken the test.
    output: "fix_result"

  # ── STEP 8: Run golden file parity check ──────────────────────────
  # This is the critical correctness gate. Python and Rust must produce
  # identical output for every golden file.
  - id: "run-golden-parity"
    type: bash
    command: |
      cd amplihack-rs && \
      cargo build --release -p {{rust_crate}} 2>&1 && \
      python tests/golden/run_parity.py \
        --golden-dir {{golden_file_dir}} \
        --rust-binary target/release/amplihack-hooks \
        --python-script {{python_file}} 2>&1
    output: "parity_result"

  # ── STEP 9: Fix parity mismatches ─────────────────────────────────
  # The golden files are the source of truth (unless documented as Python bugs).
  - id: "fix-parity-failures"
    type: agent
    condition: "'MISMATCH' in parity_result or 'FAIL' in parity_result"
    agent: "amplihack:builder"
    prompt: |
      Golden file parity check found mismatches:

      {{parity_result}}

      For each mismatch:
      1. Check if this is a documented Python bug (see {{golden_file_dir}}/BUGS.md)
         - If yes: verify Rust output matches the CORRECTED expected output
         - If no: the Rust implementation is wrong. Fix it to match Python.
      2. Do NOT change golden files to match Rust. Fix the Rust code.
      3. Re-run parity check after each fix.
      4. Do not proceed until 100% parity.
    output: "parity_fix_result"

  # ── STEP 10: Run property tests ───────────────────────────────────
  - id: "run-property-tests"
    type: bash
    command: |
      cd amplihack-rs && \
      cargo test -p {{rust_crate}} --test proptest -- --test-threads=1 2>&1
    output: "property_result"

  # ── STEP 11: Correctness verification ─────────────────────────────
  # A reviewer agent checks the work. This is the third pair of eyes.
  - id: "verify-correctness"
    type: agent
    agent: "amplihack:reviewer"
    prompt: |
      Verify this component rewrite is CORRECT and COMPLETE.

      Python function: {{python_function}} in {{python_file}}
      Rust target: {{rust_target_file}} in {{rust_crate}}
      Behavioral specification: {{analysis}}
      Build result: {{build_result}}
      Parity result: {{parity_result}}
      Property test result: {{property_result}}

      Verify:
      1. Every code path in the behavioral spec has a corresponding Rust code path
      2. Every code path has at least one test
      3. All golden file parity tests pass
      4. All property tests pass
      5. No TODO/FIXME/HACK comments remain
      6. No unwrap() outside of tests
      7. Error handling covers all cases (no bare catch-all)
      8. Type safety: are there any places where stringly-typed values
         could be replaced with enums or newtypes?
      9. Concurrency safety: are file operations atomic where they need to be?
      10. Forward compatibility: does deserialization handle unknown fields/variants?

      Output JSON:
      {"status": "CORRECT"} — if ALL checks pass
      {"status": "INCORRECT", "issues": ["specific issue 1", "specific issue 2"]}
    output: "verification"
    parse_json: true
```
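
Step 4 of this recipe asks for atomic file operations (`O_CREAT|O_EXCL`, temp+rename) in place of check-then-act sequences. A minimal std-only sketch of both follows. The helper names `write_atomic` and `try_acquire_flag` are illustrative assumptions and do not come from the amplihack codebase.

```rust
use std::fs::{self, File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Replace `path` with `bytes` so readers see the old or the new content,
/// never a partial write. Hypothetical helper, not taken from amplihack.
pub fn write_atomic(path: &Path, bytes: &[u8]) -> io::Result<()> {
    let file_name = path
        .file_name()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "path has no file name"))?;

    // Keep the temp file in the same directory so the rename never crosses
    // filesystems. A real implementation would add a random suffix as well.
    let mut tmp_name = file_name.to_os_string();
    tmp_name.push(format!(".tmp.{}", std::process::id()));
    let tmp_path = path.with_file_name(tmp_name);

    let mut tmp = File::create(&tmp_path)?;
    tmp.write_all(bytes)?;
    tmp.sync_all()?; // data must be on disk before the rename publishes it
    fs::rename(&tmp_path, path)
}

/// Create a flag file with O_CREAT|O_EXCL semantics.
/// Ok(true): this caller created it. Ok(false): it already existed.
pub fn try_acquire_flag(path: &Path) -> io::Result<bool> {
    match OpenOptions::new().write(true).create_new(true).open(path) {
        Ok(_) => Ok(true),
        Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Ok(false),
        Err(e) => Err(e),
    }
}
```

The point of `create_new(true)` is that existence check and creation happen in one system call, so there is no window between "does the file exist?" and "create it" for another process to slip into.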

### Phase Recipe: `rust-rewrite-phase.yaml`

This recipe orchestrates a complete phase. It processes all components in the phase through the per-component recipe, then runs the triple-check verification protocol. A phase does not complete until triple-check passes.

```yaml
name: "rust-rewrite-phase"
description: "Execute a full phase: rewrite all components, verify correctness, enable dogfooding"
version: "1.0.0"
tags: ["amplihack-rs", "phase", "correctness"]

context:
  phase_name: ""          # e.g. "pre-tool-use"
  phase_number: ""        # e.g. "1"
  components: ""          # Multi-line: python_file → rust_target mapping
  baselines: ""
  rewrite_result: ""
  check1_result: ""
  check2_result: ""
  check3_result: ""
  dogfooding_status: ""
  pr_result: ""

steps:
  # ── STEP 1: Capture behavioral baselines ──────────────────────────
  # Record what the Python does so we can prove Rust does the same thing.
  - id: "capture-behavioral-baselines"
    type: agent
    agent: "amplihack:builder"
    prompt: |
      Capture behavioral baselines for all components in phase {{phase_name}}.

      Components: {{components}}

      For each component:
      1. Run the Python implementation with 50+ representative inputs
      2. Record FULL input/output pairs (not just timing)
      3. Record exit codes and stderr output
      4. Save to tests/baselines/phase-{{phase_number}}/

      These baselines serve as the correctness oracle.
      Also record timing as informational data (not a success criterion).
    output: "baselines"

  # ── STEP 2: Rewrite all components ────────────────────────────────
  # Execute the per-component recipe for each item in the component list.
  # Do not stop between components. Work autonomously through the list.
  - id: "rewrite-all-components"
    type: agent
    agent: "amplihack:builder"
    prompt: |
      For EVERY component listed below, execute the full rewrite workflow:
      analyze → golden files → implement → test → parity check → property test → verify.

      Components to rewrite:
      {{components}}

      Work through them sequentially. After each component's golden file parity
      test passes at 100%, move to the next. Do not stop to ask between components.

      Track progress:
      - Log each component's status (DONE / IN_PROGRESS / BLOCKED)
      - If a component is BLOCKED, note why and move to the next unblocked one
      - Return to BLOCKED components after finishing the unblocked ones

      Do not declare this step complete until EVERY component is DONE.
    output: "rewrite_result"

  # ── TRIPLE CHECK 1: Functional Equivalence Matrix ─────────────────
  # Build the matrix showing Python vs Rust output for every golden file.
  - id: "triple-check-1-equivalence"
    type: agent
    agent: "amplihack:reviewer"
    prompt: |
      Build and execute the functional equivalence matrix for phase {{phase_name}}.

      For EVERY function that was rewritten in this phase:
      1. Collect all golden files for that function
      2. Run each golden file input through the Python implementation
      3. Run each golden file input through the Rust implementation
      4. Compare outputs using semantic JSON comparison
      5. Record: function name, input file, Python output, Rust output, MATCH/MISMATCH

      Build the matrix as a table:
      | Function | Golden File | Python Output Hash | Rust Output Hash | Match? |
      |----------|-------------|-------------------|-----------------|--------|
      | ... | ... | ... | ... | ✅ or ❌ |

      The matrix MUST be 100% ✅. Report any ❌ with full diff.
      If any mismatch: this check FAILS. Do not proceed.
    output: "check1_result"

  # ── TRIPLE CHECK 2: Fuzz / Property Testing ───────────────────────
  # Random inputs catch edge cases that golden files miss.
  - id: "triple-check-2-property-tests"
    type: bash
    command: |
      cd amplihack-rs && \
      cargo test --test proptest -- --test-threads=1 2>&1 && \
      echo "--- Rust property tests passed ---" && \
      cd .. && \
      python -m pytest tests/property/ -x -v 2>&1 && \
      echo "--- Python property tests passed ---"
    output: "check2_result"

  # ── TRIPLE CHECK 3: Live Integration Smoke Test ───────────────────
  # Use the Rust hooks in a real session. This catches issues that
  # unit tests and golden files miss: timing, pipe handling, encoding, etc.
  - id: "triple-check-3-live-integration"
    type: agent
    agent: "amplihack:builder"
    prompt: |
      Run the Rust hooks in a REAL amplihack session.

      1. Set AMPLIHACK_HOOK_ENGINE=rust
      2. Start an amplihack session
      3. Perform at least 10 DIVERSE real operations:
         - Edit a file (triggers PostToolUse)
         - Run a bash command (triggers PreToolUse)
         - Run a dangerous-looking command that should be blocked (triggers PreToolUse deny)
         - Write a new file (triggers PostToolUse)
         - Do a grep/glob (triggers PreToolUse)
         - Let the session end naturally (triggers Stop)
      4. Capture ALL hook stdin/stdout pairs from the session
      5. Replay the captured stdin against the PYTHON hooks
      6. Diff every output pair

      Report: total hooks fired, matches, mismatches.
      If ANY mismatch: this check FAILS. Report the full diff.
      Zero mismatches required to proceed.
    output: "check3_result"

  # ── STEP 6: Enable dogfooding ─────────────────────────────────────
  # From this point on, the agent uses Rust hooks for all its own work.
  - id: "enable-dogfooding"
    type: agent
    condition: "'100%' in check1_result or 'PASS' in check1_result"
    agent: "amplihack:builder"
    prompt: |
      All triple-checks passed for phase {{phase_name}}.

      1. Set AMPLIHACK_HOOK_ENGINE=rust in your environment
      2. For ALL remaining work on this project, you are now running Rust hooks
      3. If you encounter any issue, log it with:
         - Exact input that caused the problem
         - Expected output (what Python does)
         - Actual Rust output
         - Stack trace or error message
      4. Fix the issue immediately, add a golden file test, and continue

      Do NOT switch back to Python unless you are completely blocked.
      If you must switch back, fix the Rust issue within one session and then return to the Rust hooks.
    output: "dogfooding_status"

  # ── STEP 7: Create PR ────────────────────────────────────────────
  - id: "create-pr"
    type: agent
    agent: "amplihack:builder"
    prompt: |
      Create a PR for phase {{phase_name}} (phase {{phase_number}}).

      Branch: amplihack-rs/phase-{{phase_number}}-{{phase_name}}

      PR description MUST include:
      1. List of all Python functions replaced and their Rust equivalents
      2. Equivalence matrix results (100% match)
      3. Property test results (10K+ inputs, zero panics)
      4. Live integration test results (zero mismatches)
      5. Correctness improvements over Python:
         - Which type-safety bugs are now eliminated?
         - Which TOCTOU races are now impossible?
         - Which error-handling gaps are now covered?
      6. Any Python bugs discovered and fixed during the rewrite
      7. Dogfooding session count and issues found/fixed
    output: "pr_result"
```
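
Triple Check 1 describes a functional equivalence matrix with one row per golden file and a hard 100% requirement. The sketch below shows one way such a matrix could be tallied and rendered. `MatrixRow` and `render_matrix` are hypothetical names, and the code assumes both outputs were already canonicalized (sorted keys, no insignificant whitespace) before hashing, since hashing raw bytes is not a semantic JSON comparison.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// One row of the equivalence matrix (hypothetical type).
pub struct MatrixRow {
    pub function: String,
    pub golden_file: String,
    pub python_hash: u64,
    pub rust_hash: u64,
}

/// Hash an already-canonicalized output string.
fn hash_output(output: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    output.hash(&mut hasher);
    hasher.finish()
}

impl MatrixRow {
    pub fn new(function: &str, golden_file: &str, python_out: &str, rust_out: &str) -> Self {
        Self {
            function: function.to_owned(),
            golden_file: golden_file.to_owned(),
            python_hash: hash_output(python_out),
            rust_hash: hash_output(rust_out),
        }
    }

    pub fn is_match(&self) -> bool {
        self.python_hash == self.rust_hash
    }
}

/// Render the matrix as a markdown table and report whether every row matched.
/// An empty matrix does not count as a pass.
pub fn render_matrix(rows: &[MatrixRow]) -> (String, bool) {
    let mut table = String::from(
        "| Function | Golden File | Python Output Hash | Rust Output Hash | Match? |\n\
         |----------|-------------|--------------------|------------------|--------|\n",
    );
    for row in rows {
        let mark = if row.is_match() { "✅" } else { "❌" };
        table.push_str(&format!(
            "| {} | {} | {:016x} | {:016x} | {} |\n",
            row.function, row.golden_file, row.python_hash, row.rust_hash, mark
        ));
    }
    let all_match = !rows.is_empty() && rows.iter().all(MatrixRow::is_match);
    (table, all_match)
}
```

Treating an empty matrix as a failure guards against a run that silently found no golden files and would otherwise report a vacuous 100%.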

### Master Recipe: `amplihack-rs-master.yaml`

This is the top-level execution specification. It orchestrates the entire project from start to finish. The implementing agent loads this recipe and executes it — this IS the autonomous loop.

```yaml
name: "amplihack-rs-master"
description: "Complete Rust rewrite: autonomous execution loop until CERTAIN all functionality is correct"
version: "1.0.0"
tags: ["amplihack-rs", "master", "correctness"]

context:
  phase_2_gate: ""
  golden_files_result: ""
  coverage_result: ""
  phase_0_result: ""
  phase_1_result: ""
  phase_2_result: ""
  final_verification: ""

steps:
  # ═══════════════════════════════════════════════════════════════════
  # PHASE 0: FOUNDATION — build the infrastructure for verified rewrites
  # ═══════════════════════════════════════════════════════════════════

  # ── 0a: Python experiment (decision gate for Phase 2) ─────────────
  - id: "phase-0a-python-experiment"
    type: agent
    agent: "amplihack:builder"
    prompt: |
      Run the Python typing experiment (Phase 0a).

      This determines whether the CLI rewrite (Phase 2) is worth doing.

      1. Run `mypy --strict` on:
         - .claude/tools/amplihack/hooks/*.py
         - src/amplihack/cli.py
         - src/amplihack/settings.py
      2. Categorize every error:
         - TYPE_BUG: mypy found a real bug (e.g., wrong type passed, missing null check)
         - ANNOTATION_GAP: code is correct but lacks type annotations
         - FALSE_POSITIVE: mypy is wrong
      3. Count TYPE_BUGs. These are the correctness issues Rust would eliminate.
      4. Add lazy imports to cli.py entry point (measure baseline first)

      Decision:
      - If TYPE_BUGs in CLI ≥ 5: output "PHASE2_GO: {count} real type bugs in CLI"
      - If TYPE_BUGs in CLI < 5: output "PHASE2_SKIP: only {count} type bugs, mypy is sufficient"

      Also output the full categorized error list for reference.
    output: "phase_2_gate"

  # ── 0b: Golden file capture ──────────────────────────────────────
  - id: "phase-0b-golden-files"
    type: agent
    agent: "amplihack:builder"
    prompt: |
      Capture behavioral golden files for ALL hooks (Phase 0b).

      For each hook, instrument it to log every stdin/stdout pair, then
      run real amplihack sessions to generate traffic. You may also
      craft synthetic inputs to cover edge cases.

      MINIMUM golden file counts:
      - pre_tool_use: 200 (most critical — security decisions)
      - stop: 100 (lock mode, power steering, reflection)
      - session_start: 50 (version check, migration)
      - post_tool_use: 100 (tool metrics, validation)
      - user_prompt_submit: 100 (preference injection, memory)
      - pre_compact: 50 (transcript export)
      - session_stop: 30 (memory capture)

      For each golden file pair:
        {name}.input.json    — exact stdin JSON
        {name}.expected.json — exact stdout JSON
        {name}.meta.json     — exit code, stderr, timing (informational)

      ALSO: review each golden file for Python bugs.
      If the Python output looks wrong:
      1. Document the bug in tests/golden/PYTHON_BUGS.md
      2. Create the golden file with the CORRECTED expected output
      3. Note: "Rust should fix this. Python bug: [description]"

      Build the semantic JSON comparison tool:
        tests/golden/json_compare.py
      Rules: ignore key ordering, ignore whitespace, match values exactly,
      special handling for floating point (epsilon comparison).

      Save everything to tests/golden/hooks/{hook_name}/
    output: "golden_files_result"

  # ── 0c: Python test coverage uplift ──────────────────────────────
  - id: "phase-0c-test-coverage"
    type: agent
    agent: "amplihack:builder"
    prompt: |
      Uplift Python hook test coverage (Phase 0c).

      Current coverage is ~12%. Target: 60% overall, 80% on security-critical paths.

      Priority order:
      1. pre_tool_use: CWD deletion, CWD rename, main branch protection
      2. stop: lock mode safety valve, power steering counter
      3. hook_processor: read_input, write_output, path validation
      4. Error handling: every `except Exception` block needs a test
         that triggers that exception

      Run: pytest --cov=.claude/tools/amplihack/hooks/ --cov-report=term-missing

      The coverage uplift serves two purposes:
      - Finds Python bugs BEFORE we start writing Rust (fix them now)
      - Creates Python tests that we mirror in Rust (cross-reference)
    output: "coverage_result"

  # ── 0d: Workspace scaffolding ────────────────────────────────────
  - id: "phase-0d-scaffold"
    type: agent
    agent: "amplihack:builder"
    prompt: |
      Create the Cargo workspace and verification infrastructure (Phase 0d).

      1. Create amplihack-rs/ with 5-crate workspace:
         amplihack-types, amplihack-state, amplihack-hooks,
         amplihack-cli, bins/

      2. Cargo.toml workspace config:
         - resolver = "2"
         - [profile.release] lto = true, strip = true, panic = "unwind"
         - Common dependencies in [workspace.dependencies]

      3. CI pipeline (.github/workflows/rust-ci.yml):
         - cargo clippy -- -D warnings
         - cargo fmt --check
         - cargo test
         - Cross-compilation for 4 platforms
         - Golden file parity tests (run both Python and Rust, diff outputs)

      4. Golden file parity test harness:
         - tests/golden/run_parity.py — runs both engines, compares
         - tests/golden/json_compare.py — semantic JSON comparison
         - Fails CI on ANY mismatch

      5. IPC version protocol:
         Every JSON message includes {"version": 1, ...}
         Rust deserializer accepts version 1, warns on unknown versions

      6. CONTRIBUTING_RUST.md:
         - Error handling: thiserror for libs, anyhow for bins
         - Deserialization: #[serde(other)] Unknown for forward compat
         - File ops: atomic (temp+rename), locks with timeout
         - No unwrap() except tests, no panic!() in libs
         - How to run golden file parity tests locally

      Create PR: amplihack-rs/phase-0-scaffold
    output: "phase_0_result"

  # ═══════════════════════════════════════════════════════════════════
  # PHASE 1: PRE_TOOL_USE HOOK — highest correctness value
  # This is the security-critical hook. Type safety matters most here.
  # ═══════════════════════════════════════════════════════════════════
  - id: "phase-1-hooks"
    type: recipe
    recipe: "rust-rewrite-phase"
    sub_context:
      phase_name: "pre-tool-use"
      phase_number: "1"
      components: |
        hook_processor.py → protocol.rs
          HookProcessor base class → Hook trait + run_hook() framework
          read_input() → read JSON from stdin with size limit
          write_output() → write JSON to stdout with SIGPIPE handling
          validate_path_containment() → Path::canonicalize() + starts_with()
          run() → run_hook() with catch_unwind panic handler
          log() → structured logging to file with rotation
          save_metric() → JSONL append

        pre_tool_use.py → pre_tool_use/
          process() → main PreToolUse handler
          _check_cwd_deletion() → cwd.rs: detect rm/rmdir targeting CWD
          _check_cwd_rename() → cwd.rs: detect mv targeting CWD
          _extract_rm_paths() → command_parser.rs: shlex + path extraction
          _extract_mv_source_paths() → command_parser.rs: mv argument parsing
          _select_strategy() → strategy.rs: launcher detection enum

        amplihack-types crate:
          HookInput — serde tagged enum with #[serde(other)] Unknown
          HookOutput — allow/deny/block/ask decision types
          ToolDecision — exhaustive enum (not strings)
          FailurePolicy — Open/Closed enum

        amplihack-state crate:
          AtomicJsonFile<T> — read-modify-write with temp+rename
          FileLock — F_SETLK with timeout + retry
          AtomicFlag — O_CREAT|O_EXCL semaphore
          EnvVar<T> — typed env var parsing with defaults
          PythonBridge — spawn_python() with timeout + embedded scripts
    output: "phase_1_result"

  # ═══════════════════════════════════════════════════════════════════
  # NOTE: XPIA Defender is tracked in a SEPARATE issue/repo.
  # It is NOT part of this master recipe.
  # ═══════════════════════════════════════════════════════════════════

  # ═══════════════════════════════════════════════════════════════════
  # PHASE 2: CLI + LAUNCHER — conditional on Phase 0a result
  # ═══════════════════════════════════════════════════════════════════
  - id: "phase-2-cli"
    type: recipe
    recipe: "rust-rewrite-phase"
    condition: "'PHASE2_GO' in phase_2_gate"
    sub_context:
      phase_name: "cli-launcher"
      phase_number: "2"
      components: |
        cli.py create_parser() → amplihack-cli/src/commands/
          clap derive with exhaustive Command enum
          Every subcommand is an enum variant (no string dispatch)
          #[command(subcommand_required = true)] where appropriate
          Commands that need Python (memory, eval, proxy) → subprocess

        cli.py launch_command() → amplihack-cli/src/launcher.rs
          ManagedChild with corrected Drop:
            try_wait → SIGTERM → 3s bounded wait → SIGKILL → wait
          Nesting detection via AMPLIHACK_SESSION_ID env var
          Session tracking with atomic file operations

        cli.py main() → bins/amplihack/main.rs
          Entry point with signal handling (SIGINT+SIGTERM+SIGHUP)
          Child in own process group (setpgid) to prevent double-SIGINT

        settings.py → amplihack-state/src/settings_manager.rs
          AtomicJsonFile-based settings.json CRUD
          update_hook_paths() with HOOK_REGISTRY
          ensure_settings_json() with backup and validation

        auto_update.py → amplihack-cli/src/auto_update.rs
          GitHub Releases API check with 24h cache
          Version comparison (semver)
          uv tool upgrade subprocess
          Argument whitelist for restart safety
    output: "phase_2_result"

  # ═══════════════════════════════════════════════════════════════════
  # NOTE: "Remaining Hooks" has been merged into Phase 1.
  # All hooks (Tier 1, Tier 2, Tier 3) are built in Phase 1.
  # See the Work Decomposition section for the tier breakdown.
  # ═══════════════════════════════════════════════════════════════════

  # ═══════════════════════════════════════════════════════════════════
  # FINAL VERIFICATION — the project is not done until this passes
  # ═══════════════════════════════════════════════════════════════════
  - id: "final-verification"
    type: agent
    agent: "amplihack:reviewer"
    prompt: |
      Run the COMPLETE project acceptance criteria.

      Phase results:
      - Phase 0: {{phase_0_result}}
      - Phase 1: {{phase_1_result}}
      - Phase 2: {{phase_2_result}} (may be empty if skipped)

      CHECK EVERY ITEM. Mark pass/fail with evidence:

      CORRECTNESS:
      [ ] Every function in the Python inventory has a Rust equivalent
      [ ] Every golden file passes on both Python and Rust engines
      [ ] Every property test passes (10K+ random inputs per function)
      [ ] All Python bugs documented in PYTHON_BUGS.md are fixed in Rust
      [ ] All TOCTOU races from the original analysis are eliminated
      [ ] All stringly-typed dispatch is replaced with exhaustive enums
      [ ] All bare `except Exception` blocks have typed error handling
      [ ] Forward compatibility: unknown hook events handled gracefully

      BEHAVIORAL EQUIVALENCE:
      [ ] `amplihack hooks compare --all` shows 0 mismatches between Python and Rust
          across 5 consecutive real development sessions
      [ ] `amplihack hooks compare --transcript` shows 100% match
          on 3 different real session transcripts
      [ ] Functional equivalence matrix is 100% green for all phases

      INTEGRATION:
      [ ] Multicall binary dispatches all hook events correctly
      [ ] Exit code contract matches Python (0, 2, other) for all hooks
      [ ] SIGPIPE handling works (no broken pipe errors)
      [ ] Fail-open: panic in any hook → stdout gets b"{}", exit 0

      DISTRIBUTION:
      [ ] Platform wheel builds succeed for all 4 targets
      [ ] UVX install + staging flow tested end-to-end
      [ ] Binary discovery finds Rust binary in expected locations
      [ ] `amplihack install` correctly registers Rust hooks when binaries present

      DOCUMENTATION:
      [ ] CONTRIBUTING_RUST.md exists and is accurate
      [ ] PR descriptions include correctness improvement details
      [ ] Dogfooding log documents all issues found and fixed

      If ANY item fails: output {"status": "NOT_CERTAIN", "failures": [...]}
      If ALL items pass: output {"status": "CERTAIN", "evidence": {...}}

      Do NOT output CERTAIN unless you have verified every item.
    output: "final_verification"
    parse_json: true
```
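
The Phase 1 component list maps `validate_path_containment()` to `Path::canonicalize()` plus `starts_with()`. A minimal sketch of that mapping is below. The signature and error choice are assumptions made for illustration; the real function may differ, in particular in how it treats paths that do not exist yet (`canonicalize` fails on those).

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Resolve `candidate` against `project_root` and reject anything that
/// escapes the root after symlinks and `..` segments are resolved.
/// Hypothetical signature, not taken from the amplihack codebase.
pub fn validate_path_containment(project_root: &Path, candidate: &Path) -> io::Result<PathBuf> {
    let root = project_root.canonicalize()?;
    // Path::join keeps an absolute `candidate` as-is and resolves a
    // relative one against the root.
    let resolved = root.join(candidate).canonicalize()?;

    // Component-wise prefix check: "/proj-evil" does not start with "/proj".
    if resolved.starts_with(&root) {
        Ok(resolved)
    } else {
        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            format!("{} is outside {}", resolved.display(), root.display()),
        ))
    }
}
```

Canonicalizing both sides before comparing matters: a purely textual prefix check would accept `project/../secrets` and would be fooled by symlinks pointing out of the project.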

## Appendix: Claude Code Hooks Reference

**Official documentation:** https://code.claude.com/docs/en/hooks (reference) and https://code.claude.com/docs/en/hooks-guide (guide)

### Hook Events (all 18)

| Event | Matcher On | Can Block? | Key Fields |
|-------|-----------|------------|------------|
| `SessionStart` | source: startup/resume/clear/compact | No | source, model, agent_type |
| `InstructionsLoaded` | none (always fires) | No | file_path, memory_type, load_reason |
| `UserPromptSubmit` | none (always fires) | Yes (exit 2 or decision:block) | prompt |
| `PreToolUse` | tool name (Bash, Edit, Write, etc.) | Yes (permissionDecision: deny) | tool_name, tool_input, tool_use_id |
| `PermissionRequest` | tool name | Yes (decision.behavior: deny) | tool_name, tool_input, permission_suggestions |
| `PostToolUse` | tool name | No (feedback only) | tool_name, tool_input, tool_response, tool_use_id |
| `PostToolUseFailure` | tool name | No | tool_name, tool_input, error, is_interrupt |
| `Notification` | type: permission_prompt/idle_prompt/auth_success/elicitation_dialog | No | message, title, notification_type |
| `SubagentStart` | agent type | No | agent_id, agent_type |
| `SubagentStop` | agent type | Yes (decision:block) | agent_id, agent_type, agent_transcript_path, last_assistant_message |
| `Stop` | none (always fires) | Yes (decision:block) | stop_hook_active, last_assistant_message |
| `TeammateIdle` | none | Yes (exit 2) | teammate_name, team_name |
| `TaskCompleted` | none | Yes (exit 2) | task_id, task_subject, task_description |
| `ConfigChange` | source: user/project/local/policy/skills | Yes (except policy_settings) | source, file_path |
| `WorktreeCreate` | none | Yes (non-zero fails) | name; stdout = absolute path |
| `WorktreeRemove` | none | No | worktree_path |
| `PreCompact` | trigger: manual/auto | No | (common fields only) |
| `SessionEnd` | reason: clear/logout/prompt_input_exit/other | No | (common fields only) |
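
The table above can be encoded as an exhaustive enum so that dispatch never happens on raw strings. The sketch below uses only the standard library: `FromStr` with an `Unknown` catch-all stands in for the `#[serde(other)] Unknown` variant the spec calls for. The type name `HookEvent` and the `can_block` method are illustrative assumptions.

```rust
use std::convert::Infallible;
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HookEvent {
    SessionStart, InstructionsLoaded, UserPromptSubmit, PreToolUse,
    PermissionRequest, PostToolUse, PostToolUseFailure, Notification,
    SubagentStart, SubagentStop, Stop, TeammateIdle, TaskCompleted,
    ConfigChange, WorktreeCreate, WorktreeRemove, PreCompact, SessionEnd,
    /// Any event name this binary does not know yet (forward compatibility).
    Unknown,
}

impl FromStr for HookEvent {
    type Err = Infallible; // parsing never fails; unknown names map to Unknown

    fn from_str(name: &str) -> Result<Self, Self::Err> {
        Ok(match name {
            "SessionStart" => Self::SessionStart,
            "InstructionsLoaded" => Self::InstructionsLoaded,
            "UserPromptSubmit" => Self::UserPromptSubmit,
            "PreToolUse" => Self::PreToolUse,
            "PermissionRequest" => Self::PermissionRequest,
            "PostToolUse" => Self::PostToolUse,
            "PostToolUseFailure" => Self::PostToolUseFailure,
            "Notification" => Self::Notification,
            "SubagentStart" => Self::SubagentStart,
            "SubagentStop" => Self::SubagentStop,
            "Stop" => Self::Stop,
            "TeammateIdle" => Self::TeammateIdle,
            "TaskCompleted" => Self::TaskCompleted,
            "ConfigChange" => Self::ConfigChange,
            "WorktreeCreate" => Self::WorktreeCreate,
            "WorktreeRemove" => Self::WorktreeRemove,
            "PreCompact" => Self::PreCompact,
            "SessionEnd" => Self::SessionEnd,
            _ => Self::Unknown,
        })
    }
}

impl HookEvent {
    /// Mirrors the "Can Block?" column. No wildcard arm, so adding a
    /// variant forces a decision here at compile time.
    pub fn can_block(self) -> bool {
        match self {
            Self::UserPromptSubmit | Self::PreToolUse | Self::PermissionRequest
            | Self::SubagentStop | Self::Stop | Self::TeammateIdle
            | Self::TaskCompleted | Self::ConfigChange | Self::WorktreeCreate => true,
            Self::SessionStart | Self::InstructionsLoaded | Self::PostToolUse
            | Self::PostToolUseFailure | Self::Notification | Self::SubagentStart
            | Self::WorktreeRemove | Self::PreCompact | Self::SessionEnd
            | Self::Unknown => false,
        }
    }
}
```

Note that `ConfigChange` is listed as blocking here even though the table carves out `policy_settings`; a fuller model would need the `source` field to decide.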

### Common Input Fields

```json
{
  "session_id": "abc123",
  "transcript_path": "/path/to/transcript.jsonl",
  "cwd": "/path/to/project",
  "permission_mode": "default|plan|acceptEdits|dontAsk|bypassPermissions",
  "hook_event_name": "PreToolUse",
  "agent_id": "optional-subagent-id",
  "agent_type": "optional-agent-type"
}
```

### Exit Code Contract

| Exit Code | Meaning | Behavior |
|-----------|---------|----------|
| **0** | Success | Parse stdout for JSON output; proceed |
| **2** | Blocking error | Block the action; stderr shown to Claude as error |
| **Other** | Non-blocking error | Log stderr; continue execution |
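
For the parity harness and tests, the three-way contract above can be captured in a small type. This is a sketch with assumed names (`HookExit`, `from_code`); the source does not prescribe this shape.

```rust
/// Interpretation of a hook process exit code, following the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HookExit {
    /// Exit 0: parse stdout for JSON output and proceed.
    Success,
    /// Exit 2: block the action; stderr is shown to Claude as an error.
    BlockingError,
    /// Any other code: log stderr and continue execution.
    NonBlockingError(i32),
}

impl HookExit {
    pub fn from_code(code: i32) -> Self {
        match code {
            0 => Self::Success,
            2 => Self::BlockingError,
            other => Self::NonBlockingError(other),
        }
    }

    /// Only exit code 2 blocks the action.
    pub fn blocks_action(self) -> bool {
        matches!(self, Self::BlockingError)
    }

    /// Stdout is parsed as JSON only on success.
    pub fn should_parse_stdout(self) -> bool {
        matches!(self, Self::Success)
    }
}
```

Keeping the original code inside `NonBlockingError` lets golden-file `.meta.json` comparisons check the exact value rather than just the category.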

### JSON Output Fields

```json
{
  "continue": true,
  "stopReason": "message",
  "suppressOutput": false,
  "systemMessage": "warning",
  "decision": "block",
  "reason": "why blocked",
  "hookSpecificOutput": {
    "hookEventName": "PreToolUse",
    "permissionDecision": "allow|deny|ask",
    "permissionDecisionReason": "explanation",
    "updatedInput": {},
    "additionalContext": ""
  }
}
```

### Rust Implementation Notes

- The multicall `amplihack-hooks` binary handles all 18 events as subcommands
- Read JSON from stdin; no allocation in panic path (pre-baked `b"{}"` for fail-open)
- `#[serde(other)] Unknown` variant for forward compatibility with new events
- Hooks are snapshotted at session startup; mid-session changes take effect only after review in the `/hooks` menu
- A hook configured with `async: true` runs in the background without blocking (useful for logging hooks)
- MCP tool names follow pattern `mcp__<server>__<tool>`
- `$CLAUDE_CODE_REMOTE` is `"true"` in remote/web environments
- `CLAUDE_ENV_FILE` in SessionStart allows persisting env vars for the session
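
The fail-open note above (pre-baked `b"{}"` on panic) can be sketched with `std::panic::catch_unwind`. The wrapper name `run_fail_open` is an assumption, and the sketch only works when the crate is built with `panic = "unwind"`, as the release profile in the scaffold step specifies. The default panic hook still prints to stderr; only the fallback payload itself is allocation-free.

```rust
use std::borrow::Cow;
use std::panic::{self, UnwindSafe};

/// Pre-baked fail-open payload: a static slice, so producing it after a
/// panic needs no allocation or formatting.
pub const FAIL_OPEN: &[u8] = b"{}";

/// Run a hook handler and return the bytes to write to stdout.
/// If the handler panics, fall back to the static `{}` payload.
/// Hypothetical wrapper, not taken from the amplihack codebase.
pub fn run_fail_open<F>(handler: F) -> Cow<'static, [u8]>
where
    F: FnOnce() -> Vec<u8> + UnwindSafe,
{
    match panic::catch_unwind(handler) {
        Ok(bytes) => Cow::Owned(bytes),
        Err(_) => Cow::Borrowed(FAIL_OPEN),
    }
}
```

A caller would write the returned bytes to stdout and exit 0 in both cases, which is what the acceptance criterion "panic in any hook → stdout gets `b"{}"`, exit 0" describes.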

| Task Type | Agent Type | When to Use |
|-----------|------------|-------------|
| Understanding Python code | `explore` | Before implementing each function — understand what the Python does, what edge cases it handles, what state it reads/writes |
| Implementing a Rust function | Do it yourself | The main agent should write Rust code directly — it has the full context of the issue spec, patterns, and prior decisions |
| Writing golden file test inputs | `task` | Generating synthetic JSON inputs for a hook — mechanical work that benefits from parallelism |
| Running tests / building | `task` | `cargo test`, `cargo clippy`, golden file comparison — fire and forget, read results |
| Reviewing a completed hook | `code-review` | After implementing a hook, have a code-review agent check for correctness issues, missing edge cases, silent error swallowing |
| Investigating a Python bug | `explore` | When a golden file reveals Python behavior that looks wrong — investigate before deciding whether to replicate or fix |

| Pattern | When Used | Overhead |
|---------|-----------|----------|
| Standalone binary | CLI, hooks (Claude Code / Amplifier / Copilot invoke directly) | ~1-2ms startup |
| Subprocess JSON IPC | SDK bridge (Rust calls Python for Claude SDK) | ~50ms per call |

| Control | Phase | Implementation |
| --- | --- | --- |
| Configurable fail-open/closed | 1 | `FailurePolicy` enum per hook |
| Bridge script embedding | 1 | `include_str!()` at compile time |
| Settings.json integrity check | 2 | Verify hook paths at launch |
| Binary signing | 2 | cosign in CI for releases |
| Telemetry (every decision logged) | 1 | Stderr JSON, one line per invocation |
| Stale lock detection | 1 | Check holder PID liveness, break if dead |
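
The "configurable fail-open/closed" control can be sketched with `std::panic::catch_unwind` and pre-baked static responses. This is an illustration under stated assumptions, not the project's code: the variant names `Open`/`Closed`, the function `run_guarded`, and the exact fail-closed payload are hypothetical. The payload only reuses the `hookSpecificOutput` fields shown in the output schema.

```rust
use std::borrow::Cow;
use std::panic::{self, UnwindSafe};

/// Hypothetical per-hook policy; variant names are illustrative.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FailurePolicy {
    /// On internal failure, emit `{}` so the action proceeds.
    Open,
    /// On internal failure, emit a deny decision (security-critical hooks).
    Closed,
}

// Pre-baked static responses: the recovery branch borrows them, so it does not allocate.
const FAIL_OPEN: &[u8] = b"{}";
const FAIL_CLOSED: &[u8] = br#"{"hookSpecificOutput":{"hookEventName":"PreToolUse","permissionDecision":"deny","permissionDecisionReason":"hook failed"}}"#;

/// Run a hook body; if it panics, return the response dictated by `policy`.
pub fn run_guarded<F>(policy: FailurePolicy, hook: F) -> Cow<'static, [u8]>
where
    F: FnOnce() -> Vec<u8> + UnwindSafe,
{
    match panic::catch_unwind(hook) {
        Ok(bytes) => Cow::Owned(bytes),
        Err(_) => match policy {
            FailurePolicy::Open => Cow::Borrowed(FAIL_OPEN),
            FailurePolicy::Closed => Cow::Borrowed(FAIL_CLOSED),
        },
    }
}
```

Note that `catch_unwind` only works when the binary is built with the default `panic = "unwind"` strategy; with `panic = "abort"` the process exits before the recovery branch runs.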

| Function | Lines | LLM Calls | Notes |
| --- | --- | --- | --- |
| `HookProcessor.__init__(hook_name)` | 45-101 | No | Project root detection, log setup |
| `HookProcessor.validate_path_containment(path)` | 103-121 | No | Path stays within project |
| `HookProcessor.log(message, level)` | 123-144 | No | Log rotation at 10MB |
| `HookProcessor.read_input()` | 146-172 | No | JSON from stdin |
| `HookProcessor.write_output(output)` | 174-205 | No | JSON to stdout, SIGPIPE handling |
| `HookProcessor.save_metric(name, value, metadata)` | 207-231 | No | JSONL metrics file |
| `HookProcessor.run()` | 274-368 | No | Full lifecycle with error handling |
| `HookProcessor.get_session_id()` | 370-377 | No | Timestamp-based session ID |
| `HookProcessor.save_session_data(filename, data)` | 379-406 | No | Session file with path validation |
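
A path-containment check like `validate_path_containment` ("path stays within project") could look roughly like this in Rust. This is a sketch only: the names `normalize` and `is_contained` are hypothetical, the Python behavior has not been inspected here, and the check is purely lexical (it does not resolve symlinks, which a real port may need to do).

```rust
use std::path::{Component, Path, PathBuf};

/// Lexically resolve `.` and `..` without touching the filesystem.
fn normalize(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for component in path.components() {
        match component {
            Component::CurDir => {}
            // `pop` on a bare root is a no-op, so `/..` stays `/`.
            Component::ParentDir => {
                out.pop();
            }
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// Return true if `candidate`, interpreted relative to `project_root`,
/// stays inside `project_root` after lexical normalization.
pub fn is_contained(project_root: &Path, candidate: &Path) -> bool {
    let root = normalize(project_root);
    // `join` replaces the base when `candidate` is absolute, which is what we want.
    let resolved = normalize(&root.join(candidate));
    // `starts_with` compares whole components, so `/proj-evil` is not inside `/proj`.
    resolved.starts_with(&root)
}
```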

| Function | Lines | LLM Calls | Notes |
| --- | --- | --- | --- |
| `PreToolUseHook.process(input_data)` | 95-213 | No | Main entry; validates bash commands |
| `PreToolUseHook._check_cwd_deletion(command)` | 215-269 | No | Detects CWD deletion |
| `PreToolUseHook._check_cwd_rename(command)` | 271-359 | No | Detects CWD rename/move, glob patterns |
| `PreToolUseHook._extract_rm_paths(segment)` | 361-391 | No | `shlex.split()` path extraction |
| `PreToolUseHook._extract_mv_source_paths(segment)` | 393-463 | No | `mv -t`/`--target-directory` parsing |
| `PreToolUseHook._select_strategy()` | 465-481 | No | Launcher detection (Claude vs Copilot) |

| Function | Lines | LLM Calls | Notes |
| --- | --- | --- | --- |
| `StopHook.process(input_data)` | 67-358 | No | Lock mode, safety valve (Issue #2874) |
| `StopHook.read_continuation_prompt()` | 387-430 | No | Custom prompt from file or default |
| `StopHook._increment_power_steering_counter(session_id)` | 432-498 | No | Atomic counter file |
| `StopHook._increment_lock_counter(session_id)` | 499-555 | No | Safety valve counter |
| `StopHook._should_run_power_steering()` | 557-593 | No | Config check |
| `StopHook._should_run_reflection()` | 595-640 | No | Config check |
| `StopHook._get_current_session_id()` | 642-673 | No | Env var or filesystem |
| `StopHook._run_reflection_sync(transcript_path)` | 675-782 | YES | SDK bridge needed (subprocess to Python) |
| `StopHook._announce_reflection_start()` | 784-796 | No | Stderr output |
| `StopHook._generate_reflection_filename(template)` | 798-823 | No | Filename from content |
| `StopHook._block_with_findings(template, path)` | 825-868 | No | Block decision JSON |

| Function | Lines | LLM Calls | Notes |
| --- | --- | --- | --- |
| `SessionStartHook.process(input_data)` | 41-591 | YES | Version check, memory, staging. SDK bridge for memory calls. |
| `SessionStartHook._check_version_mismatch()` | 386-539 | No | Version comparison, auto-update trigger |
| `SessionStartHook._migrate_global_hooks()` | 541-592 | No | Duplicate hook migration |

| Function | Lines | LLM Calls | Notes |
| --- | --- | --- | --- |
| `PostToolUseHook.process(input_data)` | 82-173 | No | Tool metrics, validation |
| `PostToolUseHook._setup_tool_hooks()` | 34-67 | No | Hook registry setup |
| `PostToolUseHook.save_tool_metric(tool_name, duration_ms)` | 69-80 | No | JSONL metric |

| Function | Lines | LLM Calls | Notes |
| --- | --- | --- | --- |
| `UserPromptSubmitHook.process(input_data)` | 257-400 | YES | Memory injection; SDK bridge for `inject_memory_for_agents_sync()` |
| `UserPromptSubmitHook.find_user_preferences()` | 36-56 | No | File discovery |
| `UserPromptSubmitHook.extract_preferences(content)` | 58-99 | No | Markdown parsing |
| `UserPromptSubmitHook.build_preference_context(prefs)` | 101-156 | No | Context string builder |
| `UserPromptSubmitHook.get_cached_preferences(path)` | 158-184 | No | mtime-based cache |
| `UserPromptSubmitHook._inject_amplihack_if_different()` | 186-255 | No | File diff with mtime cache |

| Function | Lines | LLM Calls | Notes |
| --- | --- | --- | --- |
| `PreCompactHook.process(input_data)` | 33-137 | No | Transcript export |
| `PreCompactHook.restore_conversation_from_latest()` | 139-184 | No | Transcript restore |

| Function | Lines | LLM Calls | Notes |
| --- | --- | --- | --- |
| `validate_hook_paths(hook_system, hooks, dir)` | 113-135 | No | File existence check |
| `update_hook_paths(settings, system, hooks, dir)` | 138-225 | No | JSON path rewriting |
| `ensure_settings_json()` | 228-337 | No | Atomic settings creation |

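| Category | Total LOC | Functions w/ LLM Calls | Pure Rust | SDK Bridge |
| --- | --- | --- | --- | --- |
| Hook processor | 407 | 0 | 9 | 0 |
| pre_tool_use | 492 | 0 | 6 | 0 |
| stop | 902 | 1 | 14 | 1 (reflection) |
| session_start | 601 | 1 | 3 | 1 (memory) |
| session_stop | 86 | 1 | 0 | 1 (memory) |
| post_tool_use | 203 | 0 | 3 | 0 |
| user_prompt | 451 | 1 | 6 | 1 (memory) |
| pre_compact | 195 | 0 | 2 | 0 |
| Settings | 341 | 0 | 3 | 0 |
| Total | ~3,678 | 4 | 46 | 4 |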

| Scenario | What Happens | User Action |
| --- | --- | --- |
| `uv tool upgrade amplihack` | UV downloads platform wheel with Rust binaries. Next session re-stages and rewrites hook paths. | None |
| `uvx` (zero-install) | Fresh temp venv from latest wheel. Staging copies binaries. | None |
| `git pull` (editable install) | Gets Cargo workspace. Must build Rust. Next `amplihack install` discovers binaries. | `cargo build --release` |
| Unsupported platform | Pure-Python wheel installed. Hooks remain Python scripts. User is on the pre-migration path. | None |
| Rust binary crashes | Hook fails. Claude Code sees exit code != 0, stderr has error. Fix the bug. | Report bug, fix in next release |

| Phase | Requirement |
| --- | --- |
| Phase 1 (Hooks) | After first 3 hooks compile: switch to Rust hooks for ALL subsequent development. Fix bugs as they surface. |
| Phase 2 (CLI) | After CLI compiles: use `amplihack-rs claude` as the launcher. No Python CLI alongside. |

| Step | Why Sequential | Blocks |
| --- | --- | --- |
| `amplihack-common` crate | All hooks import shared types (`HookInput`, `HookOutput`, `JsonProtocol`, `ErrorProtocol`, `FileLock`, `ShutdownContext`) | Everything |
| SDK bridge module (`python_bridge.rs`) | 4 hooks need subprocess-to-Python calls; define the pattern once | Tier 2 + Tier 3 hooks |
| Golden file infrastructure | Test harness must exist before hook implementation can be verified | All hook implementation |

| Hook | LOC | Shared State | SDK Bridge? | Notes |
| --- | --- | --- | --- | --- |
| `pre_tool_use` | 492 | Reads config files (read-only) | No | Largest hook. Pure input→decision. Start here: it exercises the most common types. |
| `post_tool_use` | 203 | Reads config, writes metrics file | No | Smallest non-trivial hook. Good second target. |
| `session_stop` | 86 | None | Yes (`MemoryCoordinator.store`) | Smallest hook. SDK bridge is the only complexity. |

| Hook | LOC | Shared State | SDK Bridge? | Notes |
| --- | --- | --- | --- | --- |
| `user_prompt_submit` | 451 | Reads preference files | Yes (`inject_memory_for_agents_sync`) | Memory injection. Can reuse bridge pattern from `session_stop`. |
| `pre_compact` | 195 | Reads transcript files | No | Context preservation. Independent of `user_prompt_submit`. |

| Hook | LOC | Shared State | SDK Bridge? | Notes |
| --- | --- | --- | --- | --- |
| `stop` | 902 | Writes lock files, counter files, continuation prompt files | Yes (reflection) | Largest, most complex hook. Lock mode + power steering state machine. |
| `session_start` | 601 | Reads lock files written by `stop`, writes session state | Yes (memory context) | Coordinates with `stop` via shared file locks. |

| Scope | Branch Pattern | PR Size Target |
| --- | --- | --- |
| Phase 0 scaffolding | `amplihack-rs/phase-0-scaffold` | 1 PR |
| Phase 0b golden files | `amplihack-rs/golden-files` | 1 PR |
| Each hook | `amplihack-rs/hook-{name}` (e.g., `hook-pre-tool-use`) | 1 PR per hook |
| XPIA crate | Tracked separately in `amplihack-xpia-defender` repo | Separate issue |
| CLI (if Phase 2) | `amplihack-rs/cli-{component}` | 2-3 PRs |
| Infrastructure (CI, build) | `amplihack-rs/ci-{what}` | Small PRs |

| Scenario | Action |
| --- | --- |
| Bug found in Python hook | Fix in Python. Add golden file test. Port fix to Rust. Verify parity. |
| Bug found in Rust hook | Fix in Rust. Add golden file test. Check if Python has same bug; fix if so. |
| New feature added to Python hook | Implement in Python first. Add golden file test. Port to Rust before next release. |
| New Claude Code hook event added | The `#[serde(other)] Unknown` variant handles it in the meantime. Then implement in Python, then Rust. |
| Python hook diverges from Rust (CI catches) | Block merge. Fix whichever side is wrong. |

| Risk | Prob | Impact | Mitigation |
| --- | --- | --- | --- |
| Claude Code hook protocol changes | Med | High | `#[serde(other)] Unknown` + `#[serde(default)]` everywhere. Fuzz with extra fields. |
| shell-words inadequate for edge cases | Med | High | Property-test against /bin/sh. Custom post-processor for `&&`/` |
| Two build systems (Cargo + setuptools) | High | Med | Design CI FIRST (Phase 0d). Use cibuildwheel for wheels. |
| Contributor friction (Rust toolchain) | High | Med | CONTRIBUTING_RUST.md. Rust basics gate in Phase 0. |
| macOS code signing / Gatekeeper | Med | Med | cosign + notarization in CI from Phase 2. |
| Rust hook crashes in production | Med | High | `catch_unwind` returns `b"{}"` (fail-open for non-security hooks). Fix the bug immediately. No silent fallback to Python. |
| Python hooks diverge during transition | Med | Med | Parity test harness in CI blocks merge on diff. |

amplihack-rs: Rust core runtime for deterministic infrastructure #2961



| Phase | Duration | Cumulative | Deliverable |
| --- | --- | --- | --- |
| 0: Foundation | 3 weeks | 3 weeks | Golden files, workspace, Python experiment |
| 1: All Hooks | 5-6 weeks | 8-9 weeks | All hooks in Rust (3 tiers) |
| 2: CLI launcher | 3 weeks | 11-12 weeks | Rust CLI binary (conditional) |

| Event | Matcher On | Can Block? | Key Fields |
| --- | --- | --- | --- |
| `SessionStart` | source: startup/resume/clear/compact | No | source, model, agent_type |
| `InstructionsLoaded` | none (always fires) | No | file_path, memory_type, load_reason |
| `UserPromptSubmit` | none (always fires) | Yes (exit 2 or decision:block) | prompt |
| `PreToolUse` | tool name (Bash, Edit, Write, etc.) | Yes (permissionDecision: deny) | tool_name, tool_input, tool_use_id |
| `PermissionRequest` | tool name | Yes (decision.behavior: deny) | tool_name, tool_input, permission_suggestions |
| `PostToolUse` | tool name | No (feedback only) | tool_name, tool_input, tool_response, tool_use_id |
| `PostToolUseFailure` | tool name | No | tool_name, tool_input, error, is_interrupt |
| `Notification` | type: permission_prompt/idle_prompt/auth_success/elicitation_dialog | No | message, title, notification_type |
| `SubagentStart` | agent type | No | agent_id, agent_type |
| `SubagentStop` | agent type | Yes (decision:block) | agent_id, agent_type, agent_transcript_path, last_assistant_message |
| `Stop` | none (always fires) | Yes (decision:block) | stop_hook_active, last_assistant_message |
| `TeammateIdle` | none | Yes (exit 2) | teammate_name, team_name |
| `TaskCompleted` | none | Yes (exit 2) | task_id, task_subject, task_description |
| `ConfigChange` | source: user/project/local/policy/skills | Yes (except policy_settings) | source, file_path |
| `WorktreeCreate` | none | Yes (non-zero fails) | name; stdout = absolute path |
| `WorktreeRemove` | none | No | worktree_path |
| `PreCompact` | trigger: manual/auto | No | (common fields only) |
| `SessionEnd` | reason: clear/logout/prompt_input_exit/other | No | (common fields only) |
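
The forward-compatibility idea behind `#[serde(other)] Unknown` can be illustrated without serde. The sketch below is a standard-library analogue, not the planned implementation (which would presumably derive `Deserialize`): the enum `HookEvent`, its methods, and the choice to cover only a subset of the events in the table are all assumptions made for brevity.

```rust
/// A subset of the events from the table, plus a catch-all.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HookEvent {
    SessionStart,
    UserPromptSubmit,
    PreToolUse,
    PostToolUse,
    Stop,
    PreCompact,
    SessionEnd,
    /// Any event name this build does not recognize (mirrors `#[serde(other)]`).
    Unknown,
}

impl HookEvent {
    /// Map an event name to a variant; unrecognized names never fail.
    pub fn from_name(name: &str) -> Self {
        match name {
            "SessionStart" => Self::SessionStart,
            "UserPromptSubmit" => Self::UserPromptSubmit,
            "PreToolUse" => Self::PreToolUse,
            "PostToolUse" => Self::PostToolUse,
            "Stop" => Self::Stop,
            "PreCompact" => Self::PreCompact,
            "SessionEnd" => Self::SessionEnd,
            _ => Self::Unknown,
        }
    }

    /// Whether the table lists this event as able to block.
    /// Unknown events are treated as non-blocking here (an assumption).
    pub fn can_block(self) -> bool {
        matches!(self, Self::UserPromptSubmit | Self::PreToolUse | Self::Stop)
    }
}
```

The point of the catch-all is that a new event name added by Claude Code degrades to a no-op in an older binary instead of a deserialization error.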

| Exit Code | Meaning | Behavior |
| --- | --- | --- |
| 0 | Success | Parse stdout for JSON output; proceed |
| 2 | Blocking error | Block the action; stderr shown to Claude as error |
| Other | Non-blocking error | Log stderr; continue execution |
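
The exit-code contract above maps directly onto a three-way match. This is a small sketch for test harnesses (for example, golden file comparison), with hypothetical names `HookOutcome` and `classify_exit_code`; it describes how the caller interprets a hook's exit status, not anything the hook binary itself must contain.

```rust
/// How the caller treats a finished hook process, per the exit-code table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HookOutcome {
    /// Exit 0: parse stdout as JSON and proceed.
    Success,
    /// Exit 2: block the action; stderr is shown to Claude as the error.
    BlockingError,
    /// Any other code: log stderr and continue execution.
    NonBlockingError,
}

/// Classify a hook's exit code.
pub fn classify_exit_code(code: i32) -> HookOutcome {
    match code {
        0 => HookOutcome::Success,
        2 => HookOutcome::BlockingError,
        _ => HookOutcome::NonBlockingError,
    }
}
```

One consequence worth keeping in mind: a panic that escapes the handler typically exits with code 101, which lands in the non-blocking branch, so a security hook that must fail closed cannot rely on the exit code alone.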