amplihack-rs: Rust Core Runtime for Deterministic Infrastructure
⚡ START HERE (Agent: Read This First)
What this is: A complete execution specification for rewriting amplihack's deterministic infrastructure layer from Python to Rust. Read this page section-by-section as you work — don't try to load everything at once.
Reading order:
- This summary (now) — understand scope, rules, and anti-patterns
- Architecture + Rust Patterns — when starting Phase 0
- Phase 0 details — when you start Phase 0
- Phase N details — when you start Phase N (not before)
- Recipes — execution specs to follow for each component
- Function Inventory — reference during implementation
- Appendix — Claude Code hooks protocol reference
Non-Negotiable Rules
- DEFAULT WORKFLOW IS MANDATORY, VIA THE RECIPE RUNNER. Every code change, no matter how small, MUST be executed through the dev-orchestrator skill: `Skill(skill="dev-orchestrator")`. Do NOT just read the workflow file; invoke the skill, which classifies the task, decomposes it into workstreams, and executes via the recipe runner. Work done outside the recipe runner's default workflow WILL BE REJECTED. This is not optional, not "for big changes only," not skippable for "just a quick fix." If you're editing code, you're going through the dev-orchestrator.
- NO FALLBACKS. EVER. No "try Rust, fall back to Python." No auto-mode. No dual-engine runtime. No silent degradation. If Rust is the implementation, it's the ONLY implementation. If it breaks, fix it. If it produces wrong output, that's a bug. Fallbacks create silent failure and garbage code.
- CORRECTNESS OVER PERFORMANCE. Speed is a side effect of Rust's guarantees, not a goal. Every decision should be made for correctness, type safety, and eliminating entire categories of bugs. Don't optimize for benchmarks; optimize for "does this code produce correct output in all cases?"
- WORK AUTONOMOUSLY UNTIL CERTAIN. Work in a continuous loop. Don't stop to ask permission between steps. Only stop when truly blocked on missing information that can't be reasoned about. When you think you're done, triple-check: run all tests, verify all golden files match, dogfood the result.
- COMPARISON IS A DEVELOPMENT TOOL, NOT A RUNTIME MODE. Use `amplihack hooks compare` to compare Python and Rust behavior during development. This is an offline tool you run explicitly. It is NOT a production mode where both engines run simultaneously.
Anti-Patterns That Will Cause Rejection
- ❌ `AMPLIHACK_HOOK_ENGINE=auto`: NO AUTO MODE
- ❌ `try { rust() } catch { python() }`: NO FALLBACK
- ❌ "If Rust binary not found, use Python": NO SILENT DEGRADATION
- ❌ Skipping the default workflow for "small" changes
- ❌ Treating performance as the primary goal instead of correctness
- ❌ Swallowing errors with bare `catch` / `except Exception` / `unwrap_or_default()`
Bootstrap: Setting Up the Dev Environment
# 1. Clone the repos
git clone https://github.com/rysweet/amplihack # Python codebase
git clone https://github.com/rysweet/amplihack-rs # Rust workspace (create this)
# 2. Install Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
rustup default stable
rustup component add clippy rustfmt
# 3. Install cross-compilation targets
rustup target add x86_64-unknown-linux-gnu
rustup target add aarch64-unknown-linux-gnu
rustup target add x86_64-apple-darwin
rustup target add aarch64-apple-darwin
# 4. Set up Python environment (for running existing tests)
cd amplihack
uv sync # or: pip install -e ".[dev]"
# 5. Run existing Python tests (baseline — these must keep passing)
pytest tests/ -x --tb=short
# 6. Build Rust workspace (step 4 left the shell inside amplihack/, so step back out first)
cd ../amplihack-rs
cargo build
cargo clippy -- -D warnings
cargo test
Overview
Split amplihack into two layers: a Rust deterministic infrastructure layer (CLI, hooks) and a Python LLM orchestration layer (proxy, agents, eval, bundles). The Rust layer handles security-critical code where type safety, correctness, and compile-time guarantees matter most. The Python layer handles non-deterministic LLM interactions where flexibility matters most.
Note: XPIA defender is being extracted into its own repo (`amplihack-xpia-defender`); see issue #2969. It is NOT part of this issue's scope.
Note: Hooks are NOT Claude-Code-specific. They must work with Claude Code, Amplifier, and Copilot. The hook protocol layer must be agent-host-agnostic.
Boundary rule: Security-critical + correctness-critical = Rust. Everything else = Python. New subsystems default to Python unless they meet BOTH criteria.
Scope: ~18K LOC moves to Rust (hooks + CLI). ~127K LOC stays permanently in Python. XPIA (~1.2K LOC) tracked separately.
Architecture
┌─────────────────────────────────────────────────────────┐
│ PYTHON LAYER (~127K LOC — stays permanently) │
│ Proxy, agents, eval, bundle gen, knowledge builder │
│ SDK bridge scripts (thin, embedded at compile time) │
├─────────────────────────────────────────────────────────┤
│ Subprocess JSON IPC (sole interop mechanism) │
├─────────────────────────────────────────────────────────┤
│ RUST LAYER (~18K LOC — new) │
│ CLI + launcher, hooks (Claude Code + Amplifier + Copilot) │
│ Recipe runner, log parser (already exist) │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ SEPARATE REPO: amplihack-xpia-defender │
│ XPIA patterns, defender, health (~1.2K LOC) │
└─────────────────────────────────────────────────────────┘
Interop: Two Patterns Only
| Pattern | When Used | Overhead |
|---|---|---|
| Standalone binary | CLI, hooks (Claude Code / Amplifier / Copilot invoke directly) | ~1-2ms startup |
| Subprocess JSON IPC | SDK bridge (Rust calls Python for Claude SDK) | ~50ms per call |
No PyO3. No gRPC. No FFI. Subprocess JSON is simple, debuggable, and already proven by amplihack-recipe-runner.
Crate Structure (5 crates)
Note: Hooks are host-agnostic. They process JSON stdin/stdout and work with Claude Code, Amplifier, and Copilot. The protocol layer does NOT assume any specific host.
amplihack-rs/
├── Cargo.toml # Workspace
├── crates/
│ ├── amplihack-types/ # THIN: IPC boundary types only (~200 LOC)
│ │ ├── src/hook_io.rs # HookInput, HookOutput (serde) — host-agnostic
│ │ ├── src/tool_decision.rs # Allow/Deny/Ask enum
│ │ └── src/settings.rs # Settings struct
│ │
│ ├── amplihack-state/ # File ops, locking, env, Python bridge
│ │ ├── src/atomic_json.rs # AtomicJsonFile<T> (temp+rename)
│ │ ├── src/file_lock.rs # Timeout-based locking (F_SETLK)
│ │ ├── src/semaphore.rs # TOCTOU-safe flags (O_CREAT|O_EXCL)
│ │ ├── src/counter.rs # Atomic counter files
│ │ ├── src/env_config.rs # EnvVar<T> typed parsing
│ │ └── src/python_bridge.rs # spawn_python() with timeout
│ │
│ ├── amplihack-hooks/ # Protocol + all hook implementations
│ │ ├── src/protocol.rs # run_hook(), panic handler, SIGPIPE
│ │ ├── src/pre_tool_use/ # CWD, branch, command, path safety
│ │ ├── src/stop/ # Lock mode, power steering
│ │ ├── src/session_start.rs
│ │ ├── src/user_prompt.rs
│ │ ├── src/post_tool_use.rs
│ │ └── src/pre_compact.rs
│ │
│ │
│ └── amplihack-cli/ # CLI + launcher (merged)
│ ├── src/commands/ # clap derive, exhaustive enum
│ ├── src/launcher.rs # ManagedChild (fixed Drop)
│ ├── src/signals.rs # SIGINT+SIGTERM+SIGHUP, setpgid
│ ├── src/binary_finder.rs
│ ├── src/env_builder.rs # Type-safe PATH construction
│ └── src/settings_manager.rs # Atomic settings.json CRUD
│
├── bins/
│ ├── amplihack/main.rs # CLI binary
│ └── amplihack-hooks/main.rs # Multicall hook binary
│ (dispatches: pre-tool-use, stop, session-start, etc.)
│
├── bridge/ # Embedded Python bridge scripts
│ ├── claude_reflect.py # Compiled into binary via include_str!
│ └── memory_store.py
│
└── tests/
├── golden/ # Captured stdin/stdout pairs
├── parity/ # Python vs Rust comparison
├── proptest/ # Property-based + fuzz
└── integration/
Key Design Decisions
Multicall hook binary: Single amplihack-hooks binary with subcommand dispatch instead of 7 separate binaries. Saves ~40% binary size (shared std lib, serde). Registered as:
{"command": "/path/to/amplihack-hooks pre-tool-use", "timeout": 10}
Thin types crate: Only IPC boundary types live here. Domain types (power steering state) live in their domain crates. This keeps the types crate from becoming a "god types" coupling magnet.
Embedded bridge scripts: Python SDK bridge scripts compiled into the binary via include_str!(), written to temp file with 0o600 permissions at runtime. Prevents PATH injection / script replacement.
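A minimal sketch of the write-at-runtime half of this decision, using only the standard library on Unix. The constant `REFLECT_SCRIPT` stands in for an `include_str!()` of a bridge script so the example compiles on its own, and the `write_embedded_script(dir, name, contents)` signature is hypothetical; the plan does not fix one.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::os::unix::fs::OpenOptionsExt; // Unix-only: lets the mode be set at creation
use std::path::{Path, PathBuf};

// Stand-in for `include_str!("../bridge/claude_reflect.py")` (assumption: inlined
// here so the sketch does not depend on the bridge/ directory existing).
const REFLECT_SCRIPT: &str = "import sys\nprint('{}')\n";

/// Writes `contents` to a new file `dir/name`, readable and writable by the owner only.
/// `create_new` maps to O_CREAT | O_EXCL, so an already existing file or symlink
/// at that path is an error instead of something we silently write through.
fn write_embedded_script(dir: &Path, name: &str, contents: &str) -> io::Result<PathBuf> {
    let path = dir.join(name);
    let mut file = OpenOptions::new()
        .write(true)
        .create_new(true)
        .mode(0o600) // applied when the file is created, not chmod-ed afterwards
        .open(&path)?;
    file.write_all(contents.as_bytes())?;
    file.sync_all()?;
    Ok(path)
}
```

Setting the mode at creation time avoids a window in which the script exists with wider permissions. The caller remains responsible for deleting the file after the bridge call.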
Configurable fail policy: Per-hook failure policy:
enum FailurePolicy { Open, Closed }
// pre_tool_use, session_start, etc. → Open (don't break the user)
// Security-critical hooks → Closed (reject on error)
Prior Art: amplihack-recipe-runner
The amplihack-recipe-runner repo (PR #2951) was the first Rust component. It proved that subprocess JSON IPC works as an interop mechanism — reuse this pattern.
What NOT to copy from amplihack-recipe-runner:
- Any pattern where failure is silently swallowed — errors must be visible
amplihack-recipe-runner was a first attempt and should NOT be treated as a style guide. The Rust patterns section below (reviewed by 5 independent reviewers including a senior Rust engineer) supersedes anything in that codebase. Follow idiomatic Rust and the patterns documented here.
Rust Patterns
Panic Handler
// Cargo.toml: panic = "unwind" (NOT abort; catch_unwind needs unwinding)
use std::io::{self, Write};
use std::panic::{self, AssertUnwindSafe};

pub fn run_hook<H: Hook>(hook: H) {
    panic::set_hook(Box::new(|_| {})); // suppress the default panic message on stderr
    let result = panic::catch_unwind(AssertUnwindSafe(|| {
        let input = read_stdin()?;
        hook.process(input).and_then(write_stdout)
    }));
    match result {
        Ok(Ok(())) => {}
        // Hook error or panic: emit the pre-baked empty object (no allocation)
        _ => { let _ = io::stdout().write_all(b"{}"); }
    }
}
AtomicJsonFile (panic-safe temp cleanup)
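pub fn update<F>(&self, f: F) -> Result<T>
where
    F: FnOnce(&mut T) -> Result<()>,
    T: Serialize + DeserializeOwned + Clone,
{
    let _lock = FileLock::exclusive(&self.lock_path, Duration::from_secs(5))?;
    let mut data = self.read()?;
    f(&mut data)?; // if this panics: lock drops, no temp file exists yet
    // parent() of a bare file name is Some(""), not None, so filter the empty case too
    let dir = self.path.parent().filter(|p| !p.as_os_str().is_empty()).unwrap_or(Path::new("."));
    // The temp file must sit on the same filesystem as the target for the rename to be atomic
    let temp = NamedTempFile::new_in(dir)?;
    serde_json::to_writer_pretty(temp.as_file(), &data)?;
    temp.as_file().sync_all()?; // flush to disk before the rename makes the file visible
    temp.persist(&self.path)?; // atomic rename
    Ok(data)
}
ManagedChild Drop (bounded, never blocks forever)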
impl Drop for ManagedChild {
    fn drop(&mut self) {
        // Already exited: nothing to clean up
        if matches!(self.child.try_wait(), Ok(Some(_))) { return; }
        // Ask politely first (Unix only)
        #[cfg(unix)]
        unsafe { libc::kill(self.child.id() as i32, libc::SIGTERM); }
        // Poll for up to 3 seconds so drop can never block forever
        let deadline = Instant::now() + Duration::from_secs(3);
        while Instant::now() < deadline {
            if matches!(self.child.try_wait(), Ok(Some(_))) { return; }
            thread::sleep(Duration::from_millis(50));
        }
        // Still running: force kill, then reap to avoid a zombie
        let _ = self.child.kill();
        let _ = self.child.wait();
    }
}
Signal Handling (all signals, child in own pgroup)
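// shutdown: Arc<AtomicBool>, set by any of the three signals
for sig in [SIGINT, SIGTERM, SIGHUP] {
    signal_hook::flag::register(sig, Arc::clone(&shutdown))?;
}
#[cfg(unix)]
let child = {
    use std::os::unix::process::CommandExt;
    // process_group(0) puts the child in its own group (setpgid(0, 0)) without
    // the unsafe block that a pre_exec call requires (Rust 1.64+)
    Command::new(&bin).process_group(0).spawn()?
};
// Main loop polls shutdown flag + child.try_wait()
Hook Input (forward-compatible deserialization)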
#[derive(Deserialize)]
#[serde(tag = "hook_event_name")]
enum HookInput {
    #[serde(rename = "PreToolUse")]
    PreToolUse { tool_name: String, tool_input: Value },
    #[serde(rename = "Stop")]
    Stop { transcript_path: Option<PathBuf> },
    // ... known events ...
    #[serde(other)]
    Unknown, // Future Claude Code events → graceful no-op
}
Error Strategy
- Library crates (`types`, `state`): `thiserror` custom errors
- Binary crates (`cli`, `hooks`): `anyhow` with `.context()`
- Never `Box<dyn Error>`: worst of both worlds
Shell Command Parsing
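// shell-words for tokenization + manual command separator splitting.
// Tokenization errors (e.g. an unterminated quote) are returned, never swallowed:
// an unparseable command must not look like "no commands".
// Separators are only recognized as standalone tokens, so `a&&b` stays one token.
fn split_commands(input: &str) -> Result<Vec<Vec<String>>, shell_words::ParseError> {
    let tokens = shell_words::split(input)?;
    Ok(tokens
        .split(|t| ["&&", "||", ";", "|"].contains(&t.as_str()))
        .filter(|cmd| !cmd.is_empty())
        .map(|cmd| cmd.to_vec())
        .collect())
}
File Locks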
F_SETLK (non-blocking) + retry with configurable timeout. Never F_SETLKW (blocks indefinitely). Check holder PID liveness, break if dead.
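The retry loop can be sketched independently of the locking syscall. In the block below, the `try_lock` closure stands in for a single non-blocking F_SETLK attempt (the real call would go through `libc::fcntl`, which is outside the standard library), and `acquire_with_timeout` is a hypothetical name. Holder-PID liveness checking is left out of the sketch.

```rust
use std::io;
use std::thread;
use std::time::{Duration, Instant};

/// Calls `try_lock` until it reports success or `timeout` elapses.
/// `try_lock` models one non-blocking attempt:
/// Ok(true) = acquired, Ok(false) = held by someone else, Err = a real failure.
fn acquire_with_timeout<F>(mut try_lock: F, timeout: Duration, poll: Duration) -> io::Result<()>
where
    F: FnMut() -> io::Result<bool>,
{
    let deadline = Instant::now() + timeout;
    loop {
        if try_lock()? {
            return Ok(());
        }
        if Instant::now() >= deadline {
            // Surface the timeout as an error; the caller decides what to do with it.
            return Err(io::Error::new(
                io::ErrorKind::TimedOut,
                "lock acquisition timed out",
            ));
        }
        thread::sleep(poll);
    }
}
```

Because every attempt is non-blocking, the worst-case wait is roughly `timeout` plus one `poll` interval, which is the property F_SETLKW cannot give.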
Implementation Phases
Phase 0: Foundation + Validation (3 weeks)
0a. Python experiment (2 days) — DECISION GATE for Phase 2
- Run `mypy --strict` on hooks + CLI: how many type errors? How many are real bugs vs annotation gaps?
- Add lazy imports to cli.py entry point
- Assess: does mypy catch the class of bugs we're worried about? If strict typing + lazy imports addresses the CLI correctness gaps, defer Phase 2 indefinitely
0b. Golden file capture (1 week)
- Do NOT rely on "running real sessions" for golden files. Instead, generate synthetic inputs:
- Read each hook's Python source to enumerate all code paths
- For each code path, craft a minimal JSON input that exercises it
- Include: happy path, error cases, edge cases (empty fields, missing fields, unicode, very long strings)
- For pre_tool_use: all tool types (Bash, Read, Write, Edit, etc.), all protection checks (CWD, branch, command safety)
- For stop: lock mode on/off, power steering states, reflection needed/not needed
- Capture ≥100 pairs per hook type (pre_tool_use needs ≥200)
- Run each synthetic input through Python hooks, capture stdout as expected output
- Flag golden files where Python behavior is wrong — fix Python bugs FIRST, then re-capture
- Build semantic JSON comparator (not string diff — handle float precision, whitespace, key ordering, null vs absent)
- Store in `tests/golden/{hook_type}/{test_name}.input.json` + `.expected.json`
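The semantic comparator from the list above could look roughly like the following. This is a sketch under stated assumptions: the `Json` enum is a dependency-free stand-in for `serde_json::Value`, `FLOAT_TOLERANCE` is an arbitrary illustrative value, and whitespace differences are assumed to disappear at parse time, so only parsed values are compared.

```rust
use std::collections::BTreeMap;

/// Minimal JSON value; a stand-in for `serde_json::Value` (assumption).
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

const FLOAT_TOLERANCE: f64 = 1e-9; // illustrative value, not taken from the plan

/// Looks up `key`, treating an absent key as null.
fn field<'a>(obj: &'a BTreeMap<String, Json>, key: &str) -> &'a Json {
    static NULL: Json = Json::Null;
    obj.get(key).unwrap_or(&NULL)
}

/// Semantic equality: numbers compare within a relative tolerance, object key
/// order is irrelevant, and an absent key equals a key present with null.
fn semantically_equal(a: &Json, b: &Json) -> bool {
    match (a, b) {
        (Json::Null, Json::Null) => true,
        (Json::Bool(x), Json::Bool(y)) => x == y,
        (Json::Num(x), Json::Num(y)) => {
            (x - y).abs() <= FLOAT_TOLERANCE * x.abs().max(y.abs()).max(1.0)
        }
        (Json::Str(x), Json::Str(y)) => x == y,
        (Json::Arr(x), Json::Arr(y)) => {
            x.len() == y.len() && x.iter().zip(y).all(|(l, r)| semantically_equal(l, r))
        }
        (Json::Obj(x), Json::Obj(y)) => {
            // Walk the union of keys so a key missing on either side is seen.
            x.keys()
                .chain(y.keys())
                .all(|k| semantically_equal(field(x, k), field(y, k)))
        }
        _ => false,
    }
}
```

Array order stays significant in this sketch; only object key order is ignored.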
0c. Python test coverage uplift (1 week)
- Target: 60% on hooks (realistic baseline)
- Focus on security-critical paths in pre_tool_use and stop
- Focus on error paths (the `except Exception` blocks)
0d. Workspace scaffolding (3 days)
- Cargo workspace with the crates listed under Crate Structure
- CI: `cargo clippy -- -D warnings`, `cargo fmt --check`, `cargo test`
- Cross-compilation targets (4 platforms)
- IPC version protocol: every JSON message gets `{"version": 1, ...}`
- `CONTRIBUTING_RUST.md` with patterns and build instructions
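As an illustration of the IPC version item, the check on the receiving side might be as small as this. The plan only says that every message carries a version; rejecting anything other than an exact match, and the names `IPC_VERSION`, `VersionError` and `check_ipc_version`, are assumptions of this sketch.

```rust
/// The only IPC protocol version this build understands.
const IPC_VERSION: u64 = 1;

#[derive(Debug, PartialEq)]
enum VersionError {
    /// The message had no "version" field.
    Missing,
    /// The message declared a version this build does not speak.
    Unsupported { found: u64, expected: u64 },
}

/// `version` is the "version" field of a decoded message, or None when absent.
/// Anything other than an exact match is an error; nothing is guessed or defaulted.
fn check_ipc_version(version: Option<u64>) -> Result<(), VersionError> {
    match version {
        Some(v) if v == IPC_VERSION => Ok(()),
        Some(v) => Err(VersionError::Unsupported { found: v, expected: IPC_VERSION }),
        None => Err(VersionError::Missing),
    }
}
```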
EXIT CRITERIA: Golden files captured, Python coverage ≥60% on hooks, workspace builds, at least 2 team members have completed Rust basics.
Phase 1: pre_tool_use Hook (3 weeks) — HIGHEST VALUE
1a. amplihack-types + amplihack-state (1 week)
- Thin boundary types (HookInput, ToolDecision, Settings)
- AtomicJsonFile with panic-safe temp cleanup
- FileLock with timeout (F_SETLK + retry)
- AtomicFlag (O_CREAT|O_EXCL)
- EnvVar typed parsing
- Python bridge module (spawn_python, timeout, embedded scripts)
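For the AtomicFlag item in the list above, the standard library already exposes O_CREAT|O_EXCL as `OpenOptions::create_new`. The following is a minimal sketch; the method names `try_set` and `clear` are hypothetical.

```rust
use std::fs::{self, OpenOptions};
use std::io;
use std::path::PathBuf;

/// A flag whose state is the existence of a file.
struct AtomicFlag {
    path: PathBuf,
}

impl AtomicFlag {
    fn new(path: impl Into<PathBuf>) -> Self {
        AtomicFlag { path: path.into() }
    }

    /// Returns Ok(true) if this call set the flag, Ok(false) if it was already set.
    /// `create_new` is O_CREAT | O_EXCL: check and create happen in one syscall,
    /// so two processes can never both observe "not set" and both succeed.
    fn try_set(&self) -> io::Result<bool> {
        match OpenOptions::new().write(true).create_new(true).open(&self.path) {
            Ok(_) => Ok(true),
            Err(e) if e.kind() == io::ErrorKind::AlreadyExists => Ok(false),
            Err(e) => Err(e),
        }
    }

    /// Clears the flag; clearing an unset flag is not an error.
    fn clear(&self) -> io::Result<()> {
        match fs::remove_file(&self.path) {
            Ok(()) => Ok(()),
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(()),
            Err(e) => Err(e),
        }
    }
}
```

The contrast is with an `exists()` check followed by a separate create, which leaves a window between the two calls.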
1b. amplihack-hooks: protocol + pre_tool_use engine (1.5 weeks)
- `run_hook()` with `catch_unwind`, pre-baked `b"{}"`
- Forward-compatible serde deserialization
- Configurable FailurePolicy (Open for pre_tool_use)
- CWD protection, main branch protection
- Command analysis: shell-words + separator splitting
- Path safety: `Path::canonicalize()`, no string concat
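A sketch of what the path-safety item could mean in practice, assuming the check is "the resolved path must stay inside a root directory". The function name `resolve_within` is hypothetical. Note that `canonicalize` requires the path to exist, so paths that are about to be created need a different treatment that this sketch does not cover.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Resolves `candidate` against `root` and returns the canonical path only if it
/// stays inside `root`. Both sides are canonicalized, so symlinks and `..`
/// segments are resolved before the comparison.
fn resolve_within(root: &Path, candidate: &Path) -> io::Result<PathBuf> {
    let root = root.canonicalize()?;
    // Path::join replaces the base when `candidate` is absolute, so absolute
    // inputs are checked as-is instead of being glued onto the root.
    let resolved = root.join(candidate).canonicalize()?;
    // Path::starts_with compares whole components: /proj-evil does not match /proj,
    // which a string prefix check would get wrong.
    if resolved.starts_with(&root) {
        Ok(resolved)
    } else {
        Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            format!("{} escapes {}", resolved.display(), root.display()),
        ))
    }
}
```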
1c. Deploy + validate (3 days)
- Multicall binary: `amplihack-hooks pre-tool-use`
- Hook registration: `AMPLIHACK_HOOK_ENGINE=rust` enables Rust hooks. No auto mode. No fallback. If the Rust hook fails, it fails; fix it.
- Parity tests: ≥200 golden files, semantic JSON comparison
- Telemetry: stderr JSON line per invocation (hook, duration_us, result)
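The telemetry item can be sketched with the standard library alone. The field names `hook`, `duration_us` and `result` come from the list above; the function names are hypothetical, and a real implementation would more likely serialize a struct with `serde_json` than format the line by hand.

```rust
use std::time::Duration;

/// Escapes a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be \u-escaped in JSON.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds one telemetry record as a single JSON line (no trailing newline).
fn telemetry_line(hook: &str, duration: Duration, result: &str) -> String {
    format!(
        "{{\"hook\":\"{}\",\"duration_us\":{},\"result\":\"{}\"}}",
        json_escape(hook),
        duration.as_micros(),
        json_escape(result)
    )
}

/// Writes the record to stderr so stdout stays reserved for the hook's JSON output.
fn emit_telemetry(hook: &str, duration: Duration, result: &str) {
    eprintln!("{}", telemetry_line(hook, duration, result));
}
```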
EXIT CRITERIA: ≥95% golden file parity, zero false denies on production traffic for 1 week.
DECISION GATE: Measure real-world impact. If correctness gain is negligible and developer velocity dropped, STOP HERE. The project has still delivered value (one fast, type-safe security hook).
Phase 2: CLI + Launcher (3 weeks) — CONDITIONAL
XPIA Defender is tracked separately in its own issue/repo (`amplihack-xpia-defender`). It is NOT part of this work.
GATE: Phase 0a must show Python startup >500ms even with lazy imports. If startup is acceptable in Python, skip this phase.
2a. CLI command parsing (1 week)
- clap derive with exhaustive Command enum
- Nested subcommands for plugin/recipe/memory
- `amplihack memory` delegates to Python subprocess
2b. Launcher + process management (1.5 weeks)
- ManagedChild with corrected Drop (bounded, never blocks)
- Signal handling: SIGINT + SIGTERM + SIGHUP via AtomicBool
- Child in own process group (setpgid) to prevent double-SIGINT
- Binary finder with version verification
- Atomic settings.json via AtomicJsonFile
- Type-safe env builder (set-based PATH, not substring matching)
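To make the last item concrete, here is one way a set-based PATH builder could look, using `std::env::split_paths` and `join_paths` so entries are compared as whole paths. The function name `build_path` is hypothetical, and the behavior of dropping empty entries is an assumption of the sketch.

```rust
use std::collections::HashSet;
use std::env;
use std::ffi::{OsStr, OsString};
use std::path::PathBuf;

/// Builds a PATH value with the `extra` directories in front of `current`,
/// dropping empty entries and duplicates while keeping first-seen order.
fn build_path(extra: &[PathBuf], current: &OsStr) -> Result<OsString, env::JoinPathsError> {
    let mut seen = HashSet::new();
    let mut ordered = Vec::new();
    for dir in extra.iter().cloned().chain(env::split_paths(current)) {
        if dir.as_os_str().is_empty() {
            continue; // an empty PATH entry means "current directory"; never keep it
        }
        // Whole-entry comparison: "/usr/bin" never matches "/usr/bin-old",
        // which a substring test on the raw PATH string would.
        if seen.insert(dir.clone()) {
            ordered.push(dir);
        }
    }
    env::join_paths(ordered)
}
```

`join_paths` returns an error if an entry contains the platform's separator, so a malformed directory name is reported instead of silently corrupting PATH.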
2c. Nesting detection (3 days)
- Primary: environment variable marker (`AMPLIHACK_SESSION_ID`)
- Secondary: session file with PID + timestamp
- Stale session detection: if PID dead AND session >1h old, ignore
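The three rules above can be written as a pure function, which keeps the decision testable without touching the environment or the filesystem. In this sketch the PID liveness probe and the file parsing are done by the caller and passed in; the type and function names are hypothetical, and treating a non-stale session file as "nested" is an assumption.

```rust
use std::time::Duration;

/// Sessions older than this are candidates for being ignored.
const STALE_AFTER: Duration = Duration::from_secs(60 * 60);

/// What the session file tells us, reduced to what the rule needs.
struct SessionFile {
    pid_alive: bool, // result of a liveness probe for the recorded PID
    age: Duration,   // now minus the recorded timestamp
}

#[derive(Debug, PartialEq)]
enum Nesting {
    Nested,
    NotNested,
}

/// Stale means: PID dead AND older than one hour. Either condition alone is not enough.
fn is_stale(session: &SessionFile) -> bool {
    !session.pid_alive && session.age > STALE_AFTER
}

/// `env_session_id` is the value of AMPLIHACK_SESSION_ID, if set.
fn detect_nesting(env_session_id: Option<&str>, session_file: Option<&SessionFile>) -> Nesting {
    // Primary marker: the environment variable.
    if env_session_id.map_or(false, |id| !id.is_empty()) {
        return Nesting::Nested;
    }
    // Secondary marker: a session file, unless it is stale.
    match session_file {
        Some(session) if !is_stale(session) => Nesting::Nested,
        _ => Nesting::NotNested,
    }
}
```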
EXIT CRITERIA: Drop-in replacement. No behavior change vs Python on golden test suite. All type-safety bugs from the Python analysis (TOCTOU races, string dispatch, missing validation) proven eliminated by Rust's type system.
Note: Phase "Remaining Hooks" from previous versions of this plan has been merged into Phase 1. All hooks (Tier 1 + Tier 2 + Tier 3) are implemented in Phase 1. See "Work Decomposition" section for the tier breakdown.
Security Controls (Integrated Per Phase)
| Control | Phase | Implementation |
|---|---|---|
| Configurable fail-open/closed | 1 | FailurePolicy enum per hook |
| Bridge script embedding | 1 | include_str!() at compile time |
| Settings.json integrity check | 2 | Verify hook paths at launch |
| Binary signing | 2 | cosign in CI for releases |
| Telemetry (every decision logged) | 1 | Stderr JSON, one line per invocation |
| Stale lock detection | 1 | Check holder PID liveness, break if dead |
Deferred to post-v1: seccomp sandboxing, audit logging with tamper detection, rate limiting, hook chain integrity verification.
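As a sketch of the first control in the table (configurable fail-open/closed), the policy can be applied at a single point where a hook's result is turned into a decision. `FailurePolicy` is the enum named in this plan; `Decision` and `resolve` are hypothetical names for illustration, and the stderr message format is made up.

```rust
use std::fmt::Display;

#[derive(Debug, Clone, Copy, PartialEq)]
enum FailurePolicy {
    Open,
    Closed,
}

/// Simplified stand-in for the hook's output decision (assumption).
#[derive(Debug, PartialEq)]
enum Decision {
    Allow,
    Deny { reason: String },
}

/// Turns a hook outcome into a decision. A successful outcome passes through
/// untouched; an error is mapped according to the hook's policy and is always
/// reported, never dropped.
fn resolve<E: Display>(policy: FailurePolicy, outcome: Result<Decision, E>) -> Decision {
    match (outcome, policy) {
        (Ok(decision), _) => decision,
        (Err(e), FailurePolicy::Open) => {
            eprintln!("hook error (fail-open): {}", e);
            Decision::Allow
        }
        (Err(e), FailurePolicy::Closed) => Decision::Deny {
            reason: format!("hook error: {}", e),
        },
    }
}
```

Keeping the mapping in one function means no individual hook decides for itself what an error means.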
Python Function Inventory (What Gets Replaced)
Every function below must have a Rust equivalent that produces identical output for identical input. This is the exhaustive checklist.
Hook System (~2,730 LOC total)
hook_processor.py (407 LOC) → amplihack-hooks/src/protocol.rs
| Function | Lines | LLM Calls | Notes |
|---|---|---|---|
| `HookProcessor.__init__(hook_name)` | 45-101 | No | Project root detection, log setup |
| `HookProcessor.validate_path_containment(path)` | 103-121 | No | Path stays within project |
| `HookProcessor.log(message, level)` | 123-144 | No | Log rotation at 10MB |
| `HookProcessor.read_input()` | 146-172 | No | JSON from stdin |
| `HookProcessor.write_output(output)` | 174-205 | No | JSON to stdout, SIGPIPE handling |
| `HookProcessor.save_metric(name, value, metadata)` | 207-231 | No | JSONL metrics file |
| `HookProcessor.run()` | 274-368 | No | Full lifecycle with error handling |
| `HookProcessor.get_session_id()` | 370-377 | No | Timestamp-based session ID |
| `HookProcessor.save_session_data(filename, data)` | 379-406 | No | Session file with path validation |
pre_tool_use.py (492 LOC) → amplihack-hooks/src/pre_tool_use/
| Function | Lines | LLM Calls | Notes |
|---|---|---|---|
| `PreToolUseHook.process(input_data)` | 95-213 | No | Main entry; validates bash commands |
| `PreToolUseHook._check_cwd_deletion(command)` | 215-269 | No | Detects CWD deletion |
| `PreToolUseHook._check_cwd_rename(command)` | 271-359 | No | Detects CWD rename/move, glob patterns |
| `PreToolUseHook._extract_rm_paths(segment)` | 361-391 | No | shlex.split() path extraction |
| `PreToolUseHook._extract_mv_source_paths(segment)` | 393-463 | No | mv -t/--target-directory parsing |
| `PreToolUseHook._select_strategy()` | 465-481 | No | Launcher detection (Claude vs Copilot) |
stop.py (902 LOC) → amplihack-hooks/src/stop/
| Function | Lines | LLM Calls | Notes |
|---|---|---|---|
| `StopHook.process(input_data)` | 67-358 | No | Lock mode, safety valve (Issue #2874) |
| `StopHook.read_continuation_prompt()` | 387-430 | No | Custom prompt from file or default |
| `StopHook._increment_power_steering_counter(session_id)` | 432-498 | No | Atomic counter file |
| `StopHook._increment_lock_counter(session_id)` | 499-555 | No | Safety valve counter |
| `StopHook._should_run_power_steering()` | 557-593 | No | Config check |
| `StopHook._should_run_reflection()` | 595-640 | No | Config check |
| `StopHook._get_current_session_id()` | 642-673 | No | Env var or filesystem |
| `StopHook._run_reflection_sync(transcript_path)` | 675-782 | YES | SDK bridge needed (subprocess to Python) |
| `StopHook._announce_reflection_start()` | 784-796 | No | Stderr output |
| `StopHook._generate_reflection_filename(template)` | 798-823 | No | Filename from content |
| `StopHook._block_with_findings(template, path)` | 825-868 | No | Block decision JSON |
session_start.py (601 LOC) → amplihack-hooks/src/session_start.rs
| Function | Lines | LLM Calls | Notes |
|---|---|---|---|
| `SessionStartHook.process(input_data)` | 41-591 | YES | Version check, memory, staging. SDK bridge for memory calls. |
| `SessionStartHook._check_version_mismatch()` | 386-539 | No | Version comparison, auto-update trigger |
| `SessionStartHook._migrate_global_hooks()` | 541-592 | No | Duplicate hook migration |
session_stop.py (86 LOC) → amplihack-hooks/src/session_stop.rs
| Function | Lines | LLM Calls | Notes |
|---|---|---|---|
| `main()` | 29-82 | YES | `MemoryCoordinator.store()` via SDK bridge |
post_tool_use.py (203 LOC) → amplihack-hooks/src/post_tool_use.rs
| Function | Lines | LLM Calls | Notes |
|---|---|---|---|
| `PostToolUseHook.process(input_data)` | 82-173 | No | Tool metrics, validation |
| `PostToolUseHook._setup_tool_hooks()` | 34-67 | No | Hook registry setup |
| `PostToolUseHook.save_tool_metric(tool_name, duration_ms)` | 69-80 | No | JSONL metric |
user_prompt_submit.py (451 LOC) → amplihack-hooks/src/user_prompt.rs
| Function | Lines | LLM Calls | Notes |
|---|---|---|---|
| `UserPromptSubmitHook.process(input_data)` | 257-400 | YES | Memory injection; SDK bridge for `inject_memory_for_agents_sync()` |
| `UserPromptSubmitHook.find_user_preferences()` | 36-56 | No | File discovery |
| `UserPromptSubmitHook.extract_preferences(content)` | 58-99 | No | Markdown parsing |
| `UserPromptSubmitHook.build_preference_context(prefs)` | 101-156 | No | Context string builder |
| `UserPromptSubmitHook.get_cached_preferences(path)` | 158-184 | No | mtime-based cache |
| `UserPromptSubmitHook._inject_amplihack_if_different()` | 186-255 | No | File diff with mtime cache |
pre_compact.py (195 LOC) → amplihack-hooks/src/pre_compact.rs
| Function | Lines | LLM Calls | Notes |
|---|---|---|---|
| `PreCompactHook.process(input_data)` | 33-137 | No | Transcript export |
| `PreCompactHook.restore_conversation_from_latest()` | 139-184 | No | Transcript restore |
XPIA Security System → Tracked separately in amplihack-xpia-defender
See the XPIA issue for full function inventory, patterns, and implementation plan.
Settings System (341 LOC) → amplihack-state/ or amplihack-cli/
| Function | Lines | LLM Calls | Notes |
|---|---|---|---|
| `validate_hook_paths(hook_system, hooks, dir)` | 113-135 | No | File existence check |
| `update_hook_paths(settings, system, hooks, dir)` | 138-225 | No | JSON path rewriting |
| `ensure_settings_json()` | 228-337 | No | Atomic settings creation |
Summary
| Category | Total LOC | Functions w/ LLM Calls | Pure Rust | SDK Bridge |
|---|---|---|---|---|
| Hook processor | 407 | 0 | 9 | 0 |
| pre_tool_use | 492 | 0 | 6 | 0 |
| stop | 902 | 1 | 14 | 1 (reflection) |
| session_start | 601 | 1 | 3 | 1 (memory) |
| session_stop | 86 | 1 | 0 | 1 (memory) |
| post_tool_use | 203 | 0 | 3 | 0 |
| user_prompt | 451 | 1 | 6 | 1 (memory) |
| pre_compact | 195 | 0 | 2 | 0 |
| Settings | 341 | 0 | 3 | 0 |
| Total | ~3,678 | 4 | 46 | 4 |
Note: XPIA (~1,200 LOC) is tracked separately in `amplihack-xpia-defender`.
Only 4 functions require SDK bridge calls. Everything else is pure data processing → pure Rust.
SDK Bridge Contracts (4 Functions)
Each SDK bridge function runs an embedded Python script via subprocess. The Rust side writes JSON to stdin, reads JSON from stdout, with a hard timeout.
1. _run_reflection_sync() (stop hook)
Input JSON (stdin):
{
"action": "reflect",
"transcript_path": "/path/to/transcript.jsonl",
"session_id": "abc-123"
}
Output JSON (stdout):
{
"should_block": true|false,
"findings": ["finding 1", "finding 2"],
"severity": "low"|"medium"|"high"
}
Timeout: 30s (reflection can be slow — Claude API call)
On timeout: treat as should_block=false (fail-open for UX, log warning)
On invalid JSON: ERROR — do not swallow
2. SessionStartHook memory calls
Input JSON:
{
"action": "get_context",
"session_id": "abc-123",
"project_path": "/path/to/project"
}
Output JSON:
{
"context": "string of context to inject",
"memories": [{"key": "...", "value": "..."}]
}
Timeout: 10s
On timeout: ERROR (memory is required for correct operation)
3. session_stop memory store
Input JSON:
{
"action": "store",
"session_id": "abc-123",
"transcript_path": "/path/to/transcript.jsonl"
}
Output JSON:
{
"stored": true,
"memories_count": 5
}
Timeout: 15s
On timeout: log ERROR (data loss) but don't block session exit
4. inject_memory_for_agents_sync() (user_prompt_submit)
Input JSON:
{
"action": "inject_memory",
"session_id": "abc-123",
"prompt": "user prompt text"
}
Output JSON:
{
"injected_context": "string to prepend to prompt",
"memory_keys_used": ["key1", "key2"]
}
Timeout: 5s
On timeout: ERROR — do not silently drop memory injection
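The four contracts differ only in their timeout and in what a timeout means, so both can live in one table-like type. The following sketch encodes the values stated above; the enum and method names are hypothetical.

```rust
use std::time::Duration;

/// One variant per bridge contract, named after the "action" field.
#[derive(Debug, Clone, Copy, PartialEq)]
enum BridgeCall {
    Reflect,
    GetContext,
    Store,
    InjectMemory,
}

#[derive(Debug, PartialEq)]
enum OnTimeout {
    /// Log a warning and continue as if should_block were false.
    FailOpenWithWarning,
    /// Log an error (data loss) but do not block session exit.
    LogErrorAndContinue,
    /// Surface an error to the caller.
    Error,
}

impl BridgeCall {
    /// Hard timeout for the subprocess call.
    fn timeout(self) -> Duration {
        let secs = match self {
            BridgeCall::Reflect => 30,
            BridgeCall::GetContext => 10,
            BridgeCall::Store => 15,
            BridgeCall::InjectMemory => 5,
        };
        Duration::from_secs(secs)
    }

    /// What the caller must do when the timeout fires.
    fn on_timeout(self) -> OnTimeout {
        match self {
            BridgeCall::Reflect => OnTimeout::FailOpenWithWarning,
            BridgeCall::Store => OnTimeout::LogErrorAndContinue,
            BridgeCall::GetContext | BridgeCall::InjectMemory => OnTimeout::Error,
        }
    }
}
```

Because both matches are exhaustive, adding a fifth bridge call will not compile until its timeout and timeout behavior are decided.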
Implementation pattern (same for all 4):
fn call_bridge(script: &str, input: &Value) -> Result<Value> {
    let script_path = write_embedded_script(script)?; // 0o600 permissions
    let mut child = Command::new("python3")
        .arg(&script_path)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;
    // Write input to child.stdin, read child.stdout, enforce the per-call timeout
    // (kill the child when the deadline passes)
    // Parse JSON: error if invalid, NEVER silently default
    // Clean up script_path on every exit path, including errors
    todo!("sketch only: the steps in the comments above are not implemented")
}
Behavioral Baselines
Before writing Rust, capture Python behavioral baselines. The primary purpose is correctness verification, not speed optimization. Performance improvement is a welcome side effect but NOT the goal.
# Capture during Phase 0b golden file collection
# For each hook invocation, record:
{
"hook": "pre_tool_use",
"input": { ... }, // ← full JSON input
"python_output": { ... }, // ← full JSON output
"python_exit_code": 0,
"python_stderr": "",
"python_duration_us": 145000, // ← informational, not a target
"input_size_bytes": 342,
"timestamp": "2026-03-09T..."
}
The agent MUST capture actual behavioral baselines in Phase 0b. These become the correctness oracle: Rust output must match Python output for every recorded invocation.
Distribution, Migration & Release Pipeline
Binary Sizes
amplihack (~3MB stripped, LTO)
amplihack-hooks (~2MB stripped, LTO)
Total: ~5MB
Dual-Wheel Strategy
Publish TWO types of wheels:
- Pure-Python wheel (`amplihack-X.Y.Z-py3-none-any.whl`): no Rust binaries, hooks remain Python scripts. This is the pre-migration state, not a "fallback."
- Platform wheels (`amplihack-X.Y.Z-py3-none-manylinux_2_28_x86_64.whl`, etc.): contain Rust binaries in `amplihack/bin/`; hooks are Rust binaries.
pip and UV automatically prefer platform-specific wheels when available. Once a user has the platform wheel, hooks are Rust. There is no "auto" mode — the wheel either has the binaries or it doesn't.
CI/CD Pipeline: rust-build.yml
name: Build Rust Binaries + Platform Wheels
on:
  push:
    branches: [main]
    paths: ['amplihack-rs/**']
  release:
    types: [published]
jobs:
  build-rust:
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        include:
          - target: x86_64-unknown-linux-gnu
            os: ubuntu-latest
            wheel_plat: manylinux_2_28_x86_64
          - target: aarch64-unknown-linux-gnu
            os: ubuntu-latest
            wheel_plat: manylinux_2_28_aarch64
          - target: x86_64-apple-darwin
            os: macos-13
            wheel_plat: macosx_13_0_x86_64
          - target: aarch64-apple-darwin
            os: macos-14
            wheel_plat: macosx_14_0_arm64
    steps:
      - uses: actions/checkout@v4
      - uses: dtolnay/rust-toolchain@stable
        with:
          targets: ${{ matrix.target }}
      - name: Build + strip Rust binaries
        run: |
          cd amplihack-rs
          cargo build --release --target ${{ matrix.target }}
          strip target/${{ matrix.target }}/release/amplihack
          strip target/${{ matrix.target }}/release/amplihack-hooks
      - uses: actions/upload-artifact@v4
        with:
          name: rust-${{ matrix.target }}
          path: |
            amplihack-rs/target/${{ matrix.target }}/release/amplihack
            amplihack-rs/target/${{ matrix.target }}/release/amplihack-hooks
  build-platform-wheels:
    needs: build-rust
    runs-on: ubuntu-latest  # assumption: packaging can run on any runner
    strategy:
      matrix:
        include:
          - target: x86_64-unknown-linux-gnu
            wheel_plat: manylinux_2_28_x86_64
          - target: aarch64-unknown-linux-gnu
            wheel_plat: manylinux_2_28_aarch64
          - target: x86_64-apple-darwin
            wheel_plat: macosx_13_0_x86_64
          - target: aarch64-apple-darwin
            wheel_plat: macosx_14_0_arm64
    steps:
      - uses: actions/checkout@v4
      - uses: actions/download-artifact@v4
        with:
          name: rust-${{ matrix.target }}
          path: src/amplihack/bin/
      - run: python -m build --wheel
  build-pure-wheel:
    runs-on: ubuntu-latest  # assumption: packaging can run on any runner
    steps:
      - uses: actions/checkout@v4
      - run: python -m build --wheel
Changes to build_hooks.py
import os
from pathlib import Path

RUST_BIN_DIR = Path("src/amplihack/bin")

def _copy_rust_binaries(self):
    """Make pre-built Rust binaries in the package directory executable, if present."""
    if not RUST_BIN_DIR.exists():
        return  # Pure-Python wheel
    for binary in RUST_BIN_DIR.iterdir():
        if binary.is_file():
            # CI downloads the binaries straight into RUST_BIN_DIR, so copying them
            # onto the same path would raise shutil.SameFileError. Artifact
            # upload/download does not keep the executable bit, so restore it here.
            os.chmod(binary, 0o755)
Add to pyproject.toml:
[tool.setuptools.package-data]
amplihack = ["bin/*"]
Binary Discovery at Runtime
# src/amplihack/binary_discovery.py
import os
import shutil
from pathlib import Path

def find_rust_binary(name: str) -> Path | None:
    """Discover Rust binary location, in priority order."""
    candidates = [
        os.environ.get(f"AMPLIHACK_{name.upper().replace('-', '_')}_BIN"),  # 1. Env var override
        Path.home() / ".amplihack" / ".claude" / "bin" / name,  # 2. Staged (UVX)
        Path(__file__).parent / "bin" / name,  # 3. Package (wheel)
        _find_cargo_build(name),  # 4. Cargo build (dev); helper not shown here
        Path.home() / ".cargo" / "bin" / name,  # 5. Cargo install
        shutil.which(name),  # 6. System PATH
    ]
    for c in candidates:
        # Entries may be None (unset env var, binary not on PATH), so check first
        if c and Path(c).is_file() and os.access(c, os.X_OK):
            return Path(c)
    return None
Hook Path Migration
Current settings.json:
{"command": "/home/user/.amplihack/.claude/tools/amplihack/hooks/pre_tool_use.py"}
After migration:
{"command": "/home/user/.amplihack/.claude/bin/amplihack-hooks pre-tool-use"}
Migration extends settings.py `update_hook_paths()` with a `HOOK_REGISTRY` mapping. When Rust binaries are present in the wheel, `amplihack install` registers them. When they're not (pure-Python wheel), Python hooks remain registered. No auto-detection, no fallback: the install step writes the correct paths based on what's in the package.
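The rewrite itself is a lookup from Python script name to multicall subcommand. The sketch below is written in Rust for consistency with the other examples, although the plan places this logic in settings.py. The registry contents are an assumption built from the subcommands named earlier (pre-tool-use, stop, session-start), and `migrate_hook_command` is a hypothetical name.

```rust
/// Hypothetical HOOK_REGISTRY: Python hook script name -> multicall subcommand.
const HOOK_REGISTRY: &[(&str, &str)] = &[
    ("pre_tool_use.py", "pre-tool-use"),
    ("stop.py", "stop"),
    ("session_start.py", "session-start"),
];

/// Rewrites one settings.json hook command to the Rust multicall binary.
/// Returns None when the command is not a known Python hook, so the caller
/// leaves unrelated commands untouched.
fn migrate_hook_command(command: &str, hooks_bin: &str) -> Option<String> {
    // The script name is the last path segment of the old command.
    let script = command.rsplit('/').next()?;
    HOOK_REGISTRY
        .iter()
        .find(|(python_script, _)| *python_script == script)
        .map(|(_, subcommand)| format!("{} {}", hooks_bin, subcommand))
}
```

An explicit table keeps the mapping reviewable; a generic "replace underscores with hyphens" rule would also rewrite scripts that have no Rust counterpart.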
UVX Staging
Add "bin" to ESSENTIAL_DIRS in cli.py so UVX staging copies Rust binaries to ~/.amplihack/.claude/bin/.
Version Synchronization
Single version number sourced from pyproject.toml, synced to Cargo.toml by CI (in auto-version-on-merge.yml).
User Upgrade Scenarios
| Scenario | What Happens | User Action |
|---|---|---|
| `uv tool upgrade amplihack` | UV downloads platform wheel with Rust binaries. Next session re-stages and rewrites hook paths. | None |
| `uvx` (zero-install) | Fresh temp venv from latest wheel. Staging copies binaries. | None |
| `git pull` (editable install) | Gets Cargo workspace. Must build Rust. Next `amplihack install` discovers binaries. | `cargo build --release` |
| Unsupported platform | Pure-Python wheel installed. Hooks remain Python scripts. User is on the pre-migration path. | None |
| Rust binary crashes | Hook fails. Claude Code sees exit code != 0, stderr has error. Fix the bug. | Report bug, fix in next release |
macOS Code Signing
Start with ad-hoc signing (codesign --force --sign -) in Phase 2. Add full notarization ($99/yr Apple Developer account) when user complaints about Gatekeeper warrant the cost.
Explicitly unsupported: Native Windows. WSL works via linux-x64.
Agent Execution Protocol
⚠️ MANDATORY: Default Workflow via Recipe Runner for ALL Code Changes
Every code change MUST be executed through the dev-orchestrator skill: Skill(skill="dev-orchestrator"). This invokes the recipe runner which handles the 25-step default workflow automatically — task classification, workstream decomposition, implementation, testing, and review.
Do NOT just read DEFAULT_WORKFLOW.md and manually follow steps. Invoke the dev-orchestrator skill. It runs the workflow as a recipe. Work done outside the recipe runner WILL BE REJECTED — even if the code is correct.
The dev-orchestrator applies to: Rust code, Python test changes, CI configuration, build scripts, documentation updates tied to code changes. The only exception is editing this issue itself.
Autonomous Loop
The implementing agent MUST work in a continuous, autonomous loop until CERTAIN that all targeted functionality is replaced and equivalent. The loop:
while not CERTAIN_ALL_FUNCTIONALITY_REPLACED:
    1. Pick the next untouched Python function/module from the target list
    2. Read the Python implementation thoroughly: understand WHAT it does and WHY
    3. Write a behavioral spec: inputs, outputs, edge cases, error conditions
    4. Invoke Skill(skill="dev-orchestrator") to implement the Rust equivalent via recipe runner
    5. Write tests that mirror every Python test + cover edge cases from step 3
    6. Run `amplihack hooks compare` against golden files for the component
    7. If mismatch → fix → go to step 6
    8. If match → mark as done, commit, move to next
    9. After completing a phase, run the FULL triple-check (see below)
Only stop and ask when:
- A design decision would be irreversible AND multiple valid approaches exist with materially different tradeoffs
- An external dependency is missing and cannot be installed
- A Python behavior appears to be a bug — ask whether to replicate the bug or fix it
Never stop to ask about:
- Code style choices (follow idiomatic Rust and the patterns in this issue)
- Test strategy (always test more, not less)
- Whether to proceed to the next item (always proceed)
Triple-Check Verification Protocol
Before declaring ANY phase complete, run all three checks. All three must pass.
Check 1: Functional Equivalence Matrix
For every public function/entry point being replaced, build an automated comparison matrix:
┌─────────────────────┬──────────────┬──────────────┬─────────┐
│ Function │ Python Output│ Rust Output │ Match? │
├─────────────────────┼──────────────┼──────────────┼─────────┤
│ pre_tool_use(bash) │ {"allow":..} │ {"allow":..} │ ✅ │
│ pre_tool_use(edit) │ {"deny":...} │ {"deny":...} │ ✅ │
│ stop(reflection) │ {"block":..} │ {"block":..} │ ✅ │
│ ...every function │ │ │ │
└─────────────────────┴──────────────┴──────────────┴─────────┘
Generate test fixtures from real session transcripts. Run BOTH implementations against every fixture. Diff outputs. The matrix must be 100% green before proceeding.
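The matrix can be held as plain data so that "100% green" becomes a checkable condition. This is a rough sketch under assumptions: `MatrixRow`, `mismatches`, and `render` are hypothetical names, and outputs are compared as raw strings rather than parsed JSON.

```rust
/// One row of the equivalence matrix: the same fixture run through both engines.
#[derive(Debug, Clone, PartialEq, Eq)]
struct MatrixRow {
    case: String,
    python_output: String,
    rust_output: String,
}

impl MatrixRow {
    /// A row is green only when both engines produced the same output.
    fn matches(&self) -> bool {
        self.python_output == self.rust_output
    }
}

/// Returns the names of every case whose outputs differ; empty means 100% green.
fn mismatches(rows: &[MatrixRow]) -> Vec<&str> {
    rows.iter()
        .filter(|row| !row.matches())
        .map(|row| row.case.as_str())
        .collect()
}

/// Renders the matrix as plain text, one line per case.
fn render(rows: &[MatrixRow]) -> String {
    rows.iter()
        .map(|row| {
            let mark = if row.matches() { "MATCH" } else { "MISMATCH" };
            format!("{:<24} {}", row.case, mark)
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

A phase gate would then require `mismatches(&rows).is_empty()` before proceeding, and print `render(&rows)` for the record.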
Check 2: Fuzz / Property Testing
For each module, write property-based tests (proptest in Rust, hypothesis in Python):
- Random valid JSON inputs → both must produce identical outputs
- Random invalid JSON inputs → both must fail gracefully (not crash)
- Oversized inputs → both must handle within timeout
- Minimum 10,000 random inputs per function
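The shape of such a differential check can be sketched without any external crate. This is not proptest: it swaps in a hand-rolled xorshift generator with no shrinking, purely to illustrate the "same random input to both implementations" idea. `XorShift64`, `random_command`, and `first_divergence` are hypothetical names, and the two implementations are passed in as closures.

```rust
/// Small deterministic xorshift64 generator so failures are reproducible from the seed.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Builds a pseudo-random command string from characters that tend to stress shell parsing.
fn random_command(rng: &mut XorShift64, max_len: usize) -> String {
    const ALPHABET: &[char] = &[
        'a', 'r', 'm', ' ', '-', '/', '.', '\'', '"', '$', '*', '\\', 'é',
    ];
    let len = (rng.next_u64() as usize) % (max_len + 1);
    (0..len)
        .map(|_| ALPHABET[(rng.next_u64() as usize) % ALPHABET.len()])
        .collect()
}

/// Runs both implementations on `cases` generated inputs and returns the first
/// input on which they disagree, or `None` if they agree on all of them.
fn first_divergence<A, B>(cases: usize, seed: u64, reference: A, candidate: B) -> Option<String>
where
    A: Fn(&str) -> String,
    B: Fn(&str) -> String,
{
    // The xorshift state must be non-zero, so a zero seed is bumped to 1.
    let mut rng = XorShift64(seed.max(1));
    (0..cases)
        .map(|_| random_command(&mut rng, 64))
        .find(|input| reference(input.as_str()) != candidate(input.as_str()))
}
```

In a real harness the `reference` closure would shell out to the Python hook and `candidate` would call the Rust one. A returned input is a ready-made golden file candidate.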
Check 3: Integration Smoke Test
Run the Rust binary in a REAL amplihack session:
- Start amplihack claude with AMPLIHACK_HOOK_ENGINE=rust
- Perform at least 10 real operations (edit files, run bash, write code)
- Capture all hook stdin/stdout pairs
- Replay the same stdin against Python hooks
- Diff all outputs — must match
Side-by-Side Evaluation Framework
Build these tools so the owner can compare Rust and Python behavior offline, during development — NOT as a runtime mode.
⚠️ NO FALLBACKS. NO DUAL-ENGINE RUNTIME MODE. Comparison happens in tests, in CI, and via explicit CLI commands. Never at runtime. If Rust is the active engine, it's the ONLY engine. If it produces wrong output, that's a bug to fix, not a reason to fall back to Python.
1. amplihack hooks compare CLI Command (Development Tool)
# Compare a single hook event (offline — runs both implementations and diffs)
amplihack hooks compare --event PreToolUse --input fixture.json
# Compare all hooks against golden files
amplihack hooks compare --all
# Compare with timing (informational, not a correctness gate)
amplihack hooks compare --event PreToolUse --input fixture.json --timing
# Output:
# Python: 145ms {"permissionDecision": "allow"}
# Rust: 3ms {"permissionDecision": "allow"}
# Status: MATCH ✅
This is a test/development tool, not a production mode. You run it explicitly, look at the output, fix discrepancies.
2. Golden File Test Suite
tests/golden/
  hooks/
    pre_tool_use/
      bash_npm_test.input.json
      bash_npm_test.expected.json
      bash_rm_rf.input.json
      bash_rm_rf.expected.json
      ...
    stop/
      normal_stop.input.json
      normal_stop.expected.json
      reflection_needed.input.json
      reflection_needed.expected.json
Both Python and Rust must produce byte-identical output for every golden file. CI runs both and fails on any diff. Every bug found adds a new golden file as a regression test.
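Two small pieces of the golden file harness can be sketched directly: pairing each input with its expected file, and reporting where two outputs stop being byte-identical. The helper names `expected_path_for` and `first_mismatch` are assumptions for illustration. The `.input.json` / `.expected.json` naming is taken from the layout above.

```rust
use std::path::{Path, PathBuf};

/// Maps `<name>.input.json` to its sibling `<name>.expected.json`.
/// Returns `None` for any file that is not a golden input.
fn expected_path_for(input: &Path) -> Option<PathBuf> {
    let name = input.file_name()?.to_str()?;
    let stem = name.strip_suffix(".input.json")?;
    Some(input.with_file_name(format!("{stem}.expected.json")))
}

/// Returns the byte offset of the first difference, or `None` when the two
/// outputs are byte-identical. A length difference counts as a mismatch at
/// the end of the shorter output.
fn first_mismatch(expected: &[u8], actual: &[u8]) -> Option<usize> {
    let common = expected.len().min(actual.len());
    (0..common)
        .find(|&i| expected[i] != actual[i])
        .or_else(|| (expected.len() != actual.len()).then_some(common))
}
```

Reporting the offset rather than a bare pass/fail makes a trailing newline or a key-ordering difference easy to spot in CI logs.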
Dogfooding Protocol
The implementing agent MUST use the Rust version for its own development work. Non-negotiable.
| Phase | Requirement |
|---|---|
| Phase 1 (Hooks) | After first 3 hooks compile: switch to Rust hooks for ALL subsequent development. Fix bugs as they surface. |
| Phase 2 (CLI) | After CLI compiles: use amplihack-rs claude as the launcher. No Python CLI alongside. |
Dogfooding log: Maintain a running table of issues discovered through dogfooding. Every dogfooding issue becomes a golden file test case.
If Rust hooks break during dogfooding: This is expected during development. The response is:
- Capture the exact input that caused the failure
- Add it as a golden file test case immediately
- Fix the Rust bug
- Verify the fix passes the new golden file
- Continue dogfooding
Do NOT switch back to Python hooks. Fix forward. If the bug is severe enough that you literally cannot continue working (e.g., every command is blocked), fix the bug first — that's the highest priority task.
CI Gates
jobs:
  functional-equivalence:
    steps:
      - name: Golden file tests (Python)
        run: python -m pytest tests/golden/ --engine=python
      - name: Golden file tests (Rust)
        run: python -m pytest tests/golden/ --engine=rust
      - name: Diff outputs
        run: python tests/golden/diff_outputs.py
  property-tests:
    steps:
      - name: Rust property tests
        run: cd amplihack-rs && cargo test --test proptest -- --test-threads=1
      - name: Python property tests
        run: python -m pytest tests/property/ -x
  dogfooding-smoke:
    steps:
      - name: Real hook invocations with Rust
        run: |
          export AMPLIHACK_HOOK_ENGINE=rust
          python tests/smoke/run_real_hooks.py --count 10
      - name: Compare with Python baseline
        run: python tests/smoke/compare_outputs.py
CI blocks merge if: any golden file test fails, property tests find divergence, Rust binary panics, or Rust output differs from Python on any fixture.
Work Decomposition & Delegation
Dependency Graph
Work must proceed in dependency order. The graph below shows what blocks what:
amplihack-common (shared types, JSON protocol, error types, file lock utils)
│
├── amplihack-hooks/pre_tool_use ──┐
├── amplihack-hooks/post_tool_use ─┤── Tier 1: Independent, parallelizable
├── amplihack-hooks/session_stop ──┘
│
├── amplihack-hooks/user_prompt_submit ─┐── Tier 2: Need common + memory bridge
├── amplihack-hooks/pre_compact ────────┘
│
├── amplihack-hooks/stop ───────────┐── Tier 3: Coordinate via shared state files
├── amplihack-hooks/session_start ──┘
│
└── amplihack-cli (depends on hooks being done)
Note: amplihack-xpia is a SEPARATE REPO/ISSUE — not part of this dependency graph.
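The "what blocks what" ordering can be derived mechanically from a dependency map. The sketch below groups components into waves that can each be built in parallel. `build_waves` is a hypothetical helper, and any edge set passed to it is a simplified reading of the graph above, not an authoritative one.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Groups components into waves: every member of a wave depends only on
/// components from earlier waves, so one wave can be worked on in parallel.
fn build_waves<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Result<Vec<Vec<String>>, String> {
    let mut done: BTreeSet<&'a str> = BTreeSet::new();
    let mut waves: Vec<Vec<String>> = Vec::new();
    while done.len() < deps.len() {
        // Collect everything whose dependencies are all finished already.
        let mut ready: Vec<&'a str> = Vec::new();
        for (name, needs) in deps {
            if !done.contains(name) && needs.iter().all(|dep| done.contains(dep)) {
                ready.push(*name);
            }
        }
        if ready.is_empty() {
            // Nothing can make progress: a cycle, or a dependency that is not a key.
            return Err("dependency cycle or missing component".to_string());
        }
        done.extend(ready.iter().copied());
        waves.push(ready.into_iter().map(String::from).collect());
    }
    Ok(waves)
}
```

Because `BTreeMap` iterates in key order, the result is deterministic, which keeps the plan stable between runs.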
What Must Be Sequential
| Step | Why Sequential | Blocks |
|---|---|---|
| amplihack-common crate | All hooks import shared types (HookInput, HookOutput, JsonProtocol, ErrorProtocol, FileLock, ShutdownContext) | Everything |
| SDK bridge module (python_bridge.rs) | 4 hooks need subprocess-to-Python calls; define the pattern once | Tier 2 + Tier 3 hooks |
| Golden file infrastructure | Test harness must exist before hook implementation can be verified | All hook implementation |
What Can Run in Parallel
Tier 1 — 3 hooks, zero dependencies on each other:
| Hook | LOC | Shared State | SDK Bridge? | Notes |
|---|---|---|---|---|
| pre_tool_use | 492 | Reads config files (read-only) | No | Largest hook. Pure input→decision. Start here: it exercises the most common types. |
| post_tool_use | 203 | Reads config, writes metrics file | No | Smallest non-trivial hook. Good second target. |
| session_stop | 86 | None | Yes (MemoryCoordinator.store) | Smallest hook. SDK bridge is the only complexity. |
These 3 hooks share NO state and have NO cross-dependencies. After amplihack-common is built, they can be implemented simultaneously by parallel sub-agents.
Tier 2 — 2 hooks, depend on Tier 1 patterns but not Tier 1 code:
| Hook | LOC | Shared State | SDK Bridge? | Notes |
|---|---|---|---|---|
| user_prompt_submit | 451 | Reads preference files | Yes (inject_memory_for_agents_sync) | Memory injection. Can reuse bridge pattern from session_stop. |
| pre_compact | 195 | Reads transcript files | No | Context preservation. Independent of user_prompt_submit. |
These depend on the SDK bridge pattern being established (from Tier 1's session_stop) but not on Tier 1 hooks being complete. They CAN run in parallel with each other.
Tier 3 — 2 hooks, MUST coordinate:
| Hook | LOC | Shared State | SDK Bridge? | Notes |
|---|---|---|---|---|
| stop | 902 | Writes lock files, counter files, continuation prompt files | Yes (reflection) | Largest, most complex hook. Lock mode + power steering state machine. |
| session_start | 601 | Reads lock files written by stop, writes session state | Yes (memory context) | Coordinates with stop via shared file locks. |
These two share state through the filesystem (lock files, counter files). They MUST be designed together — the file format and locking protocol must be agreed before either is implemented. Implement stop first (it defines the state), then session_start (it reads that state).
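One way the shared-file protocol could avoid check-then-act races is exclusive creation for lock files and write-then-rename for state files. The sketch below uses only the standard library. `try_acquire_lock` and `write_state_atomically` are hypothetical names, the file contents are placeholders, and the rename is only atomic when the temp file and the target sit on the same filesystem.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Tries to take the lock by creating the file exclusively (O_CREAT|O_EXCL semantics).
/// Returns Ok(false) if the lock file already exists, i.e. someone else holds it.
fn try_acquire_lock(lock_path: &Path, owner: &str) -> io::Result<bool> {
    match OpenOptions::new().write(true).create_new(true).open(lock_path) {
        Ok(mut file) => {
            file.write_all(owner.as_bytes())?;
            Ok(true)
        }
        Err(err) if err.kind() == io::ErrorKind::AlreadyExists => Ok(false),
        Err(err) => Err(err),
    }
}

/// Replaces the state file by writing a sibling temp file, syncing it, then renaming.
/// Readers see either the old contents or the new ones, never a partial write.
/// Simplification: the temp name is fixed, so concurrent writers would need unique names.
fn write_state_atomically(state_path: &Path, contents: &[u8]) -> io::Result<()> {
    let tmp_path = state_path.with_extension("tmp");
    let mut tmp = fs::File::create(&tmp_path)?;
    tmp.write_all(contents)?;
    tmp.sync_all()?;
    fs::rename(&tmp_path, state_path)
}
```

With this shape, stop would own both calls and session_start would only ever read, which matches the "stop defines the state, session_start reads it" ordering.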
Sub-Agent Delegation Guide
The implementing agent should use sub-agents strategically:
| Task Type | Agent Type | When to Use |
|---|---|---|
| Understanding Python code | explore | Before implementing each function: understand what the Python does, what edge cases it handles, what state it reads/writes |
| Implementing a Rust function | Do it yourself | The main agent should write Rust code directly: it has the full context of the issue spec, patterns, and prior decisions |
| Writing golden file test inputs | task | Generating synthetic JSON inputs for a hook: mechanical work that benefits from parallelism |
| Running tests / building | task | cargo test, cargo clippy, golden file comparison: fire and forget, read results |
| Reviewing a completed hook | code-review | After implementing a hook, have a code-review agent check for correctness issues, missing edge cases, silent error swallowing |
| Investigating a Python bug | explore | When a golden file reveals Python behavior that looks wrong: investigate before deciding whether to replicate or fix |
Parallel sub-agent opportunities:
- Launch 3 explore agents in parallel to analyze Tier 1 hooks before implementing
- Launch task agents to run cargo test while continuing to write code
- Launch code-review on hook N while implementing hook N+1
Never delegate:
- Architectural decisions (crate structure, type design, error strategy) — main agent only
- SDK bridge contract design — must be consistent across all 4 uses
- State file format decisions for stop/session_start coordination
Recommended Execution Order
Phase 0:
0a. Type analysis (explore Python hooks) ─────────────────────┐
0b. Build amplihack-common crate ──────────────────────────────┤ Sequential
0c. Build golden file infrastructure ──────────────────────────┘
0d. Capture golden files for ALL hooks ─── (can parallel: 1 task agent per hook)
Phase 1 — Tier 1 (parallel):
┌── pre_tool_use implementation + tests ──┐
├── post_tool_use implementation + tests ─┤ Parallel (independent)
└── session_stop implementation + tests ──┘
Then: Triple-check Tier 1
Phase 1 — Tier 2 (parallel):
┌── user_prompt_submit implementation ────┐
└── pre_compact implementation ───────────┘ Parallel (independent)
Then: Triple-check Tier 2
Phase 1 — Tier 3 (sequential):
stop implementation ──→ session_start implementation
(stop defines shared state format; session_start reads it)
Then: Triple-check Tier 3
Then: Triple-check ALL hooks together
Phase 2 (after Phase 1 — CONDITIONAL):
amplihack-cli ── Depends on hooks being done
Checkpoint Strategy
After completing each tier, create a checkpoint:
- All golden files pass for the tier
- cargo clippy -- -D warnings clean
- amplihack hooks compare shows 0 mismatches for the tier
- Code-review agent finds no correctness issues
- Commit, push, create PR
Do NOT move to the next tier until the current tier's PR is clean. Accumulating untested code across tiers creates integration nightmares.
Branch & PR Strategy
Rule: Never push to main. Always feature branches + PRs.
| Scope | Branch Pattern | PR Size Target |
|---|---|---|
| Phase 0 scaffolding | amplihack-rs/phase-0-scaffold | 1 PR |
| Phase 0b golden files | amplihack-rs/golden-files | 1 PR |
| Each hook | amplihack-rs/hook-{name} (e.g., hook-pre-tool-use) | 1 PR per hook |
| XPIA crate | Tracked separately in amplihack-xpia-defender repo | Separate issue |
| CLI (if Phase 2) | amplihack-rs/cli-{component} | 2-3 PRs |
| Infrastructure (CI, build) | amplihack-rs/ci-{what} | Small PRs |
PR requirements:
- Golden file parity tests pass
- cargo clippy -- -D warnings clean
- cargo fmt --check clean
- Property tests pass (if applicable to the component)
- amplihack hooks compare results included in PR description
- PR description lists specific correctness improvements (e.g., "eliminates TOCTOU race in counter update")
- dev-orchestrator invoked for all changes (evidence: structured commit messages, test coverage)
Commit Cadence
- Commit after every completed function. Don't accumulate large uncommitted changesets.
- Push to branch after every 3-5 commits. Ensures work is saved remotely.
- Create PR after completing a logical unit (one hook, one crate, one component). Don't batch too many components into one PR.
- If stuck for >2 hours on one function: commit what works, write a TODO(blocked) comment with the issue, move to the next function. Come back after finishing the rest of the component; fresh eyes often solve the problem.
Recovery Strategy
- If a component is harder than expected: Don't grind indefinitely. After implementing and testing the straightforward functions, if the remaining ones are blocked, document what's blocking them and move to the next component. Return with more context later.
- If golden file tests reveal Python bugs: Fix the Python bug first (in a separate PR), re-capture golden files, then continue with Rust. Don't try to replicate known-wrong behavior.
- If a Rust pattern from this issue doesn't work as described: Document why, implement the correct pattern, and update this issue with what you learned. The patterns here were reviewed but not all were compiled — real-world adjustments are expected.
- If CI is red and you can't fix it quickly: Revert your last change, ensure CI is green, then try again with a different approach. Never leave CI red for more than 1 commit.
Parity Maintenance Policy
During the transition period (Phases 1-4), both Python and Rust implementations exist.
Rule: Python is the source of truth until a hook is fully validated in Rust.
| Scenario | Action |
|---|---|
| Bug found in Python hook | Fix in Python. Add golden file test. Port fix to Rust. Verify parity. |
| Bug found in Rust hook | Fix in Rust. Add golden file test. Check if Python has same bug — fix if so. |
| New feature added to Python hook | Implement in Python first. Add golden file test. Port to Rust before next release. |
| New Claude Code hook event added | The #[serde(other)] Unknown variant absorbs it without breaking deserialization. Then implement in Python, then Rust. |
| Python hook diverges from Rust (CI catches) | Block merge. Fix whichever side is wrong. |
CI enforcement: The parity test harness runs on every PR. If Python and Rust produce different output for any golden file, the PR cannot merge.
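The catch-all variant mentioned in the table can be illustrated without serde. This is a standard-library stand-in for what `#[serde(other)]` does on a unit variant: unrecognized event names land in `Unknown` instead of failing. Only `PreToolUse` is taken from this issue's CLI examples; the other event names, `HookEvent::parse`, and `handler_name` are assumptions for illustration.

```rust
/// Hook events this build knows about, plus a catch-all for anything newer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HookEvent {
    PreToolUse,
    PostToolUse,
    Stop,
    /// Any event name this build does not recognize (the role `#[serde(other)]` plays).
    Unknown,
}

impl HookEvent {
    /// Total function: every input string maps to some variant, never an error.
    fn parse(name: &str) -> Self {
        match name {
            "PreToolUse" => Self::PreToolUse,
            "PostToolUse" => Self::PostToolUse,
            "Stop" => Self::Stop,
            _ => Self::Unknown,
        }
    }
}

/// Exhaustive match: adding a variant without handling it here is a compile error.
fn handler_name(event: HookEvent) -> Option<&'static str> {
    match event {
        HookEvent::PreToolUse => Some("pre_tool_use"),
        HookEvent::PostToolUse => Some("post_tool_use"),
        HookEvent::Stop => Some("stop"),
        HookEvent::Unknown => None,
    }
}
```

The string match is confined to one parsing function. Everything downstream works on the enum, so a new event added by Claude Code degrades to a no-op until it gets its own variant and handler.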
Acceptance Criteria (Definition of "CERTAIN")
A phase is CERTAIN complete when ALL of the following are true:
Per-Function Criteria
- Rust function exists with equivalent signature
- Unit tests cover all paths the Python tests cover
- Golden file tests pass (100% — no exceptions)
- Property tests pass (10,000+ random inputs, zero divergence)
- Edge cases: empty input, malformed JSON, oversized input, unicode, null bytes
Per-Hook Criteria (on top of per-function)
- Multicall binary dispatches correctly to this hook
- Exit code contract matches (0, 2, other)
- Stderr output matches Python format
- All type-safety bugs from Python analysis proven eliminated by Rust's type system (exhaustive enums, no stringly-typed dispatch, no TOCTOU)
- Fail-open behavior: panic → b"{}" on stdout, exit 0
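The fail-open criterion can be sketched with `std::panic::catch_unwind`. This is a minimal illustration, assuming the hook body is a closure returning the bytes to write to stdout. `run_fail_open` is a hypothetical wrapper name, and the approach only works when the binary is built with the default `panic = "unwind"` strategy.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Runs the hook body and returns its output bytes. If the body panics,
/// returns the empty JSON object instead so the caller can still exit 0.
fn run_fail_open<F>(hook: F) -> Vec<u8>
where
    F: FnOnce() -> Vec<u8>,
{
    // AssertUnwindSafe: the closure's state is discarded after a panic,
    // so no broken invariant can be observed afterwards.
    match panic::catch_unwind(AssertUnwindSafe(hook)) {
        Ok(bytes) => bytes,
        Err(_) => b"{}".to_vec(),
    }
}
```

A binary's `main` would write the returned bytes to stdout and exit with code 0 in both cases. The default panic hook still prints the panic message to stderr, which keeps the failure visible.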
Per-Phase Criteria (on top of per-hook)
- All hooks in the phase pass all per-hook criteria
- Integration smoke test: 10+ real operations with AMPLIHACK_HOOK_ENGINE=rust
- Dual-engine mode (=dual) run for ≥1 full session with 0 mismatches
- Dogfooding: agent used Rust for ≥3 sessions of real development work
- No regressions in existing Python test suite
- PR reviewed and merged
- CI pipeline green on all 4 platforms
Project-Complete Criteria (all phases)
- Every function in the inventory table above has status "done"
- Every golden file passes on both engines
- Dual-engine mode shows 0 mismatches across 5 consecutive sessions
- Performance baselines table filled with actual measurements showing improvement
- amplihack hooks compare --transcript shows 100% match on 3 real transcripts
- Distribution: platform wheel builds succeed for all 4 targets
- Distribution: UVX install flow tested end-to-end
- README/docs updated with Rust binary information
Risks
| Risk | Prob | Impact | Mitigation |
|---|---|---|---|
| Claude Code hook protocol changes | Med | High | #[serde(other)] Unknown + #[serde(default)] everywhere. Fuzz with extra fields. |
| shell-words inadequate for edge cases | Med | High | Property-test against /bin/sh. Custom post-processor for &&/` |
| Two build systems (Cargo + setuptools) | High | Med | Design CI FIRST (Phase 0d). Use cibuildwheel for wheels. |
| Contributor friction (Rust toolchain) | High | Med | CONTRIBUTING_RUST.md. Rust basics gate in Phase 0. |
| macOS code signing / Gatekeeper | Med | Med | codesign + notarization in CI from Phase 2. |
| Rust hook crashes in production | Med | High | catch_unwind returns b"{}" (fail-open for non-security hooks). Fix the bug immediately. No silent fallback to Python. |
| Python hooks diverge during transition | Med | Med | Parity test harness in CI blocks merge on diff. |
Timeline Summary
| Phase | Duration | Cumulative | Deliverable |
|---|---|---|---|
| 0: Foundation | 3 weeks | 3 weeks | Golden files, workspace, Python experiment |
| 1: All Hooks | 5-6 weeks | 8-9 weeks | All hooks in Rust (3 tiers) |
| 2: CLI launcher | 3 weeks | 11-12 weeks | Rust CLI binary (conditional) |
XPIA is tracked in a separate issue/repo. Not included in this timeline.
Each phase ships independently. Decision gates after Phase 0a (CLI go/no-go) and Phase 1 (continue or stop). The project delivers value even if only Phase 1 ships.
Open Questions
- Phase 2 go/no-go: Run the Python experiment first. Does mypy --strict catch the correctness issues we care about? If so, the CLI phase may not be worth it.
- Hook protocol documentation: Should we ask Anthropic for a formal hook JSON schema, or accept the risk of undocumented changes?
- Nesting detection strategy: Env var marker (simple) vs session file with PID (existing, fragile)?
Execution Recipes
These recipes are the executable specification for the entire project. Each recipe defines the exact steps, inputs, outputs, and success criteria. The implementing agent should load and execute these recipes — they ARE the plan, not a summary of it.
Per-Component Recipe: rust-rewrite-component.yaml
This is the inner loop. It runs once for each Python function/module being rewritten. Every function in the inventory table above gets processed through this recipe. The recipe does not complete until the Rust code is provably correct — not just compiling, but producing identical behavior to the Python on every input.
name: "rust-rewrite-component"
description: "Rewrite one Python component to Rust with verified behavioral equivalence"
version: "1.0.0"
tags: ["amplihack-rs", "rewrite", "correctness"]
context:
python_file: "" # e.g. ".claude/tools/amplihack/hooks/pre_tool_use.py"
python_function: "" # e.g. "PreToolUseHook._check_cwd_deletion"
rust_target_file: "" # e.g. "amplihack-rs/crates/amplihack-hooks/src/pre_tool_use/cwd.rs"
rust_crate: "" # e.g. "amplihack-hooks"
golden_file_dir: "" # e.g. "tests/golden/hooks/pre_tool_use/cwd_deletion"
has_sdk_calls: "false" # "true" if function calls LLM/memory APIs
analysis: ""
golden_status: ""
implementation: ""
tests_written: ""
build_result: ""
parity_result: ""
property_result: ""
verification: ""
steps:
# ── STEP 1: Deep analysis of the Python implementation ────────────
# Goal: Understand EVERY code path, not just the happy path.
# The analysis becomes the correctness specification for the Rust code.
- id: "analyze-python"
type: agent
agent: "amplihack:builder"
prompt: |
Read `{{python_file}}`, focusing on `{{python_function}}`.
Produce a COMPLETE behavioral specification as JSON:
{
"function_signature": "name(params) -> return_type",
"lines": "start-end",
"loc": <number>,
"code_paths": [
{
"name": "happy path - safe bash command",
"trigger": "command does not match any dangerous pattern",
"returns": "{} (empty dict = allow)",
"side_effects": []
},
{
"name": "block - CWD deletion detected",
"trigger": "command contains 'rm -rf' targeting CWD",
"returns": "{'hookSpecificOutput': {'permissionDecision': 'deny', ...}}",
"side_effects": ["log to hook log file", "save metric"]
}
// ... EVERY path, including error/exception paths
],
"input_schema": {
"required_fields": ["tool_name", "tool_input"],
"tool_input_fields": {"command": "string"},
"optional_fields": ["session_id", "cwd"]
},
"output_schema": {
"on_allow": {},
"on_deny": {"hookSpecificOutput": {"permissionDecision": "deny", "permissionDecisionReason": "..."}},
"on_error": {}
},
"constants_and_regexes": [
{"name": "_RM_RECURSIVE_RE", "value": "...", "purpose": "detect rm -rf/rm -r"}
],
"side_effects": [
"reads CWD via os.getcwd()",
"resolves symlinks via Path.resolve()",
"writes to log file via self.log()"
],
"sdk_calls": [], // or ["inject_memory_for_agents_sync()"] if boundary function
"edge_cases": [
"command with quoted paths containing spaces",
"glob patterns that expand to CWD",
"symlinks pointing to CWD",
"empty command string",
"command with unicode characters",
"very long command (>1MB)"
],
"known_bugs_in_python": [
// List any bugs found during analysis — these should be FIXED in Rust
// e.g. "shlex.split() crashes on unmatched quotes — Python catches with bare except"
]
}
Be exhaustive. Missing a code path here means the Rust version will have a gap.
output: "analysis"
parse_json: true
# ── STEP 2: Ensure golden files exist ─────────────────────────────
# Golden files are the behavioral oracle. Without them, we can't verify correctness.
- id: "check-golden-files"
type: bash
command: |
if [ -d "{{golden_file_dir}}" ]; then
count=$(ls {{golden_file_dir}}/*.input.json 2>/dev/null | wc -l)
echo "GOLDEN_FILES_EXIST: $count files"
else
echo "GOLDEN_FILES_MISSING"
fi
output: "golden_status"
# ── STEP 3: Generate golden files if missing ──────────────────────
# Each golden file = one concrete input/output pair from the Python implementation.
# These capture actual behavior including quirks and edge cases.
- id: "capture-golden-files"
type: agent
condition: "'MISSING' in golden_status or 'EXIST: 0' in golden_status"
agent: "amplihack:builder"
prompt: |
Generate golden file test cases for `{{python_function}}`.
Based on the analysis:
{{analysis}}
For EVERY code path in the analysis, create at least one golden file.
For security-critical paths, create multiple (normal case + edge cases).
Each golden file is a pair:
{name}.input.json — the exact JSON that would arrive on stdin
{name}.expected.json — the exact JSON the Python produces on stdout
Generate the input, then ACTUALLY RUN the Python implementation to capture
the output. Do NOT guess what the output should be — run it and record.
```bash
echo '{"tool_name":"Bash","tool_input":{"command":"rm -rf /"}}' | python {{python_file}}
```
Required golden files (minimum):
1. One per code path from the analysis (happy + every error/block path)
2. Empty input: `{}`
3. Malformed input: `{"not_a_real_field": true}`
4. Missing required fields: `{"tool_name": "Bash"}` (no tool_input)
5. Oversized input: command string >10KB
6. Unicode in command: `{"tool_input": {"command": "echo '日本語'"}}`
7. Null bytes: `{"tool_input": {"command": "echo '\u0000'"}}`
8. Nested quotes: `{"tool_input": {"command": "bash -c \"rm -rf '$(pwd)'\""}}`
Save all files to {{golden_file_dir}}/
If you discover a Python bug while capturing (crash, incorrect output),
document it in a file called {{golden_file_dir}}/BUGS.md and create the
golden file with the CORRECTED expected output (what Rust should do).
output: "golden_capture_result"
# ── STEP 4: Write the Rust implementation ─────────────────────────
# This is where correctness matters most. Every decision should be
# justified by the analysis, not by intuition.
- id: "write-rust"
type: agent
agent: "amplihack:builder"
prompt: |
Write the Rust implementation of `{{python_function}}` in `{{rust_target_file}}`
within the `{{rust_crate}}` crate.
Behavioral specification:
{{analysis}}
CORRECTNESS REQUIREMENTS (these are non-negotiable):
1. Handle EVERY code path from the analysis. No path may be skipped.
2. Match Python output exactly for every golden file in {{golden_file_dir}}/
3. Use Rust's type system to make illegal states unrepresentable:
- Exhaustive enums instead of stringly-typed dispatch
- newtype wrappers for paths, session IDs, etc. where confusion is possible
- Option<T> instead of sentinel values
4. Eliminate the Python correctness gaps the analysis identified:
- TOCTOU races → use atomic file ops (O_CREAT|O_EXCL, temp+rename)
- Bare except → typed error handling with thiserror/anyhow
- String-based dispatch → exhaustive match on enums
- Missing validation → parse, don't validate (serde + newtype)
5. Use #[serde(other)] Unknown for forward-compatible deserialization
6. No unwrap() except in tests. No panic!() in library code.
7. If the function makes SDK/LLM calls (has_sdk_calls={{has_sdk_calls}}),
implement as a subprocess call to an embedded Python bridge script.
STYLE:
- Follow idiomatic Rust (clippy -D warnings must pass)
- Doc comments on public items explaining WHAT and WHY, not HOW
- Inline comments only for non-obvious correctness reasoning
- Prefer small functions with clear names over large ones with comments
Write the complete implementation. No TODOs. No placeholders.
output: "implementation"
# ── STEP 5: Write comprehensive tests ─────────────────────────────
# Tests are the proof that the Rust code is correct.
# They should make it impossible to regress.
- id: "write-tests"
type: agent
agent: "amplihack:builder"
prompt: |
Write tests for `{{rust_target_file}}` in the `{{rust_crate}}` crate.
Behavioral specification: {{analysis}}
Implementation: {{implementation}}
THREE categories of tests are required:
**Category 1: Golden file tests (MANDATORY)**
Load every .input.json from {{golden_file_dir}}/
Run it through the Rust implementation.
Compare output to .expected.json using semantic JSON comparison
(ignore key ordering and whitespace, but match values exactly).
```rust
#[test]
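fn golden_files() {
    for entry in fs::read_dir("{{golden_file_dir}}").unwrap() {
        let path = entry.unwrap().path();
        // Path::extension() yields only "json" for "x.input.json",
        // so match on the full file name suffix instead.
        let name = path.file_name().unwrap().to_string_lossy().into_owned();
        if let Some(stem) = name.strip_suffix(".input.json") {
            let input = fs::read_to_string(&path).unwrap();
            // Sibling file: <stem>.expected.json in the same directory.
            let expected_path = path.with_file_name(format!("{stem}.expected.json"));
            let expected = fs::read_to_string(&expected_path).unwrap();
            let actual = run_hook_with_input(&input);
            assert_json_eq!(actual, expected, "Golden file mismatch: {}", path.display());
        }
    }
}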
```
**Category 2: Unit tests for each code path**
One test per code path from the analysis. Test name should describe
the scenario, not the implementation detail.
**Category 3: Property tests (proptest)**
- Random valid JSON inputs → must not panic, must return valid JSON
- Random invalid JSON → must not panic, must return {} or valid error
- Random strings in command field → must not panic
- Minimum: #[proptest(cases = 10000)]
Also test:
- Panic recovery: force a panic and verify catch_unwind returns b"{}"
- SIGPIPE: verify graceful handling when stdout is closed
- Concurrent access: if function touches files, run 10 threads simultaneously
output: "tests_written"
# ── STEP 6: Build and run all tests ───────────────────────────────
- id: "build-and-test"
type: bash
command: |
cd amplihack-rs && \
cargo clippy -- -D warnings 2>&1 && \
cargo fmt --check 2>&1 && \
cargo test -p {{rust_crate}} -- --test-threads=1 2>&1
output: "build_result"
# ── STEP 7: Fix any build/test failures ───────────────────────────
# Do not move on with broken code. Fix everything.
- id: "fix-build-errors"
type: agent
condition: "'error' in build_result or 'FAILED' in build_result"
agent: "amplihack:builder"
prompt: |
Build or tests failed. Fix ALL errors:
{{build_result}}
After fixing, re-run the build and tests. Do not proceed until clean.
If a test failure reveals a real behavior difference from Python,
fix the Rust implementation — do NOT weaken the test.
output: "fix_result"
# ── STEP 8: Run golden file parity check ──────────────────────────
# This is the critical correctness gate. Python and Rust must produce
# identical output for every golden file.
- id: "run-golden-parity"
type: bash
command: |
cd amplihack-rs && \
cargo build --release -p {{rust_crate}} 2>&1 && \
python tests/golden/run_parity.py \
--golden-dir {{golden_file_dir}} \
--rust-binary target/release/amplihack-hooks \
--python-script {{python_file}} 2>&1
output: "parity_result"
# ── STEP 9: Fix parity mismatches ─────────────────────────────────
# The golden files are the source of truth (unless documented as Python bugs).
- id: "fix-parity-failures"
type: agent
condition: "'MISMATCH' in parity_result or 'FAIL' in parity_result"
agent: "amplihack:builder"
prompt: |
Golden file parity check found mismatches:
{{parity_result}}
For each mismatch:
1. Check if this is a documented Python bug (see {{golden_file_dir}}/BUGS.md)
- If yes: verify Rust output matches the CORRECTED expected output
- If no: the Rust implementation is wrong. Fix it to match Python.
2. Do NOT change golden files to match Rust. Fix the Rust code.
3. Re-run parity check after each fix.
4. Do not proceed until 100% parity.
output: "parity_fix_result"
# ── STEP 10: Run property tests ───────────────────────────────────
- id: "run-property-tests"
type: bash
command: |
cd amplihack-rs && \
cargo test -p {{rust_crate}} --test proptest -- --test-threads=1 2>&1
output: "property_result"
# ── STEP 11: Correctness verification ─────────────────────────────
# A reviewer agent checks the work. This is the third pair of eyes.
- id: "verify-correctness"
type: agent
agent: "amplihack:reviewer"
prompt: |
Verify this component rewrite is CORRECT and COMPLETE.
Python function: {{python_function}} in {{python_file}}
Rust target: {{rust_target_file}} in {{rust_crate}}
Behavioral specification: {{analysis}}
Build result: {{build_result}}
Parity result: {{parity_result}}
Property test result: {{property_result}}
Verify:
1. Every code path in the behavioral spec has a corresponding Rust code path
2. Every code path has at least one test
3. All golden file parity tests pass
4. All property tests pass
5. No TODO/FIXME/HACK comments remain
6. No unwrap() outside of tests
7. Error handling covers all cases (no bare catch-all)
8. Type safety: are there any places where stringly-typed values
could be replaced with enums or newtypes?
9. Concurrency safety: are file operations atomic where they need to be?
10. Forward compatibility: does deserialization handle unknown fields/variants?
Output JSON:
{"status": "CORRECT"} — if ALL checks pass
{"status": "INCORRECT", "issues": ["specific issue 1", "specific issue 2"]}
output: "verification"
parse_json: true
Phase Recipe: rust-rewrite-phase.yaml
This recipe orchestrates a complete phase. It processes all components in the phase through the per-component recipe, then runs the triple-check verification protocol. A phase does not complete until triple-check passes.
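The equivalence check in this recipe compares Python and Rust outputs semantically rather than byte for byte. The following is a minimal sketch of that comparison, using only the standard library. The `Json` enum, `semantic_eq`, `obj`, and the `EPSILON` value are hypothetical names and choices made for illustration. A real harness would parse actual JSON text, and the project's own comparison tool may behave differently.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory JSON value. A real harness would parse JSON text
/// into a library type instead of building values by hand.
#[derive(Debug, Clone)]
enum Json {
    Null,
    Bool(bool),
    Num(f64),
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

/// Assumed tolerance for floating-point comparison.
const EPSILON: f64 = 1e-9;

/// Compare two values: object key order is irrelevant, array order matters,
/// numbers match within EPSILON, everything else must match exactly.
fn semantic_eq(a: &Json, b: &Json) -> bool {
    match (a, b) {
        (Json::Null, Json::Null) => true,
        (Json::Bool(x), Json::Bool(y)) => x == y,
        (Json::Num(x), Json::Num(y)) => (x - y).abs() <= EPSILON,
        (Json::Str(x), Json::Str(y)) => x == y,
        (Json::Arr(x), Json::Arr(y)) => {
            x.len() == y.len() && x.iter().zip(y).all(|(l, r)| semantic_eq(l, r))
        }
        (Json::Obj(x), Json::Obj(y)) => {
            x.len() == y.len()
                && x.iter()
                    .all(|(k, v)| y.get(k).map_or(false, |w| semantic_eq(v, w)))
        }
        // Different kinds of value never match.
        _ => false,
    }
}

/// Build an object from (key, value) pairs given in any order.
fn obj(pairs: Vec<(&str, Json)>) -> Json {
    Json::Obj(pairs.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}
```

Whitespace differences disappear at parse time, so the sketch only has to handle key ordering and floating-point tolerance.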
name: "rust-rewrite-phase"
description: "Execute a full phase: rewrite all components, verify correctness, enable dogfooding"
version: "1.0.0"
tags: ["amplihack-rs", "phase", "correctness"]
context:
phase_name: "" # e.g. "pre-tool-use"
phase_number: "" # e.g. "1"
components: "" # Multi-line: python_file → rust_target mapping
baselines: ""
rewrite_result: ""
check1_result: ""
check2_result: ""
check3_result: ""
dogfooding_status: ""
pr_result: ""
steps:
# ── STEP 1: Capture behavioral baselines ──────────────────────────
# Record what the Python does so we can prove Rust does the same thing.
- id: "capture-behavioral-baselines"
type: agent
agent: "amplihack:builder"
prompt: |
Capture behavioral baselines for all components in phase {{phase_name}}.
Components: {{components}}
For each component:
1. Run the Python implementation with 50+ representative inputs
2. Record FULL input/output pairs (not just timing)
3. Record exit codes and stderr output
4. Save to tests/baselines/phase-{{phase_number}}/
These baselines serve as the correctness oracle.
Also record timing as informational data (not a success criterion).
output: "baselines"
# ── STEP 2: Rewrite all components ────────────────────────────────
# Execute the per-component recipe for each item in the component list.
# Do not stop between components. Work autonomously through the list.
- id: "rewrite-all-components"
type: agent
agent: "amplihack:builder"
prompt: |
For EVERY component listed below, execute the full rewrite workflow:
analyze → golden files → implement → test → parity check → property test → verify.
Components to rewrite:
{{components}}
Work through them sequentially. After each component's golden file parity
test passes at 100%, move to the next. Do not stop to ask between components.
Track progress:
- Log each component's status (DONE / IN_PROGRESS / BLOCKED)
- If a component is BLOCKED, note why and move to the next unblocked one
- Return to BLOCKED components after finishing the unblocked ones
Do not declare this step complete until EVERY component is DONE.
output: "rewrite_result"
# ── TRIPLE CHECK 1: Functional Equivalence Matrix ─────────────────
# Build the matrix showing Python vs Rust output for every golden file.
- id: "triple-check-1-equivalence"
type: agent
agent: "amplihack:reviewer"
prompt: |
Build and execute the functional equivalence matrix for phase {{phase_name}}.
For EVERY function that was rewritten in this phase:
1. Collect all golden files for that function
2. Run each golden file input through the Python implementation
3. Run each golden file input through the Rust implementation
4. Compare outputs using semantic JSON comparison
5. Record: function name, input file, Python output, Rust output, MATCH/MISMATCH
Build the matrix as a table:
| Function | Golden File | Python Output Hash | Rust Output Hash | Match? |
|----------|-------------|-------------------|-----------------|--------|
| ... | ... | ... | ... | ✅ or ❌ |
The matrix MUST be 100% ✅. Report any ❌ with full diff.
If any mismatch: this check FAILS. Do not proceed.
output: "check1_result"
# ── TRIPLE CHECK 2: Fuzz / Property Testing ───────────────────────
# Random inputs catch edge cases that golden files miss.
- id: "triple-check-2-property-tests"
type: bash
command: |
cd amplihack-rs && \
cargo test --test proptest -- --test-threads=1 2>&1 && \
echo "--- Rust property tests passed ---" && \
cd .. && \
python -m pytest tests/property/ -x -v 2>&1 && \
echo "--- Python property tests passed ---"
output: "check2_result"
# ── TRIPLE CHECK 3: Live Integration Smoke Test ───────────────────
# Use the Rust hooks in a real session. This catches issues that
# unit tests and golden files miss: timing, pipe handling, encoding, etc.
- id: "triple-check-3-live-integration"
type: agent
agent: "amplihack:builder"
prompt: |
Run the Rust hooks in a REAL amplihack session.
1. Set AMPLIHACK_HOOK_ENGINE=rust
2. Start an amplihack session
3. Perform at least 10 DIVERSE real operations, including:
- Edit a file (triggers PostToolUse)
- Run a bash command (triggers PreToolUse)
- Run a dangerous-looking command that should be blocked (triggers PreToolUse deny)
- Write a new file (triggers PostToolUse)
- Do a grep/glob (triggers PreToolUse)
- Let the session end naturally (triggers Stop)
4. Capture ALL hook stdin/stdout pairs from the session
5. Replay the captured stdin against the PYTHON hooks
6. Diff every output pair
Report: total hooks fired, matches, mismatches.
If ANY mismatch: this check FAILS. Report the full diff.
Zero mismatches required to proceed.
output: "check3_result"
# ── STEP 6: Enable dogfooding ─────────────────────────────────────
# From this point on, the agent uses Rust hooks for all its own work.
- id: "enable-dogfooding"
type: agent
condition: "'100%' in check1_result or 'PASS' in check1_result"
agent: "amplihack:builder"
prompt: |
All triple-checks passed for phase {{phase_name}}.
1. Set AMPLIHACK_HOOK_ENGINE=rust in your environment
2. For ALL remaining work on this project, you are now running Rust hooks
3. If you encounter any issue, log it with:
- Exact input that caused the problem
- Expected output (what Python does)
- Actual Rust output
- Stack trace or error message
4. Fix the issue immediately, add a golden file test, and continue
Do NOT switch back to Python unless you are completely blocked.
If you must switch back temporarily, fix the Rust issue within one session and return to the Rust hooks.
output: "dogfooding_status"
# ── STEP 7: Create PR ────────────────────────────────────────────
- id: "create-pr"
type: agent
agent: "amplihack:builder"
prompt: |
Create a PR for phase {{phase_name}} (phase {{phase_number}}).
Branch: amplihack-rs/phase-{{phase_number}}-{{phase_name}}
PR description MUST include:
1. List of all Python functions replaced and their Rust equivalents
2. Equivalence matrix results (100% match)
3. Property test results (10K+ inputs, zero panics)
4. Live integration test results (zero mismatches)
5. Correctness improvements over Python:
- Which type-safety bugs are now eliminated?
- Which TOCTOU races are now impossible?
- Which error-handling gaps are now covered?
6. Any Python bugs discovered and fixed during the rewrite
7. Dogfooding session count and issues found/fixed
output: "pr_result"
Master Recipe: amplihack-rs-master.yaml
This is the top-level execution specification. It orchestrates the entire project from start to finish. The implementing agent loads this recipe and executes it — this IS the autonomous loop.
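One decision inside this recipe, the Phase 2 gate, is driven by a marker string that a later `condition` inspects. As a rough illustration, the gate could be modeled with a typed enum instead of ad hoc strings. `Phase2Gate`, `decide_phase2`, and `phase2_enabled` are hypothetical names. The threshold of 5 and the marker wording are taken from the Phase 0a prompt in the recipe.

```rust
/// Outcome of the Phase 0a typing experiment (illustrative type).
#[derive(Debug, PartialEq)]
enum Phase2Gate {
    Go { type_bugs: u32 },
    Skip { type_bugs: u32 },
}

/// Threshold from the recipe text: 5 or more real type bugs in the CLI.
const TYPE_BUG_THRESHOLD: u32 = 5;

/// Decide whether the CLI rewrite (Phase 2) should run.
fn decide_phase2(type_bugs: u32) -> Phase2Gate {
    if type_bugs >= TYPE_BUG_THRESHOLD {
        Phase2Gate::Go { type_bugs }
    } else {
        Phase2Gate::Skip { type_bugs }
    }
}

impl Phase2Gate {
    /// Render the marker string that the recipe stores in `phase_2_gate`.
    fn marker(&self) -> String {
        match self {
            Phase2Gate::Go { type_bugs } => {
                format!("PHASE2_GO: {type_bugs} real type bugs in CLI")
            }
            Phase2Gate::Skip { type_bugs } => {
                format!("PHASE2_SKIP: only {type_bugs} type bugs, mypy is sufficient")
            }
        }
    }
}

/// Mirrors the recipe condition `'PHASE2_GO' in phase_2_gate`.
fn phase2_enabled(gate_output: &str) -> bool {
    gate_output.contains("PHASE2_GO")
}
```

For example, `decide_phase2(7).marker()` yields a string for which `phase2_enabled` returns `true`, while a count of 4 yields a `PHASE2_SKIP` marker and the phase is skipped.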
name: "amplihack-rs-master"
description: "Complete Rust rewrite: autonomous execution loop until CERTAIN all functionality is correct"
version: "1.0.0"
tags: ["amplihack-rs", "master", "correctness"]
context:
phase_0_result: ""
phase_2_gate: ""
golden_files_result: ""
coverage_result: ""
phase_1_result: ""
phase_2_result: ""
final_verification: ""
steps:
# ═══════════════════════════════════════════════════════════════════
# PHASE 0: FOUNDATION — build the infrastructure for verified rewrites
# ═══════════════════════════════════════════════════════════════════
# ── 0a: Python experiment (decision gate for Phase 2) ─────────────
- id: "phase-0a-python-experiment"
type: agent
agent: "amplihack:builder"
prompt: |
Run the Python typing experiment (Phase 0a).
This determines whether the CLI rewrite (Phase 2) is worth doing.
1. Run `mypy --strict` on:
- .claude/tools/amplihack/hooks/*.py
- src/amplihack/cli.py
- src/amplihack/settings.py
2. Categorize every error:
- TYPE_BUG: mypy found a real bug (e.g., wrong type passed, missing null check)
- ANNOTATION_GAP: code is correct but lacks type annotations
- FALSE_POSITIVE: mypy is wrong
3. Count TYPE_BUGs. These are the correctness issues Rust would eliminate.
4. Add lazy imports to cli.py entry point (measure baseline first)
Decision:
- If TYPE_BUGs in CLI ≥ 5: output "PHASE2_GO: {count} real type bugs in CLI"
- If TYPE_BUGs in CLI < 5: output "PHASE2_SKIP: only {count} type bugs, mypy is sufficient"
Also output the full categorized error list for reference.
output: "phase_2_gate"
# ── 0b: Golden file capture ──────────────────────────────────────
- id: "phase-0b-golden-files"
type: agent
agent: "amplihack:builder"
prompt: |
Capture behavioral golden files for ALL hooks (Phase 0b).
For each hook, instrument it to log every stdin/stdout pair, then
run real amplihack sessions to generate traffic. You may also
craft synthetic inputs to cover edge cases.
MINIMUM golden file counts:
- pre_tool_use: 200 (most critical — security decisions)
- stop: 100 (lock mode, power steering, reflection)
- session_start: 50 (version check, migration)
- post_tool_use: 100 (tool metrics, validation)
- user_prompt_submit: 100 (preference injection, memory)
- pre_compact: 50 (transcript export)
- session_stop: 30 (memory capture)
For each golden file pair:
{name}.input.json — exact stdin JSON
{name}.expected.json — exact stdout JSON
{name}.meta.json — exit code, stderr, timing (informational)
ALSO: review each golden file for Python bugs.
If the Python output looks wrong:
1. Document the bug in tests/golden/PYTHON_BUGS.md
2. Create the golden file with the CORRECTED expected output
3. Note: "Rust should fix this. Python bug: [description]"
Build the semantic JSON comparison tool:
tests/golden/json_compare.py
Rules: ignore key ordering, ignore whitespace, match values exactly,
special handling for floating point (epsilon comparison).
Save everything to tests/golden/hooks/{hook_name}/
output: "golden_files_result"
# ── 0c: Python test coverage uplift ──────────────────────────────
- id: "phase-0c-test-coverage"
type: agent
agent: "amplihack:builder"
prompt: |
Uplift Python hook test coverage (Phase 0c).
Current coverage is ~12%. Target: 60% overall, 80% on security-critical paths.
Priority order:
1. pre_tool_use: CWD deletion, CWD rename, main branch protection
2. stop: lock mode safety valve, power steering counter
3. hook_processor: read_input, write_output, path validation
4. Error handling: every `except Exception` block needs a test
that triggers that exception
Run: pytest --cov=.claude/tools/amplihack/hooks/ --cov-report=term-missing
The coverage uplift serves two purposes:
- Finds Python bugs BEFORE we start writing Rust (fix them now)
- Creates Python tests that we mirror in Rust (cross-reference)
output: "coverage_result"
# ── 0d: Workspace scaffolding ────────────────────────────────────
- id: "phase-0d-scaffold"
type: agent
agent: "amplihack:builder"
prompt: |
Create the Cargo workspace and verification infrastructure (Phase 0d).
1. Create amplihack-rs/ with 5-crate workspace:
amplihack-types, amplihack-state, amplihack-hooks,
amplihack-cli, bins/
2. Cargo.toml workspace config:
- resolver = "2"
- [profile.release] lto = true, strip = true, panic = "unwind"
- Common dependencies in [workspace.dependencies]
3. CI pipeline (.github/workflows/rust-ci.yml):
- cargo clippy -- -D warnings
- cargo fmt --check
- cargo test
- Cross-compilation for 4 platforms
- Golden file parity tests (run both Python and Rust, diff outputs)
4. Golden file parity test harness:
- tests/golden/run_parity.py — runs both engines, compares
- tests/golden/json_compare.py — semantic JSON comparison
- Fails CI on ANY mismatch
5. IPC version protocol:
Every JSON message includes {"version": 1, ...}
Rust deserializer accepts version 1, warns on unknown versions
6. CONTRIBUTING_RUST.md:
- Error handling: thiserror for libs, anyhow for bins
- Deserialization: #[serde(other)] Unknown for forward compat
- File ops: atomic (temp+rename), locks with timeout
- No unwrap() except tests, no panic!() in libs
- How to run golden file parity tests locally
Create PR: amplihack-rs/phase-0-scaffold
output: "phase_0_result"
# ═══════════════════════════════════════════════════════════════════
# PHASE 1: PRE_TOOL_USE HOOK — highest correctness value
# This is the security-critical hook. Type safety matters most here.
# ═══════════════════════════════════════════════════════════════════
- id: "phase-1-hooks"
type: recipe
recipe: "rust-rewrite-phase"
sub_context:
phase_name: "pre-tool-use"
phase_number: "1"
components: |
hook_processor.py → protocol.rs
HookProcessor base class → Hook trait + run_hook() framework
read_input() → read JSON from stdin with size limit
write_output() → write JSON to stdout with SIGPIPE handling
validate_path_containment() → Path::canonicalize() + starts_with()
run() → run_hook() with catch_unwind panic handler
log() → structured logging to file with rotation
save_metric() → JSONL append
pre_tool_use.py → pre_tool_use/
process() → main PreToolUse handler
_check_cwd_deletion() → cwd.rs: detect rm/rmdir targeting CWD
_check_cwd_rename() → cwd.rs: detect mv targeting CWD
_extract_rm_paths() → command_parser.rs: shlex + path extraction
_extract_mv_source_paths() → command_parser.rs: mv argument parsing
_select_strategy() → strategy.rs: launcher detection enum
amplihack-types crate:
HookInput — serde tagged enum with #[serde(other)] Unknown
HookOutput — allow/deny/block/ask decision types
ToolDecision — exhaustive enum (not strings)
FailurePolicy — Open/Closed enum
amplihack-state crate:
AtomicJsonFile<T> — read-modify-write with temp+rename
FileLock — F_SETLK with timeout + retry
AtomicFlag — O_CREAT|O_EXCL semaphore
EnvVar<T> — typed env var parsing with defaults
PythonBridge — spawn_python() with timeout + embedded scripts
output: "phase_1_result"
# ═══════════════════════════════════════════════════════════════════
# NOTE: XPIA Defender is tracked in a SEPARATE issue/repo.
# It is NOT part of this master recipe.
# ═══════════════════════════════════════════════════════════════════
# ═══════════════════════════════════════════════════════════════════
# PHASE 2: CLI + LAUNCHER — conditional on Phase 0a result
# ═══════════════════════════════════════════════════════════════════
- id: "phase-2-cli"
type: recipe
recipe: "rust-rewrite-phase"
condition: "'PHASE2_GO' in phase_2_gate"
sub_context:
phase_name: "cli-launcher"
phase_number: "2"
components: |
cli.py create_parser() → amplihack-cli/src/commands/
clap derive with exhaustive Command enum
Every subcommand is an enum variant (no string dispatch)
#[command(subcommand_required = true)] where appropriate
Commands that need Python (memory, eval, proxy) → subprocess
cli.py launch_command() → amplihack-cli/src/launcher.rs
ManagedChild with corrected Drop:
try_wait → SIGTERM → 3s bounded wait → SIGKILL → wait
Nesting detection via AMPLIHACK_SESSION_ID env var
Session tracking with atomic file operations
cli.py main() → bins/amplihack/main.rs
Entry point with signal handling (SIGINT+SIGTERM+SIGHUP)
Child in own process group (setpgid) to prevent double-SIGINT
settings.py → amplihack-state/src/settings_manager.rs
AtomicJsonFile-based settings.json CRUD
update_hook_paths() with HOOK_REGISTRY
ensure_settings_json() with backup and validation
auto_update.py → amplihack-cli/src/auto_update.rs
GitHub Releases API check with 24h cache
Version comparison (semver)
uv tool upgrade subprocess
Argument whitelist for restart safety
output: "phase_2_result"
# ═══════════════════════════════════════════════════════════════════
# NOTE: "Remaining Hooks" has been merged into Phase 1.
# All hooks (Tier 1, Tier 2, Tier 3) are built in Phase 1.
# See the Work Decomposition section for the tier breakdown.
# ═══════════════════════════════════════════════════════════════════
# ═══════════════════════════════════════════════════════════════════
# FINAL VERIFICATION — the project is not done until this passes
# ═══════════════════════════════════════════════════════════════════
- id: "final-verification"
type: agent
agent: "amplihack:reviewer"
prompt: |
Run the COMPLETE project acceptance criteria.
Phase results:
- Phase 0: {{phase_0_result}}
- Phase 1: {{phase_1_result}}
- Phase 2: {{phase_2_result}} (may be empty if skipped)
CHECK EVERY ITEM. Mark pass/fail with evidence:
CORRECTNESS:
[ ] Every function in the Python inventory has a Rust equivalent
[ ] Every golden file passes on both Python and Rust engines
[ ] Every property test passes (10K+ random inputs per function)
[ ] All Python bugs documented in PYTHON_BUGS.md are fixed in Rust
[ ] All TOCTOU races from the original analysis are eliminated
[ ] All stringly-typed dispatch is replaced with exhaustive enums
[ ] All bare `except Exception` blocks have typed error handling
[ ] Forward compatibility: unknown hook events handled gracefully
BEHAVIORAL EQUIVALENCE:
[ ] `amplihack hooks compare --all` shows 0 mismatches between Python and Rust
across 5 consecutive real development sessions
[ ] `amplihack hooks compare --transcript` shows 100% match
on 3 different real session transcripts
[ ] Functional equivalence matrix is 100% green for all phases
INTEGRATION:
[ ] Multicall binary dispatches all hook events correctly
[ ] Exit code contract matches Python (0, 2, other) for all hooks
[ ] SIGPIPE handling works (no broken pipe errors)
[ ] Fail-open: panic in any hook → stdout gets b"{}", exit 0
DISTRIBUTION:
[ ] Platform wheel builds succeed for all 4 targets
[ ] UVX install + staging flow tested end-to-end
[ ] Binary discovery finds Rust binary in expected locations
[ ] `amplihack install` correctly registers Rust hooks when binaries present
DOCUMENTATION:
[ ] CONTRIBUTING_RUST.md exists and is accurate
[ ] PR descriptions include correctness improvement details
[ ] Dogfooding log documents all issues found and fixed
If ANY item fails: output {"status": "NOT_CERTAIN", "failures": [...]}
If ALL items pass: output {"status": "CERTAIN", "evidence": {...}}
Do NOT output CERTAIN unless you have verified every item.
output: "final_verification"
parse_json: true
Appendix: Claude Code Hooks Reference
Official documentation: https://code.claude.com/docs/en/hooks (reference) and https://code.claude.com/docs/en/hooks-guide (guide)
Hook Events (all 18)
| Event | Matcher On | Can Block? | Key Fields |
|---|---|---|---|
| `SessionStart` | source: startup/resume/clear/compact | No | source, model, agent_type |
| `InstructionsLoaded` | none (always fires) | No | file_path, memory_type, load_reason |
| `UserPromptSubmit` | none (always fires) | Yes (exit 2 or decision:block) | prompt |
| `PreToolUse` | tool name (Bash, Edit, Write, etc.) | Yes (permissionDecision: deny) | tool_name, tool_input, tool_use_id |
| `PermissionRequest` | tool name | Yes (decision.behavior: deny) | tool_name, tool_input, permission_suggestions |
| `PostToolUse` | tool name | No (feedback only) | tool_name, tool_input, tool_response, tool_use_id |
| `PostToolUseFailure` | tool name | No | tool_name, tool_input, error, is_interrupt |
| `Notification` | type: permission_prompt/idle_prompt/auth_success/elicitation_dialog | No | message, title, notification_type |
| `SubagentStart` | agent type | No | agent_id, agent_type |
| `SubagentStop` | agent type | Yes (decision:block) | agent_id, agent_type, agent_transcript_path, last_assistant_message |
| `Stop` | none (always fires) | Yes (decision:block) | stop_hook_active, last_assistant_message |
| `TeammateIdle` | none | Yes (exit 2) | teammate_name, team_name |
| `TaskCompleted` | none | Yes (exit 2) | task_id, task_subject, task_description |
| `ConfigChange` | source: user/project/local/policy/skills | Yes (except policy_settings) | source, file_path |
| `WorktreeCreate` | none | Yes (non-zero fails) | name; stdout = absolute path |
| `WorktreeRemove` | none | No | worktree_path |
| `PreCompact` | trigger: manual/auto | No | (common fields only) |
| `SessionEnd` | reason: clear/logout/prompt_input_exit/other | No | (common fields only) |
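The event table above maps naturally onto a closed Rust enum with a catch-all variant. The following is a minimal sketch using only the standard library. `HookEvent`, `from_name`, and `can_block` are illustrative names, not taken from the amplihack-rs codebase, and a real implementation would presumably derive the parsing through serde with `#[serde(other)]`. Treating `Unknown` as non-blocking is an assumption.

```rust
/// The 18 hook events from the table, plus a catch-all for events
/// introduced after this code was written.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HookEvent {
    SessionStart,
    InstructionsLoaded,
    UserPromptSubmit,
    PreToolUse,
    PermissionRequest,
    PostToolUse,
    PostToolUseFailure,
    Notification,
    SubagentStart,
    SubagentStop,
    Stop,
    TeammateIdle,
    TaskCompleted,
    ConfigChange,
    WorktreeCreate,
    WorktreeRemove,
    PreCompact,
    SessionEnd,
    Unknown,
}

impl HookEvent {
    /// Map a `hook_event_name` string to a variant; unrecognized names
    /// become `Unknown` instead of an error (forward compatibility).
    fn from_name(name: &str) -> Self {
        match name {
            "SessionStart" => Self::SessionStart,
            "InstructionsLoaded" => Self::InstructionsLoaded,
            "UserPromptSubmit" => Self::UserPromptSubmit,
            "PreToolUse" => Self::PreToolUse,
            "PermissionRequest" => Self::PermissionRequest,
            "PostToolUse" => Self::PostToolUse,
            "PostToolUseFailure" => Self::PostToolUseFailure,
            "Notification" => Self::Notification,
            "SubagentStart" => Self::SubagentStart,
            "SubagentStop" => Self::SubagentStop,
            "Stop" => Self::Stop,
            "TeammateIdle" => Self::TeammateIdle,
            "TaskCompleted" => Self::TaskCompleted,
            "ConfigChange" => Self::ConfigChange,
            "WorktreeCreate" => Self::WorktreeCreate,
            "WorktreeRemove" => Self::WorktreeRemove,
            "PreCompact" => Self::PreCompact,
            "SessionEnd" => Self::SessionEnd,
            _ => Self::Unknown,
        }
    }

    /// "Can Block?" column of the table. ConfigChange is listed as blocking
    /// except for policy settings; that exception is not modeled here.
    fn can_block(self) -> bool {
        matches!(
            self,
            Self::UserPromptSubmit
                | Self::PreToolUse
                | Self::PermissionRequest
                | Self::SubagentStop
                | Self::Stop
                | Self::TeammateIdle
                | Self::TaskCompleted
                | Self::ConfigChange
                | Self::WorktreeCreate
        )
    }
}
```

Because the enum is closed, adding a variant makes the compiler flag any exhaustive `match` that does not yet handle it, which is the property the spec is after when it asks for enums instead of string dispatch.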
Common Input Fields
{
"session_id": "abc123",
"transcript_path": "/path/to/transcript.jsonl",
"cwd": "/path/to/project",
"permission_mode": "default|plan|acceptEdits|dontAsk|bypassPermissions",
"hook_event_name": "PreToolUse",
"agent_id": "optional-subagent-id",
"agent_type": "optional-agent-type"
}
Exit Code Contract
| Exit Code | Meaning | Behavior |
|---|---|---|
| 0 | Success | Parse stdout for JSON output; proceed |
| 2 | Blocking error | Block the action; stderr shown to Claude as error |
| Other | Non-blocking error | Log stderr; continue execution |
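The three rows of the exit code table can be captured in a small enum so that callers never compare raw integers. This is a sketch under assumptions: `HookExit` and its methods are hypothetical names chosen for illustration, and only the behavior stated in the table is modeled.

```rust
/// Interpretation of a hook process exit code, per the table above.
#[derive(Debug, PartialEq, Eq)]
enum HookExit {
    /// Exit 0: parse stdout for JSON output and proceed.
    Success,
    /// Exit 2: block the action; stderr is shown to Claude as an error.
    BlockingError,
    /// Any other code: log stderr and continue execution.
    NonBlockingError(i32),
}

impl HookExit {
    /// Classify a raw exit code.
    fn from_code(code: i32) -> Self {
        match code {
            0 => Self::Success,
            2 => Self::BlockingError,
            other => Self::NonBlockingError(other),
        }
    }

    /// Only a successful exit has its stdout parsed as JSON.
    fn parses_stdout(&self) -> bool {
        matches!(self, Self::Success)
    }

    /// Only exit code 2 blocks the action.
    fn blocks_action(&self) -> bool {
        matches!(self, Self::BlockingError)
    }
}
```

Note that exit code 1 is a non-blocking error under this contract, so a hook that wants to stop an action has to exit with 2 specifically.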
JSON Output Fields
{
"continue": true,
"stopReason": "message",
"suppressOutput": false,
"systemMessage": "warning",
"decision": "block",
"reason": "why blocked",
"hookSpecificOutput": {
"hookEventName": "PreToolUse",
"permissionDecision": "allow|deny|ask",
"permissionDecisionReason": "explanation",
"updatedInput": {},
"additionalContext": ""
}
}
Rust Implementation Notes
- The multicall `amplihack-hooks` binary handles all 18 events as subcommands
- Read JSON from stdin; no allocation in panic path (pre-baked `b"{}"` for fail-open)
- `#[serde(other)] Unknown` variant for forward compatibility with new events
- Hooks snapshot at session startup; mid-session changes require `/hooks` menu review
- `async: true` field means run in background without blocking (for logging hooks)
- MCP tool names follow pattern `mcp__<server>__<tool>`
- `$CLAUDE_CODE_REMOTE` is `"true"` in remote/web environments
- `CLAUDE_ENV_FILE` in SessionStart allows persisting env vars for the session
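The fail-open note above (a pre-baked `b"{}"` written when a hook panics) could look roughly like the following. This is a sketch, not the project's `run_hook()`: the signature, the `FAIL_OPEN` constant name, and the choice to ignore write errors are all assumptions made for illustration. It relies on `panic = "unwind"`, since `catch_unwind` cannot intercept aborting panics.

```rust
use std::io::Write;
use std::panic::{self, AssertUnwindSafe};

/// Pre-baked fail-open payload. It is a static byte string, so the panic
/// path writes it without building a response first.
const FAIL_OPEN: &[u8] = b"{}";

/// Run a hook body and write its output to `out`.
/// If the body panics, write the fail-open payload instead.
/// Returns the process exit code, which is 0 in both cases.
fn run_hook<F, W>(body: F, out: &mut W) -> i32
where
    F: FnOnce() -> Vec<u8>,
    W: Write,
{
    match panic::catch_unwind(AssertUnwindSafe(body)) {
        Ok(bytes) => {
            // Write errors (for example a closed pipe) are ignored here;
            // that is an assumption of this sketch.
            let _ = out.write_all(&bytes);
            0
        }
        Err(_) => {
            let _ = out.write_all(FAIL_OPEN);
            0
        }
    }
}
```

A binary would call something like `std::process::exit(run_hook(handler, &mut std::io::stdout()))`, where `handler` is whatever closure implements the selected hook. The default panic hook still prints the panic message to stderr, which keeps the failure visible without affecting stdout.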