Part of #342. Depends on #344.
Scope
Schema (src/coord/store.rs::migrate())
Add four tables. Migration runs free via existing upgrade_db_migrations() step in init --upgrade.
CREATE TABLE tasks (
id TEXT PRIMARY KEY,
name TEXT NOT NULL,
state TEXT NOT NULL, -- PENDING/READY/ASSIGNED/RUNNING/VERIFYING/DONE/RETRYING/RESUMING/NEEDS_HUMAN/CANCELLED
role TEXT, -- target role (mailbox-first)
cwd TEXT NOT NULL,
prompt TEXT NOT NULL,
model TEXT,
budget_usd REAL,
max_retries INTEGER DEFAULT 2,
timeout_min INTEGER DEFAULT 45,
depends_on TEXT, -- JSON array of task ids
policy TEXT, -- JSON: per-task overrides (force_manual, etc.)
created_at INTEGER NOT NULL,
updated_at INTEGER NOT NULL
);
CREATE TABLE task_attempts (
id TEXT PRIMARY KEY,
task_id TEXT NOT NULL REFERENCES tasks(id),
attempt_num INTEGER NOT NULL,
session_id TEXT, -- live session (or null while ASSIGNED)
bus_message_id TEXT, -- mailbox message that assigned this attempt
cwd_hash TEXT, -- tree-state hash at spawn/assign
started_at INTEGER NOT NULL,
ended_at INTEGER,
cost_usd REAL DEFAULT 0,
outcome TEXT -- success/verify_fail/timeout/budget/session_died/cancelled
);
CREATE TABLE task_verifications (
id INTEGER PRIMARY KEY AUTOINCREMENT,
attempt_id TEXT NOT NULL REFERENCES task_attempts(id),
kind TEXT NOT NULL, -- run/brain/agent
command TEXT NOT NULL, -- shell cmd, brain prompt, or agent prompt
verdict TEXT NOT NULL, -- PASS/FAIL
output TEXT, -- truncated
cost_usd REAL DEFAULT 0,
ran_at INTEGER NOT NULL
);
CREATE TABLE task_transitions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
task_id TEXT NOT NULL REFERENCES tasks(id),
from_state TEXT NOT NULL,
to_state TEXT NOT NULL,
cause TEXT NOT NULL, -- assigned/spawned/claim_timeout/verify_pass/verify_fail/timeout/budget/health_<check>/manual/...
at INTEGER NOT NULL
);
Reconciler (src/coord/supervisor.rs)
pub struct Supervisor { tick_ms: u64, policy: Policy }
impl Supervisor {
pub fn tick(&mut self, coord: &mut CoordStore, sensors: &dyn Sensors) -> Vec<Action>;
}
pub enum Action { AssignMailbox(TaskId, Role), Spawn(TaskId, Cwd), Retry(TaskId), MarkDone(TaskId), EscalateHuman(TaskId, String), ... }
Reconciler is pure — reads desired state from coord, observed from sensors, returns actions. A separate Actuator performs them. Makes testing trivial.
Wiring
One call site: run_headless() at src/commands.rs:962, behind #[cfg(feature = "coord")]. No new process model.
Crash-safety test (load-bearing)
This is THE claim of the whole RFC — write the test before adding features on top.
- Start headless daemon, submit 3 tasks, wait until one is RUNNING.
kill -9 the daemon.
- Restart. Within 3 ticks, observed state matches pre-kill state. Same task counts in each bucket. No duplicate spawns. No lost attempts.
Out of scope (deferred to later PRs)
- Verifier execution (PR5).
- Bus-mailbox assignment (PR4).
- Resume context via autopsy (PR6).
- TUI surface for tasks.
Part of #342. Depends on #344.
Scope
Schema (
src/coord/store.rs::migrate())Add four tables. Migration runs free via existing
upgrade_db_migrations()step ininit --upgrade.Reconciler (
src/coord/supervisor.rs)Reconciler is pure — reads desired state from coord, observed from sensors, returns actions. A separate
Actuatorperforms them. Makes testing trivial.Wiring
One call site:
run_headless()atsrc/commands.rs:962, behind#[cfg(feature = "coord")]. No new process model.Crash-safety test (load-bearing)
This is THE claim of the whole RFC — write the test before adding features on top.
kill -9the daemon.Out of scope (deferred to later PRs)