feat(coord): bus-native task assignment + supervisor actuator + MCP tools (#345)#353
Merged
Merged
Conversation
…ools (#345) PR4 of the supervisor RFC (#342). First turn where the supervisor *acts*: pending tasks move to ASSIGNED via mailbox writes, and roleless tasks go straight to spawn. Crash-safety claim from #342 ("restart re-converges from coord.db") moves from aspirational to tested. ## Reconciler now decides `Supervisor::tick()` was a no-op skeleton in PR3. It now emits: - `AssignViaMailbox` for PENDING tasks with a `role`. - `Spawn` for PENDING tasks without a role. - `Spawn` (claim-fallback) for ASSIGNED tasks past `claim_timeout_min` with no observable session at their cwd. Hard ceiling on in-flight (Assigned + Running) so a backlog can't push the fleet past `policy.max_concurrent`. Unknown observed sessions stay load-bearing: the spawn-fallback path gates on Running/Idle observations only; Unknown is treated as "do nothing this tick." RFC v2 §6's no-actuation backstop survives PR4. ## Actuator (`src/coord/actuator.rs`) Pairs with the reconciler. `apply(&mut conn, &SideEffects, &Action)` carries each action out in three deterministic steps: insert attempt row → perform side effect → log transition. A side-effect failure stops *before* the transition is recorded so the next tick re-emits the action. The reconciler/actuator split keeps reconciler tests pure (no bus, no terminal) and keeps decisioning out of the actuator. `SideEffects` trait abstracts the bus/launcher. Production wires `LiveSideEffects` (real bus publish; spawn stubbed for PR5); `NoopSideEffects` covers the `--no-default-features --features hive` build path; tests use `RecordingFx` to assert what fired without standing up either. Idempotency under restart is the load-bearing property: - AssignViaMailbox against an already-Assigned task is a no-op. - The crash-safety test boots a "daemon," assigns three tasks, then re-runs the actuator with the same actions from a fresh `RecordingFx` (the "restart"). Zero re-publishes — the ledger is the source of truth, not in-memory state. ## MCP tools (`src/bus/mcp.rs`) Three new tools on the bus MCP server: - `submit_task` — file work to the supervisor. Equivalent to the (still forthcoming) `task.created` bus message intake; the issue's "supervisor is just a privileged bus participant" claim made operational from the MCP side. Carries every NewTask field including per-task `policy` JSON. - `list_tasks` — read fleet view, optional state filter. - `task_status(id)` — single task with the full transition log so an operator can pull state without `sqlite3` archaeology. Bus's existing rmcp tool router picks them up; no new server surface. ## Headless wiring `run_headless()` now runs the actuator after the reconciler each tick. Emits a `supervisor_tick` event with `{emitted, actuated, errors}` counts so dashboards can chart fleet throughput. Per-action failures log `supervisor_action_failed` with the error message — visible without grepping logs. ## Task CRUD additions - `insert_attempt` — actuator allocates the attempt row before the side effect so a publish-that-doesn't-return still leaves a recoverable ledger entry. - `attempt_count` — for picking the next `attempt_num` race-free. - `latest_attempt_age_minutes` — drives the claim_timeout check; uses a tiny hand-rolled ISO-8601 parser instead of pulling chrono. - `task_status::transitions` plumbing in the MCP tool. ## Out of scope - Real `spawn_session` against the terminal launcher — PR5 pairs it with the verifier runner so the spawn path has the `attempt → spawn → verify` chain in one place. - `task.created` bus-intake — the equivalent submit-via-publish is in PR5's submission path (current MCP tool covers the in-tool case). - `retry_task` MCP tool — admin surface, lands with PR7's exporter. - TUI task list view — also PR7. ## Verification cargo check, clippy (both feature sets), fmt, test all green. 744 binary lib tests pass (up from 738); 733 in lib; 78 integration; 8 unit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ae876a4 to
bfd8428
Compare
3 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
PR4 of the supervisor RFC (#342). Closes #345.
Stacked on #352 (will retarget to main once #352 merges).
First turn where the supervisor actually acts. Pending tasks move to ASSIGNED via mailbox writes, roleless tasks go straight to spawn. The crash-safety claim from the epic ("restart re-converges from coord.db") moves from aspirational to tested.
Reconciler decides
`Supervisor::tick()` was a no-op skeleton in #352. It now emits:
Hard ceiling on in-flight (`Assigned + Running`) so a backlog can't push the fleet past `policy.max_concurrent`.
Unknown observed sessions stay load-bearing: the spawn-fallback path gates on Running/Idle observations only; Unknown is treated as "do nothing this tick." RFC v2 §6's no-actuation backstop survives.
Actuator (`src/coord/actuator.rs`)
Pairs with the reconciler. `apply(&mut conn, &SideEffects, &Action)` carries each action out in three deterministic steps: insert attempt row → perform side effect → log transition. A side-effect failure stops before the transition is recorded so the next tick re-emits the action. The reconciler/actuator split keeps reconciler tests pure (no bus, no terminal) and keeps decisioning out of the actuator.
`SideEffects` trait abstracts the bus/launcher:
Crash-safety test (load-bearing)
The epic's headline claim. The test:
Companion test: `assign_is_idempotent_against_already_assigned_task` covers the per-action race.
MCP tools
Three new tools on the bus MCP server (`src/bus/mcp.rs`):
Bus's existing rmcp tool router picks them up; no new server surface.
Headless wiring
`run_headless()` runs the actuator after the reconciler each tick. Emits a `supervisor_tick` event with `{emitted, actuated, errors}` counts so dashboards can chart fleet throughput. Per-action failures log `supervisor_action_failed` — visible without grepping logs.
Out of scope
Verification
```
cargo check --all-targets ✓
cargo check --all-targets --no-default-features --features hive ✓
cargo clippy both feature sets -- -D warnings ✓
cargo fmt --all -- --check ✓
cargo test --all-targets → 733 + 744 + 78 + 8 = 1563 pass
```
Test plan
🤖 Generated with Claude Code