From 39b226aece311d2e52e6ae9c17f30a4ad4051447 Mon Sep 17 00:00:00 2001
From: ncrispino <ncrispino@wustl.edu>
Date: Thu, 4 Jun 2026 14:56:08 -0700
Subject: [PATCH 1/4] perf+fix: "Real Parallelism" eng-health tranche (races,
 event-loop offload, injection de-dup)

Implements the actionable scope of docs/dev_notes/next_version_eng_health_plan.md
(from two adversarially-verified audit workflows). All under TDD with cost-free
simulation (mock backends + real collaborator code, no LLM calls). Zero regressions:
the orchestrator characterization safety net + injection/restart/hooks suites stay green.

Correctness (concurrency races, all lock-free):
- R1: capture peer revision counts at injection-selection time and thread them
  through register/mark_seen (seen_counts=) so a peer revision published during the
  snapshot-copy await is not silently marked "seen" and stays injectable.
- R2/R3: consume only delivered subagent ids instead of a blind whole-key pop, so a
  background result appended during the await window survives.
- R4: cancel+await detached background trace tasks in ActiveCoordinationCleanup
  before flush, so they don't outlive the hard timeout.
- R5: gather cancelled background tasks before clearing the registry in
  cancel_all_subagents.

Latency:
- B1: offload the blocking snapshot copy (rmtree/copytree/scrub) to asyncio.to_thread
  so it no longer stalls the event loop for other agents. Landed after R1/R2 per the
  critical sequencing (the now-yielding copy would otherwise expose those races).
- C2: loguru brace-style deferred formatting for the per-injection debug logs
  (no eager multi-KB f-string when no DEBUG sink). NOTE: logger is loguru, not stdlib.

Refactor:
- A1: unify the two ~150-line near-duplicate get_injection_content closures into
  MidStreamInjectionHookInstaller.build_midstream_injection(..., native=); both setup
  paths delegate. Closes a backend-parity hazard. orchestrator.py 8,561 -> 8,422.

Reliability:
- D2: record + surface per-round worktree isolation degradation on AgentState
  instead of swallowing it into one log line.
- D3 (scoped): guard the changedoc enrichment so a filesystem error can't kill a
  valid-answer agent.

Hygiene:
- E1/E2: rewrite assertion-free test files (test_message_context_building,
  test_grok_backend) with real assertions verified against actual output.
- A3: correct stale orchestrator_refactor_roadmap status.
- B5: correct the now-false "workspace clears remove .massgen/" comment.

New tests: test_concurrency_race_fixes, test_snapshot_copy_offload,
test_answer_normalizer_debug_guard, test_midstream_injection_unified (60+ cases).

Deferred (documented in the plan): B2 incremental snapshot copy (needs B1 measured),
C3 save_agent_snapshot offload (would add a race on shared counters), B4.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 .../dev_notes/next_version_eng_health_plan.md | 190 ++++++++
 .../orchestrator_refactor_roadmap.md          |   5 +-
 .../filesystem_manager/_filesystem_manager.py |  24 +-
 massgen/orchestrator.py                       | 452 ++++++------------
 .../active_coordination_cleanup.py            |  15 +
 .../answer_text_normalizer.py                 |  14 +-
 .../midstream_injection_hook_installer.py     | 193 +++++++-
 .../peer_answer_visibility_tracker.py         |  37 +-
 .../subagent_lifecycle_coordinator.py         |  24 +
 massgen/subagent/manager.py                   |   9 +
 .../test_answer_normalizer_debug_guard.py     |  89 ++++
 massgen/tests/test_concurrency_race_fixes.py  | 406 ++++++++++++++++
 massgen/tests/test_grok_backend.py            | 196 +++-----
 .../tests/test_message_context_building.py    | 277 +++--------
 .../tests/test_midstream_injection_unified.py | 165 +++++++
 massgen/tests/test_snapshot_copy_offload.py   | 162 +++++++
 16 files changed, 1607 insertions(+), 651 deletions(-)
 create mode 100644 docs/dev_notes/next_version_eng_health_plan.md
 create mode 100644 massgen/tests/test_answer_normalizer_debug_guard.py
 create mode 100644 massgen/tests/test_concurrency_race_fixes.py
 create mode 100644 massgen/tests/test_midstream_injection_unified.py
 create mode 100644 massgen/tests/test_snapshot_copy_offload.py

diff --git a/docs/dev_notes/next_version_eng_health_plan.md b/docs/dev_notes/next_version_eng_health_plan.md
new file mode 100644
index 000000000..8ef1609d8
--- /dev/null
+++ b/docs/dev_notes/next_version_eng_health_plan.md
@@ -0,0 +1,190 @@
+# Next Version — Engineering Health Plan
+
+**Theme: "Real Parallelism"** — make the orchestrator's N-agents-in-parallel promise actually hold, fix the correctness races that the current (accidental) serialization masks, and finish the one remaining refactor blocker. No per-backend functionality changes (parity principle).
+
+**Date:** 2026-06-04
+
+## Implementation status (2026-06-04)
+
+Implemented under TDD (tests written first, confirmed red, then green) with cost-free
+simulation (mock backends / real collaborator code, no LLM calls). Zero regressions —
+the 37-test orchestrator characterization safety net plus the injection/decomposition/
+novelty suites stay green; all pre-existing failures verified identical with changes
+stashed.
+
+| Item | Status | Tests |
+|---|---|---|
+| R1 lost peer-answer revision | ✅ done | `test_concurrency_race_fixes.py` (R1 ×4) |
+| R2/R3 lost subagent result (blind pop) | ✅ done | same (R2 ×5) |
+| R4 leaked trace tasks on cleanup | ✅ done | same (R4 ×2) |
+| R5 cancel-without-await teardown | ✅ done | same (R5 ×1) |
+| D2 surface worktree-isolation degradation | ✅ done | same (D2 ×3) |
+| D3 changedoc enrichment non-fatal | ✅ done (scoped to changedoc) | same (D3 ×2) |
+| B1 snapshot copy off the event loop | ✅ done | `test_snapshot_copy_offload.py` (×5) |
+| C2 eager debug f-strings | ✅ done (loguru brace-style; NOT `isEnabledFor` — logger is loguru) | `test_answer_normalizer_debug_guard.py` (×3) |
+| E1 assertion-free context test | ✅ done (assertions verified vs real output) | `test_message_context_building.py` (×4) |
+| E2 assertion-free grok test | ✅ done (offline unit + live_api skips) | `test_grok_backend.py` |
+| A3 stale roadmap | ✅ done | `orchestrator_refactor_roadmap.md` |
+| **A1 unify injection closures** | ✅ **done** — the two ~150-line `get_injection_content` closures collapsed to `MidStreamInjectionHookInstaller.build_midstream_injection(..., native=)`; both setup methods now delegate. orchestrator.py 8,561 → 8,422. Canonical side-effect order preserves the `update_context → refresh_checklist` invariant for both paths. | `test_midstream_injection_unified.py` (×9, incl. cross-path effect-order equality) + 201-test injection/restart/hooks regression sweep |
+| **B2 incremental snapshot copy** | ⏳ **deferred** by design — large + real correctness risk (per-viewing anonymization + dest rewriting); only justify after B1's stall-vs-volume split is measured in a real run. |
+| **C3 save_agent_snapshot offload** | ⛔ **assessed, not done (intentional)** — unlike B1 (disjoint temp dirs), this method mutates shared `AgentState` counters (answer_count / checklist_calls / pending_checklist_recheck_labels — load-bearing per its docstring). Offloading to a worker thread would risk a NEW race on that shared state — the opposite of the goal. Audit rated it Low (discrete events, not hot path). Leave on the loop. |
+| **B5 per-round subagent rewrite guard** | ◐ **partial** — corrected the stale "workspace clears remove .massgen/" comment (false: clear_workspace is disabled + preserves .massgen). Did NOT add the skip-guard: `rewrite_subagent_mcp_config_files(agent_id)` may legitimately change per round; guarding without verifying idempotency could silently break subagent MCP wiring for a few-ms win. |
+| **D3 (emitter/status/web-display guards)** | ⏳ remaining — deliberately scoped small; the changedoc path (the only unguarded throw-prone bookkeeping op) is done. Broaden only if a real failure is seen. |
+| **B4 path-rewrite skip** | ⏳ fold into B2 when that lands. |
+
+**Critical sequencing honored:** R1 + R2/R3 (the yield-exposed races) landed BEFORE B1,
+so making the copy truly async did not expose a latent race. A B1 test asserts peer/
+subagent delivery survives a genuinely-yielding copy window.
+
+**New helpers added:** `PeerAnswerVisibilityTracker.mark_seen_answer_revisions(..., seen_counts=)`
++ `register_injected_answer_updates(..., seen_counts=)`; `Orchestrator._capture_answer_revision_counts`,
+`_consume_pending_subagent_results`, `_record_round_isolation_degraded`,
+`_attach_changedoc_to_latest_answer`; `SubagentLifecycleCoordinator.consume_pending_subagent_results`;
+`FilesystemManager._copy_snapshots_to_temp_workspace_sync` (offloaded via `asyncio.to_thread`).
+`AgentState.round_isolation_degraded` / `round_isolation_error` fields.
+
+---
+
+**Source:** two adversarially-verified audit workflows (`massgen-eng-health-audit`, `massgen-parallelism-correctness-sweep`). Every finding below was checked through ≥2 independent verifier lenses; impact ratings are the verifier-revised values, not the original finder claims.
+
+> **Verify before acting.** These are point-in-time observations with file:line anchors that may drift. Re-read the cited code before implementing — the audits already caught several stale-roadmap and overstated-impact claims, so treat magnitudes skeptically and trust the *mechanism* over the prose.
+
+---
+
+## 0. Where the refactor actually stands (correcting the old roadmap)
+
+The collaborator-extraction refactor is **substantially done and healthy** — more refactoring is **not** the right theme for the next version.
+
+- `orchestrator.py`: **21,599 → 8,561 lines**, 50 collaborator modules in `massgen/orchestrator_collaborators/`.
+- `AgentOrchestrationSetup` is **already fully extracted** (`agent_orchestration_setup.py:20-141`); only a thin delegator loop remains in `__init__` (`orchestrator.py:884-908`). The roadmap *table* (`orchestrator_refactor_roadmap.md:45,47`) is stale; the prose (`:56-90`) is correct.
+- **Only one genuine refactor blocker remains** (A1 below): the duplicated mid-stream injection closures across 3 backend paths, which block completing `MidStreamInjectionHookInstaller`.
+- **De-scoped as LOC-only (no correctness/hot-path payoff):**
+  - `textual_terminal_display.py` (14k) — `TextualApp` already uses constructor injection, not closure-over-self; a split is pure LOC redistribution with real UI-regression risk. Defer.
+  - `config_builder.py` (5.5k) — interactive wizard, near-zero mutable state, already tested + faceted via `run()`. Defer.
+  - `system_prompt_sections.py` / `base_with_custom_tool_and_mcp.py` — cohesive, never on the roadmap. Only `ChecklistGate` (1,558 LOC inside `system_prompt_sections.py:237-1794`) is a real extraction candidate, and only if that file is touched anyway.
+
+---
+
+## 1. The critical interaction (read this first)
+
+**Several concurrency races are currently masked because `copy_snapshots_to_temp_workspace` (`_filesystem_manager.py:2339`) is an `async def` doing purely blocking sync I/O — it never yields.** That accidental non-yielding critical section serializes work that would otherwise race.
+
+The headline latency fix (B1) is to wrap that copy in `asyncio.to_thread` so it stops stalling every other agent's streaming. **But doing B1 introduces a real yield point**, which can expose the latent races that the blocking copy currently hides.
+
+**Therefore: the race fixes (R1, R2/R3) are PREREQUISITES for B1, not independent of it.** Shipping B1 alone would trade a latency win for a correctness regression. Sequencing below respects this.
+
+---
+
+## 2. Correctness — concurrency races (Tranche 1, do first, all lock-free)
+
+MassGen's concurrency is **mostly safe-by-construction**: the FIRST_COMPLETED consumer loop (`orchestrator.py:2511`) processes agent results one-at-a-time. Real races only intrude where (a) per-agent injection hooks hit genuinely-yielding awaits, or (b) detached background tasks append concurrently. **None crash or corrupt state — they silently drop peer/subagent feedback, degrading refinement quality.**
+
+### R1 — Lost peer-answer revision (seen-count re-read live after await) — **Medium / Medium**
+- **Where:** `orchestrator.py:3589` (select content) → `:3618` (await snapshot copy = real yield) → `:3662` (mark seen); live re-read at `orchestrator_collaborators/peer_answer_visibility_tracker.py:93`.
+- **Shared state:** `agent_states[A].seen_answer_counts[B]` vs `coordination_tracker.answers_by_agent[B]` length.
+- **Interleaving:** A's injection hook captures B's rev-2 content at 3589, suspends at the 3618 await; the consumer loop appends B's rev-3 (`:2646`); A resumes and sets `seen_answer_counts[B] = len(...) = 3` though it was only shown rev-2.
+- **Consequence:** `_has_unseen_answer_updates(A)` returns False for B → rev-3 never injected, never triggers restart → A refines against stale peer work.
+- **Fix:** Capture the source revision counts in the *same* read as content selection (before the 3618 await); thread the captured count through `register_injected_answer_updates` / `mark_seen_answer_revisions`, or set `seen_answer_counts[src] = min(captured, current)`. **Same pattern recurs at `orchestrator.py:4057/4078`, `4354/4392/4410`, `4776/4802/4821` — fix all sites.** Pure data-flow, no lock.
+
+### R2 — Lost background result, blind whole-key pop (hook path) — **Medium / Small**
+- **Where:** `orchestrator.py:4019` (read-snapshot) → `:4064` (await copy) → `:4116` (`pop(agent_id, None)`); writer `subagent_lifecycle_coordinator.py:54-56`, fired from detached task `trace_analyzer_runner.py:623`.
+- **Shared state:** `orchestrator._pending_subagent_results[agent_id]`.
+- **Interleaving:** Consumer snapshots the list, awaits; the detached trace-analyzer task appends during the window; consumer resumes and `pop()`s the *whole* key, deleting the fresh append.
+- **Consequence:** Result silently dropped. **Trace-analyzer guidance is permanently lost** (only flows through this in-process queue; no MCP-poll fallback). Real MCP subagents self-heal on next poll.
+- **Fix:** Remove only the consumed snapshot's ids, not the whole key:
+  `self._pending_subagent_results[agent_id] = [e for e in ...get(agent_id, []) if e[0] not in consumed_ids]` (drop key if empty). **Do NOT add an asyncio.Lock in this hot path.**
+- **Correction from audit:** the real production writer is `trace_analyzer_runner.py`; `register_completion_callback` is test-only wiring.
+
+### R3 — Same blind-pop, hookless path — **Low / Small**
+- **Where:** `orchestrator.py:4216` → `:4222` (await `subagent_hook.execute`) → `:4229` (`pop`). Same shape as R2 on the Codex/hookless fallback. Self-heals via MCP re-poll → delayed delivery, not silent loss.
+- **Fix:** Same consumed-id filtering; apply at `:4116` and `:4229` together.
+
+### R4 — Detached trace tasks not cancelled by timeout cleanup — **Low / Small**
+- **Where:** `active_coordination_cleanup.py:39` (flush) + `:41-62` (cancels only `_active_tasks`, never `_background_trace_tasks`).
+- **Consequence:** On orchestrator timeout, a trace task survives past the hard timeout (wastes compute), then appends post-flush into a dict no one reads. No corruption; violates the cleanup docstring contract.
+- **Fix:** In `ActiveCoordinationCleanup.cleanup()`, cancel-and-await `_background_trace_tasks` (mirror the `_active_tasks` loop at `:49-62`), then clear. The task's `CancelledError` path (`trace_analyzer_runner.py:476-489`) returns without writing, so awaiting fully closes the window.
+
+### R5 — `cancel_all_subagents` cancels without awaiting before clearing registry — **Low / Small (teardown hygiene)**
+- **Where:** `subagent/manager.py:4190-4232`. Shutdown-only. Both damaging outcomes already guarded (cancelled-status preservation `:2887-2902`; idempotent late pop). Only residual: a late completion callback post-shutdown.
+- **Fix:** `await asyncio.gather(*cancelled_tasks, return_exceptions=True)` before `clear()`/marking. Hygiene, not a user-facing bug.
+
+### R6 — Background status-file writer torn read — **Low (benign) / Optional (WONTFIX)**
+- **Where:** writer `coordination_tracker.py:660/752`; reader `save_status_file` offloaded via `run_in_executor` (`step_mode_handler.py:187`). Genuine cross-thread read, but **no exception possible** (keys fixed at init `:232`; CPython append-during-iteration doesn't raise). `status.json` is observational only.
+- **Fix (optional):** build an immutable copy of needed fields on the event-loop thread before handing to the executor. Do **not** lock the hot append path.
+
+### Safe-by-construction — do NOT "fix" these
+- **`pending_checklist_recheck_labels` "dual-writer"** (`peer_answer_visibility_tracker.py:251` vs `checklist_gate_manager.py:872`): both writers run inside the *same single agent's* task → cannot interleave. The roadmap's dual-writer flag is a **false alarm**; keep only as a tripwire if injection ever moves to a task distinct from the agent's tool-execution task.
+- **`restart_pending` overwrite:** currently safe *only because* the snapshot copy doesn't yield. **Re-evaluate after B1.** (R1 is independent of this — R1 is the live count re-read, not the bool.)
+- **Midstream injection fairness cap:** one task per agent (`orchestrator.py:2503`, `:6817`); hooks run sequentially (`hooks.py:584`).
+- **`rate_limiter.py`:** check-then-act with no await between check and assignment; real sections under `async with self._lock`. Correct.
+- **`BroadcastChannel`:** all mutations + compound ops under one `async with self._lock` (`:321`); shadow path is sequential (`:233-250`). Correct.
+
+---
+
+## 3. Latency — event-loop offloading (Tranche 2, after race fixes)
+
+### B1 — `to_thread`-wrap the snapshot copy — **Medium impact / Small effort** (the single best ROI)
+- **Where:** `_filesystem_manager.py:2339-2397` (`copy_snapshots_to_temp_workspace`: `_safe_rmtree:2360`, `copytree:2379`, `scrub:2395`; TODO at `:2353`); wrapper `snapshot_manager.py:144-195`; call sites `orchestrator.py:3618,4064,4392,4802,6069`; parallel loop `:2511`.
+- **Problem:** `async def` doing synchronous `rmtree`/`copytree`/scrub/path-walk with no `to_thread`. While one agent copies peers' snapshots, the single event-loop thread is blocked → no other agent's chunk is consumed → the intended FIRST_COMPLETED parallelism is serialized.
+- **Fix:** Wrap the blocking body in `await asyncio.to_thread(...)`. Each agent owns a distinct temp workspace → concurrent offloaded copies write disjoint dirs, no race.
+- **Gate:** Only after R1/R2/R3 land (this introduces the real yield that exposes them). Add a test asserting peer-revision delivery survives a yielding copy.
+
+### Opportunistic offloads (fold in when adjacent code is touched)
+- **C3** `save_agent_snapshot` sync writes + copytree (`snapshot_manager.py:222-279`) — Low; offload to `to_thread`, preserve byte-for-byte side-effect ordering (`:22-26`).
+- **C2 (debug-guard half)** eager `logger.debug` f-strings interpolating full answer bodies even with DEBUG off (`answer_text_normalizer.py:108-110,115-117`) — guard with `if logger.isEnabledFor(logging.DEBUG):`. Skip the micro-optimization of the replace loop (N,M ~2-5).
+- **B5/C4** per-round subagent dir + MCP-config rewrite (`orchestrator.py:6084-6087`) — Low; the "defensive against workspace clears" rationale is **dead** (`.massgen/` is now preserved, `_filesystem_manager.py:2059-2063`; in-loop `clear_workspace()` at `:6078` is commented out). Guard on a cheap existence check.
+
+---
+
+## 4. Refactor blocker + reliability (Tranche 3)
+
+### A1 — Unify duplicated mid-stream injection closures — **Medium / Medium**
+- **Where:** `orchestrator.py:3553-3700` (GeneralHookManager closure) vs `:4745-4858` (native closure); duplicated tails `:3705-3806` vs `:4863-4929`; remaining install methods `:3491,3808,3863,4712`. Target: `orchestrator_collaborators/midstream_injection_hook_installer.py` (already owns 6 pure helpers).
+- **Problem:** Two near-identical `async def get_injection_content()` closures; a debug workspace-listing block exists in only one path (proof of drift). A fix to one backend path silently skips the other (CLAUDE.md backend-parity hazard).
+- **Note:** the scary "ordering divergence" claim was **verified inert** — `restart_pending` recompute reads only `seen_answer_counts`, mutated at the same relative position in both paths. This is near-mechanical de-dup.
+- **Fix:** Extract a single `async build_midstream_injection(agent_id, answers, *, native=False)` onto the installer; both setup methods call it via the hook callback. Move the ~95%-duplicated tails into a shared `_register_common_post_tool_hooks(...)`. Add a regression test asserting `available_agent_labels` reflect injected labels across both paths. Completes the `MidStreamInjectionHookInstaller` extraction.
+
+### D3 — Make post-record bookkeeping non-fatal — **Small (targeted)**
+- **Where:** `orchestrator.py:2544` (one try spanning the whole per-chunk handler) → `:3110-3136` (except, `is_killed=True` at `:3123`). `add_agent_answer` at `:2646` runs *before* the fallible enrichment.
+- **Problem:** A bug in non-essential bookkeeping (changedoc `:2662-2681`, emitter `:2698-2705`, status writes) kills the whole agent identically to a real stream error.
+- **Note:** "answer may be lost" is mostly refuted — `add_agent_answer` runs first and `is_killed` doesn't purge `answers_by_agent`. Genuine loss only in the narrow pre-record window (`_coerce` `:2627`, `_save_agent_snapshot` `:2633`).
+- **Fix:** Wrap **only** the post-record enrichment in a local `try/except-log-continue`; leave stream consumption (`:2546`) and pre-record snapshot (`:2633`) under the fatal except. **Do NOT split the whole 570-line try** (high risk in a hot loop). Tests: inject changedoc/emitter failures, assert answer survives and `is_killed` stays False.
+
+### D2 — Surface worktree-isolation degradation — **Low–Medium / Medium**
+- **Where:** `orchestrator.py:6216-6218` (broad `except Exception` → `logger.warning` → `round_worktree_paths=None`).
+- **Note:** "cross-agent clobbering" is **substantially refuted** — each agent already owns a separate `cwd`; the per-round feature only layers git branches on top, and shared dicts are written only after success (`:6211-6212`). Real kernel: weak observability from a bare except. Strongest residual risk is the writable shared-context-path case.
+- **Fix:** Narrow the except to git/worktree errors; emit a user-visible `StreamChunk` warning (not just `logger.warning`) and record degraded state on `AgentState`. Don't abort the round on transient git errors (would regress graceful degradation). Test: inject isolation-setup failure, assert visible signal + recorded state.
+
+---
+
+## 5. Hygiene quick wins (any time, all Small)
+
+- **A3** — fix the stale roadmap table (`orchestrator_refactor_roadmap.md:45,47`): mark AgentOrchestrationSetup done, re-scope the installer cluster to (1) closure unification, (2) hook-method relocation.
+- **E1** — `test_message_context_building.py:52,82,125,172`: 4 collected tests with **zero assertions** (only `print()`s, return None → pass unconditionally). Convert booleans to `assert`s, strip the `sys.path`/`main()` scaffolding (`:11-13,243-274`), verify red-green, or delete. Narrow real gap: history-section rendering / turn-progression (core path otherwise covered in `test_position_bias_calibration.py`).
+- **E2** — `test_grok_backend.py:20,54,92`: print-driven, `return True/False`, no asserts. **Correction:** these do NOT error in CI (return-not-None only fires for *sync* tests; under `asyncio_mode=auto` they pass silently). Grok coverage already exists elsewhere → **rewrite as real pytest or just delete**.
+- **Sweep:** `rg -L "def test_" massgen/tests | xargs rg -L "assert"` to find other assertion-free collected test files.
+
+---
+
+## 6. Recommended sequencing
+
+1. **Tranche 1 — Correctness (lock-free):** R1, R2/R3. Prerequisites for B1. Directly improves convergence quality.
+2. **Tranche 2 — Latency:** B1 `to_thread` offload (now safe). Test peer-revision delivery survives a yielding copy. Fold in C2 debug-guard.
+3. **Tranche 3 — Refactor + reliability:** A1 (finish installer extraction), R4, D3 (targeted), D2. Plus A3/E1/E2 hygiene.
+4. **Tranche 4 — Bigger latency (only after B1 measured):** B2 incremental snapshot copy (hash/mtime-skip unchanged peers; copy only just-answered agents on injection), folding in B4 path-rewrite skip and C1 single-walk. Large effort + real correctness risk (per-viewing anonymization + dest rewriting), bounded by the 2-injection/round cap — justify only after B1 reveals stall-vs-volume split.
+
+**Ordering rationale:** correctness before the latency change that exposes it; B1 establishes the measurement baseline before the expensive B2; the one refactor blocker (A1) is mechanical and de-risked; hygiene is free.
+
+---
+
+## What's Next
+
+- Implement Tranche 1 (R1 + R2/R3) under TDD — write the race-exposing tests first (force a yield between content-select and seen-mark; force a background append during the pop window).
+- Then B1 with its delivery-survival test, confirming the race fixes hold under a genuinely-yielding copy.
+- Defer Linear issues / branch / push until explicitly approved (per current instruction).
+- Re-run the parallelism sweep's failed `check-then-act` verifier leg if deeper assurance on that angle is wanted before implementation.
+
+**Raw audit outputs (this session, ephemeral):**
+- Eng-health: `/private/tmp/.../tasks/w60ugzohg.output`
+- Concurrency: `/private/tmp/.../tasks/wuisd41xj.output`
+- Workflow scripts saved under `.../workflows/scripts/` (re-runnable via `Workflow({scriptPath, resumeFromRunId})`).
diff --git a/docs/dev_notes/orchestrator_refactor_roadmap.md b/docs/dev_notes/orchestrator_refactor_roadmap.md
index c5fec7b4b..3ccefa3a4 100644
--- a/docs/dev_notes/orchestrator_refactor_roadmap.md
+++ b/docs/dev_notes/orchestrator_refactor_roadmap.md
@@ -64,7 +64,7 @@
 - [x] Step 23 (PeerAnswerVisibilityTracker-13 methods) — **DONE & verified green** (first batch with zero verifier issues). orchestrator.py → 14,866. Dual-writer field `pending_checklist_recheck_labels` mutated via orch back-ref so ChecklistGateManager later sees the same live set.
 - [x] Step 24 (ChecklistGateManager-11 methods, 1252 lines) — **DONE & verified green** (largest single extraction; 219 tests passed across checklist/criteria/round_evaluator suites). 1 regression fixed: collaborator's `resolve_effective_checklist_criteria` called `self.get_active_criteria(...)` directly → bypassed test monkeypatches on `orchestrator._get_active_criteria` → repointed to `orch._get_active_criteria(...)`.
 - [x] Step 29 (MidStreamInjectionHookInstaller — partial: 6 of 18 pure helpers extracted, 310 lines). The 12 hook-installation methods with duplicated callback closures across 3 backend paths (`_setup_hook_manager_for_agent`, `_setup_codex_mcp_hooks`, `_setup_codex_hybrid_hooks`, `_setup_native_hooks_for_agent`, `_register_round_timeout_hooks`, etc.) remain on Orchestrator — they need a callback-unification pass first (behavior-changing, out of scope for pure extraction).
-- [ ] Step 27 (AgentOrchestrationSetup-2 methods) skipped: lives inline in `__init__` as a nested function + loop; "extraction" requires `__init__`-rewiring not just method-relocation — different kind of refactor.
+- [x] Step 27 (AgentOrchestrationSetup) — **DONE** (corrected 2026-06-04 by eng-health audit). The collaborator now lives at `massgen/orchestrator_collaborators/agent_orchestration_setup.py`; only a thin per-agent delegator loop remains inline in `__init__`. (The earlier "skipped" status was stale — the `__init__`-rewiring was completed.)
 
 ## Release status — SHIPPED (50% milestone)
 
@@ -85,7 +85,8 @@ After completing the original plan, identified and extracted 4 more cohesive clu
 
 ## Follow-up release work
 
-- **AgentOrchestrationSetup**: hoist the inline per-agent setup function out of `__init__` first (a small structural refactor), THEN extract as a collaborator.
+- **AgentOrchestrationSetup**: DONE — extracted to its own collaborator; only a thin `__init__` delegator loop remains (optional future hoist).
+- **MidStreamInjectionHookInstaller**: collaborator exists (`midstream_injection_hook_installer.py`, holds the pure helpers). The remaining work is the duplicated `get_injection_content` closures across the 3 backend paths — a callback-unification pass (audit item A1), tracked in `next_version_eng_health_plan.md`.
 - **MidStreamInjectionHookInstaller remaining 12 methods**: unify the duplicated `get_injection_content` closures across the 3 backend paths (behavior-changing, needs its own validation), THEN extract.
 - These are not blocking — Orchestrator is now 38% smaller and the remaining bulk is the 4 hook-install paths + the streaming-loop core.
 
diff --git a/massgen/filesystem_manager/_filesystem_manager.py b/massgen/filesystem_manager/_filesystem_manager.py
index fc5052a60..c0637286a 100644
--- a/massgen/filesystem_manager/_filesystem_manager.py
+++ b/massgen/filesystem_manager/_filesystem_manager.py
@@ -14,6 +14,7 @@
 MCP tools configured.
 """
 
+import asyncio
 import json
 import os
 import shutil
@@ -2350,7 +2351,28 @@ async def copy_snapshots_to_temp_workspace(self, all_snapshots: dict[str, Path],
         Returns:
             Path to the temporary workspace with restored snapshots
 
-        TODO: reimplement without 'shutil' and 'os' operations for true async
+        B1: the blocking filesystem work (rmtree/copytree/scrub) is offloaded to a
+        worker thread via ``asyncio.to_thread`` so it does not stall the
+        orchestrator event loop. While one agent's snapshots are copied, the other
+        agents' streams keep being consumed. Each agent owns a distinct
+        ``agent_temporary_workspace`` directory, so concurrent offloaded copies
+        write to disjoint paths — there is no shared-state race.
+        """
+        if not self.agent_temporary_workspace:
+            return None
+
+        return await asyncio.to_thread(
+            self._copy_snapshots_to_temp_workspace_sync,
+            all_snapshots,
+            agent_mapping,
+        )
+
+    def _copy_snapshots_to_temp_workspace_sync(self, all_snapshots: dict[str, Path], agent_mapping: dict[str, str]) -> Path | None:
+        """Synchronous body of :meth:`copy_snapshots_to_temp_workspace`.
+
+        Runs on a worker thread (see that method). Each agent's
+        ``agent_temporary_workspace`` is distinct, so concurrent invocations on
+        different FilesystemManager instances touch disjoint directories.
         """
         if not self.agent_temporary_workspace:
             return None
diff --git a/massgen/orchestrator.py b/massgen/orchestrator.py
index 198d17dda..f15161145 100644
--- a/massgen/orchestrator.py
+++ b/massgen/orchestrator.py
@@ -196,6 +196,11 @@ class AgentState:
     # emitted in the current round that are pending merge into the orchestrator
     # accumulator at round transition. Each entry: {text, category, anti_patterns?}.
     criteria_proposals: list[dict[str, Any]] = field(default_factory=list)
+    # D2: set when per-round worktree isolation setup failed and the agent fell
+    # back to its base workspace (no per-round branch isolation). Surfaces the
+    # degradation that was previously only a log line.
+    round_isolation_degraded: bool = False
+    round_isolation_error: str | None = None
 
 
 class Orchestrator(ChatAgent):
@@ -2658,27 +2663,10 @@ def _coordination_complete() -> bool:
                                 agent_id,
                                 prefer_local_runtime_state=True,
                             )
-                            # Attach changedoc from workspace if enabled
-                            if self._is_changedoc_enabled() and agent and agent.backend.filesystem_manager:
-                                from massgen.changedoc import (
-                                    read_changedoc_from_workspace,
-                                )
-
-                                ws_path = agent.backend.filesystem_manager.cwd
-                                if ws_path:
-                                    changedoc_content = read_changedoc_from_workspace(Path(ws_path))
-                                    if changedoc_content:
-                                        answers_list = self.coordination_tracker.answers_by_agent.get(agent_id, [])
-                                        if answers_list:
-                                            label = answers_list[-1].label
-                                            # Replace [SELF] placeholder with real answer label
-                                            changedoc_content = changedoc_content.replace("[SELF]", label)
-                                            answers_list[-1].changedoc = changedoc_content
-                                            logger.info(
-                                                "[Orchestrator] Attached changedoc (%d chars) to %s",
-                                                len(changedoc_content),
-                                                answers_list[-1].label,
-                                            )
+                            # Attach changedoc from workspace if enabled (D3: guarded —
+                            # a changedoc/filesystem error must not kill an agent that
+                            # already recorded a valid answer above).
+                            self._attach_changedoc_to_latest_answer(agent_id, agent)
                             if self._is_decomposition_mode():
                                 self.agent_states[agent_id].decomposition_answer_streak += 1
                                 # Agent has produced a new self revision; keep its own seen
@@ -3548,156 +3536,11 @@ def _setup_hook_manager_for_agent(
         # Create mid-stream injection hook with closure-based callback
         mid_stream_hook = MidStreamInjectionHook()
 
-        # Define the injection callback (captures agent_id and answers)
-        # This is async to allow copying snapshots before injection
+        # Define the injection callback (captures agent_id and answers).
+        # A1: both the GeneralHookManager and native paths route through the
+        # single unified installer method so they can no longer drift.
         async def get_injection_content() -> str | None:
-            """Check if mid-stream injection is needed and return content."""
-            # Skip injection if disabled (multi-agent refinement OFF mode)
-            # Agents work independently without seeing each other's work
-            if self.config.disable_injection:
-                return None
-
-            if not self._check_restart_pending(agent_id):
-                return None
-
-            # First-answer protection: don't inject into an agent that hasn't
-            # produced its first answer yet.
-            if self._should_defer_restart_for_first_answer(agent_id):
-                self.agent_states[agent_id].restart_pending = False
-                return None
-
-            # In vote-only mode, skip injection and force a full restart instead.
-            # Mid-stream injection can't update tool schemas, so agents in vote-only mode
-            # wouldn't be able to vote for newly discovered answers (the vote enum is fixed
-            # at stream start). A full restart gives them updated tool schemas.
-            if self._is_vote_only_mode(agent_id):
-                return None  # Let restart happen instead
-
-            if self._should_defer_peer_updates_until_restart(agent_id):
-                if self._has_unseen_answer_updates(agent_id):
-                    self.agent_states[agent_id].restart_pending = True
-                    logger.info(
-                        "[Orchestrator] Deferring peer answer update injection until restart for %s",
-                        agent_id,
-                    )
-                else:
-                    self.agent_states[agent_id].restart_pending = False
-                return None
-
-            # Get CURRENT answers (includes virtual agents in step mode)
-            current_answers = self._get_current_answers_snapshot()
-            selected_answers, had_unseen_updates = self._select_midstream_answer_updates(
-                agent_id,
-                current_answers,
-            )
-
-            if not selected_answers:
-                if had_unseen_updates:
-                    # Keep restart pending when unseen updates still exist.
-                    self.agent_states[agent_id].restart_pending = True
-                    cap = getattr(self.config, "max_midstream_injections_per_round", 2)
-                    logger.info(
-                        "[Orchestrator] Skipping mid-stream injection for %s: per-round cap reached (%s)",
-                        agent_id,
-                        cap,
-                    )
-                else:
-                    # No unseen updates remain: this was a stale restart_pending flag.
-                    self.agent_states[agent_id].restart_pending = False
-                return None
-
-            # TIMING CONSTRAINT: Skip injection if too close to soft timeout
-            if self._should_skip_injection_due_to_timeout(agent_id):
-                return None  # Let restart happen instead
-
-            # Copy snapshots from new answer agents to temp workspace BEFORE building injection
-            # This ensures the workspace files are available when the agent tries to access them
-            logger.info(
-                f"[Orchestrator] Copying snapshots for mid-stream injection to {agent_id}",
-            )
-            await self._copy_all_snapshots_to_temp_workspace(agent_id)
-
-            # Build injection content (pass existing answers to detect updates vs new)
-            injection = self._build_tool_result_injection(
-                agent_id,
-                selected_answers,
-                existing_answers=answers,
-            )
-
-            # Debug: Log what's in the temp workspace for each injected agent
-            viewing_agent = self.agents.get(agent_id)
-            if viewing_agent and viewing_agent.backend.filesystem_manager:
-                temp_workspace_base = str(
-                    viewing_agent.backend.filesystem_manager.agent_temporary_workspace,
-                )
-                agent_mapping = self.coordination_tracker.get_reverse_agent_mapping()
-                for aid in selected_answers.keys():
-                    anon_id = agent_mapping.get(aid, f"agent_{aid}")
-                    workspace_path = os.path.join(temp_workspace_base, anon_id)
-                    if os.path.exists(workspace_path):
-                        try:
-                            files = os.listdir(workspace_path)
-                            logger.debug(
-                                f"[Orchestrator] Injection workspace {workspace_path} contains: {files}",
-                            )
-                        except OSError as e:
-                            logger.debug(
-                                f"[Orchestrator] Could not list workspace {workspace_path}: {e}",
-                            )
-                    else:
-                        logger.debug(
-                            f"[Orchestrator] Injection workspace {workspace_path} does NOT exist!",
-                        )
-
-            # Increment injection count
-            self.agent_states[agent_id].injection_count += 1
-            self.agent_states[agent_id].midstream_injections_this_round += len(selected_answers)
-
-            # Update answers to include newly injected answers (prevents re-injection)
-            # This mutates the captured closure variable so future callbacks see updated state
-            answers.update(selected_answers)
-
-            # Update known_answer_ids so vote validation knows this agent has seen these
-            self.agent_states[agent_id].known_answer_ids.update(selected_answers.keys())
-            self._register_injected_answer_updates(agent_id, list(selected_answers.keys()))
-            self._mark_pending_checklist_recheck_labels(agent_id, list(selected_answers.keys()))
-
-            # Keep restart pending if additional unseen revisions still remain.
-            self.agent_states[agent_id].restart_pending = self._has_unseen_answer_updates(agent_id)
-
-            # Track the injection
-            logger.info(
-                f"[Orchestrator] Mid-stream injection for {agent_id}: {len(selected_answers)} answer update(s)",
-            )
-            # Log the actual injection content at debug level (may contain sensitive data)
-            preview = injection[:2000] + ("..." if len(injection) > 2000 else "")
-            logger.debug(f"[Orchestrator] Injection content (truncated):\n{preview}")
-            self.coordination_tracker.track_agent_action(
-                agent_id,
-                ActionType.UPDATE_INJECTED,
-                f"Mid-stream: {len(selected_answers)} answer(s)",
-            )
-
-            # Emit injection_received event for TUI
-
-            _inj_emitter = get_event_emitter()
-            if _inj_emitter:
-                _inj_emitter.emit_injection_received(
-                    agent_id=agent_id,
-                    source_agents=list(selected_answers.keys()),
-                    injection_type="mid_stream",
-                )
-
-            # Update agent's context labels first, then refresh checklist state so
-            # available_agent_labels reflects the newly-injected labels (e.g. agent1.2
-            # replacing agent1.1). Refreshing before updating would leave stale labels.
-            self.coordination_tracker.update_agent_context_with_new_answers(
-                agent_id,
-                list(selected_answers.keys()),
-            )
-            self._refresh_checklist_state_for_agent(agent_id)
-
-            return injection
+            return await self._build_midstream_injection(agent_id, answers, native=False)
 
         # Set callback on hook
         mid_stream_hook.set_callback(get_injection_content)
@@ -4060,6 +3903,8 @@ async def _flush_codex_hook_payloads(
                         )
 
                         if selected_answers:
+                            # R1: capture peer revision counts before the snapshot-copy await.
+                            captured_revision_counts = self._capture_answer_revision_counts(list(selected_answers.keys()))
                             if not self._should_skip_injection_due_to_timeout(agent_id):
                                 await self._copy_all_snapshots_to_temp_workspace(agent_id)
 
@@ -4075,7 +3920,7 @@ async def _flush_codex_hook_payloads(
                                 self.agent_states[agent_id].midstream_injections_this_round += len(selected_answers)
                                 answers.update(selected_answers)
                                 self.agent_states[agent_id].known_answer_ids.update(selected_answers.keys())
-                                self._register_injected_answer_updates(agent_id, list(selected_answers.keys()))
+                                self._register_injected_answer_updates(agent_id, list(selected_answers.keys()), seen_counts=captured_revision_counts)
                                 self._mark_pending_checklist_recheck_labels(agent_id, list(selected_answers.keys()))
                                 # Update context labels BEFORE refreshing checklist state so
                                 # available_agent_labels reflects the newly-injected labels
@@ -4113,7 +3958,9 @@ async def _flush_codex_hook_payloads(
             combined = "\n".join(injection_parts)
             agent.backend.write_post_tool_use_hook(combined)
             if wrote_subagent_payload:
-                self._pending_subagent_results.pop(agent_id, None)
+                # R2: remove only the consumed results, not the whole key, so a
+                # background task that appended during the await window survives.
+                self._consume_pending_subagent_results(agent_id, {sid for sid, _ in pending})
             logger.info(
                 f"[Orchestrator] Wrote {len(combined)} chars to hook file for {agent_id} " f"({len(injection_parts)} parts)",
             )
@@ -4225,8 +4072,12 @@ async def _collect_no_hook_runtime_fallback_sections(
                 context={"agent_id": agent_id},
             )
             if subagent_result.inject and subagent_result.inject.get("content"):
-                # Delivery succeeded — now clear the source.
-                self._pending_subagent_results.pop(agent_id, None)
+                # Delivery succeeded — clear only the consumed results (R3), so a
+                # background result appended during the await window is preserved.
+                self._consume_pending_subagent_results(
+                    agent_id,
+                    {sid for sid, _ in pending_subagent_results},
+                )
                 sections.append(str(subagent_result.inject["content"]))
                 logger.info(
                     "[Orchestrator] Hookless subagent completion delivery status=%s (%s)",
@@ -4383,6 +4234,8 @@ async def _prepare_no_hook_midstream_enforcement(
         injection_parts: list[str] = []
 
         if selected_answers:
+            # R1: capture peer revision counts before the snapshot-copy await.
+            captured_revision_counts = self._capture_answer_revision_counts(list(selected_answers.keys()))
             logger.info(
                 "[Orchestrator] Delivering no-hook mid-stream peer updates via enforcement message for %s",
                 agent_id,
@@ -4407,7 +4260,7 @@ async def _prepare_no_hook_midstream_enforcement(
 
             # Mark the selected source revisions as seen by this agent.
             self.agent_states[agent_id].known_answer_ids.update(selected_answers.keys())
-            self._register_injected_answer_updates(agent_id, list(selected_answers.keys()))
+            self._register_injected_answer_updates(agent_id, list(selected_answers.keys()), seen_counts=captured_revision_counts)
             self._mark_pending_checklist_recheck_labels(agent_id, list(selected_answers.keys()))
 
             _inj_emitter = get_event_emitter()
@@ -4741,121 +4594,11 @@ def _setup_native_hooks_for_agent(
         # Create mid-stream injection hook with closure-based callback
         mid_stream_hook = MidStreamInjectionHook()
 
-        # Define the injection callback (same logic as GeneralHookManager path)
+        # Define the injection callback (A1: unified with the GeneralHookManager
+        # path via the single installer method; native=True only changes log
+        # wording and the non-native debug listing).
         async def get_injection_content() -> str | None:
-            """Check if mid-stream injection is needed and return content."""
-            if self.config.disable_injection:
-                return None
-
-            if not self._check_restart_pending(agent_id):
-                return None
-
-            # First-answer protection: don't inject into an agent that hasn't
-            # produced its first answer yet.
-            if self._should_defer_restart_for_first_answer(agent_id):
-                self.agent_states[agent_id].restart_pending = False
-                return None
-
-            # In vote-only mode, skip injection and force a full restart instead.
-            if self._is_vote_only_mode(agent_id):
-                return None
-
-            if self._should_defer_peer_updates_until_restart(agent_id):
-                if self._has_unseen_answer_updates(agent_id):
-                    self.agent_states[agent_id].restart_pending = True
-                    logger.info(
-                        "[Orchestrator] Deferring native peer answer update injection until restart for %s",
-                        agent_id,
-                    )
-                else:
-                    self.agent_states[agent_id].restart_pending = False
-                return None
-
-            # Get CURRENT answers (includes virtual agents in step mode)
-            current_answers = self._get_current_answers_snapshot()
-            selected_answers, had_unseen_updates = self._select_midstream_answer_updates(
-                agent_id,
-                current_answers,
-            )
-
-            if not selected_answers:
-                if had_unseen_updates:
-                    self.agent_states[agent_id].restart_pending = True
-                    cap = getattr(self.config, "max_midstream_injections_per_round", 2)
-                    logger.info(
-                        "[Orchestrator] Skipping native mid-stream injection for %s: per-round cap reached (%s)",
-                        agent_id,
-                        cap,
-                    )
-                else:
-                    self.agent_states[agent_id].restart_pending = False
-                return None
-
-            # TIMING CONSTRAINT: Skip injection if too close to soft timeout
-            if self._should_skip_injection_due_to_timeout(agent_id):
-                return None  # Let restart happen instead
-
-            # Copy snapshots from new answer agents to temp workspace
-            logger.info(
-                f"[Orchestrator] Copying snapshots for mid-stream injection to {agent_id}",
-            )
-            await self._copy_all_snapshots_to_temp_workspace(agent_id)
-
-            # Build injection content
-            injection = self._build_tool_result_injection(
-                agent_id,
-                selected_answers,
-                existing_answers=answers,
-            )
-
-            # Increment injection count
-            self.agent_states[agent_id].injection_count += 1
-            self.agent_states[agent_id].midstream_injections_this_round += len(selected_answers)
-
-            # Update answers to include newly injected answers (prevents re-injection)
-            # This mutates the captured closure variable so future callbacks see updated state
-            answers.update(selected_answers)
-
-            # Update known_answer_ids so vote validation knows this agent has seen these
-            self.agent_states[agent_id].known_answer_ids.update(selected_answers.keys())
-            self._register_injected_answer_updates(agent_id, list(selected_answers.keys()))
-            self._mark_pending_checklist_recheck_labels(agent_id, list(selected_answers.keys()))
-
-            # Update context labels BEFORE refreshing checklist state so
-            # available_agent_labels reflects the newly-injected labels
-            # (e.g. agent1.2 replacing agent1.1). Same ordering as all other paths.
-            self.coordination_tracker.update_agent_context_with_new_answers(
-                agent_id,
-                list(selected_answers.keys()),
-            )
-
-            # Refresh checklist tool state after injection (streak may have reset)
-            self._refresh_checklist_state_for_agent(agent_id)
-
-            # Keep restart pending if additional unseen revisions still remain.
-            self.agent_states[agent_id].restart_pending = self._has_unseen_answer_updates(agent_id)
-
-            # Emit injection_received event for TUI
-
-            _inj_emitter = get_event_emitter()
-            if _inj_emitter:
-                _inj_emitter.emit_injection_received(
-                    agent_id=agent_id,
-                    source_agents=list(selected_answers.keys()),
-                    injection_type="mid_stream",
-                )
-
-            # Track the injection
-            logger.info(
-                f"[Orchestrator] Mid-stream injection (native) for {agent_id}: {len(selected_answers)} answer update(s)",
-            )
-            self.coordination_tracker.track_agent_action(
-                agent_id,
-                ActionType.UPDATE_INJECTED,
-                f"Mid-stream (native): {len(selected_answers)} answer(s)",
-            )
-
-            return injection
+            return await self._build_midstream_injection(agent_id, answers, native=True)
 
         # Set callback on hook
         mid_stream_hook.set_callback(get_injection_content)
@@ -5204,9 +4947,18 @@ def _sync_decomposition_answer_visibility(self, agent_id: str) -> None:
         """Update seen-answer revision snapshot (delegates to PeerAnswerVisibilityTracker)."""
         self._peer_answer_visibility_tracker.sync_decomposition_answer_visibility(agent_id)
 
-    def _mark_seen_answer_revisions(self, agent_id: str, source_agent_ids: list[str]) -> None:
+    def _mark_seen_answer_revisions(
+        self,
+        agent_id: str,
+        source_agent_ids: list[str],
+        seen_counts: dict[str, int] | None = None,
+    ) -> None:
         """Mark seen answer revisions (delegates to PeerAnswerVisibilityTracker)."""
-        self._peer_answer_visibility_tracker.mark_seen_answer_revisions(agent_id, source_agent_ids)
+        self._peer_answer_visibility_tracker.mark_seen_answer_revisions(
+            agent_id,
+            source_agent_ids,
+            seen_counts=seen_counts,
+        )
 
     def _get_latest_answer_revision_timestamp(self, source_agent_id: str) -> float:
         """Get latest revision timestamp (delegates to PeerAnswerVisibilityTracker)."""
@@ -5258,13 +5010,116 @@ def _mark_pending_checklist_recheck_labels(
             source_agent_ids,
         )
 
-    def _register_injected_answer_updates(self, agent_id: str, source_agent_ids: list[str]) -> None:
+    def _register_injected_answer_updates(
+        self,
+        agent_id: str,
+        source_agent_ids: list[str],
+        seen_counts: dict[str, int] | None = None,
+    ) -> None:
         """Apply post-injection state updates (delegates to PeerAnswerVisibilityTracker)."""
         self._peer_answer_visibility_tracker.register_injected_answer_updates(
             agent_id,
             source_agent_ids,
+            seen_counts=seen_counts,
+        )
+
+    def _capture_answer_revision_counts(self, source_agent_ids: list[str]) -> dict[str, int]:
+        """Snapshot per-source answer revision counts at injection-selection time (R1).
+
+        Capturing the counts before the snapshot-copy ``await`` lets
+        ``_register_injected_answer_updates`` mark the agent seen up to exactly
+        what it was shown, even if a peer publishes a new revision during the await.
+        """
+        return {source_id: self._get_agent_answer_revision_count(source_id) for source_id in source_agent_ids}
+
+    def _consume_pending_subagent_results(self, agent_id: str, consumed_subagent_ids: set[str]) -> None:
+        """Remove only delivered subagent results, preserving concurrent appends (R2/R3).
+
+        Delegates to SubagentLifecycleCoordinator.
+        """
+        self._subagent_lifecycle_coordinator.consume_pending_subagent_results(
+            agent_id,
+            consumed_subagent_ids,
         )
 
+    async def _build_midstream_injection(self, agent_id: str, answers: dict[str, str], *, native: bool) -> str | None:
+        """Unified mid-stream peer-answer injection callback (A1).
+
+        Delegates to MidStreamInjectionHookInstaller. Both the GeneralHookManager
+        and native hook paths route their injection callback through here, so the
+        two backend paths can no longer drift.
+        """
+        return await self._midstream_injection_hook_installer.build_midstream_injection(
+            agent_id,
+            answers,
+            native=native,
+        )
+
+    def _attach_changedoc_to_latest_answer(self, agent_id: str, agent: Any) -> None:
+        """Attach the workspace changedoc to the agent's latest recorded answer (D3).
+
+        This is non-essential post-record enrichment that runs AFTER
+        ``add_agent_answer`` has already persisted the answer. Any failure (e.g.
+        filesystem error reading the changedoc) is logged and swallowed so a
+        bookkeeping bug cannot mark a valid-answer agent as killed.
+        """
+        try:
+            if not (self._is_changedoc_enabled() and agent and agent.backend.filesystem_manager):
+                return
+            from massgen.changedoc import read_changedoc_from_workspace
+
+            ws_path = agent.backend.filesystem_manager.cwd
+            if not ws_path:
+                return
+            changedoc_content = read_changedoc_from_workspace(Path(ws_path))
+            if not changedoc_content:
+                return
+            answers_list = self.coordination_tracker.answers_by_agent.get(agent_id, [])
+            if not answers_list:
+                return
+            label = answers_list[-1].label
+            # Replace [SELF] placeholder with real answer label
+            changedoc_content = changedoc_content.replace("[SELF]", label)
+            answers_list[-1].changedoc = changedoc_content
+            logger.info(
+                "[Orchestrator] Attached changedoc (%d chars) to %s",
+                len(changedoc_content),
+                answers_list[-1].label,
+            )
+        except Exception:
+            logger.warning(
+                "[Orchestrator] Failed to attach changedoc for %s (answer preserved)",
+                agent_id,
+                exc_info=True,
+            )
+
+    def _record_round_isolation_degraded(self, agent_id: str, error: BaseException) -> None:
+        """Record + surface that per-round worktree isolation failed for an agent (D2).
+
+        Sets observable state on the agent's AgentState (instead of swallowing the
+        failure into a single log line) and, when an event emitter is available,
+        emits a visible signal. The agent continues in its base workspace.
+        """
+        error_text = f"{type(error).__name__}: {error}"
+        state = self.agent_states.get(agent_id)
+        if state is not None:
+            state.round_isolation_degraded = True
+            state.round_isolation_error = error_text
+        logger.warning(
+            "[Orchestrator] Per-round worktree isolation FAILED for %s — continuing in base " "workspace without per-round branch isolation: %s",
+            agent_id,
+            error_text,
+        )
+        try:
+            emitter = get_event_emitter()
+            if emitter is not None and hasattr(emitter, "emit_status"):
+                emitter.emit_status(
+                    agent_id=agent_id,
+                    status=f"round isolation degraded: {error_text}",
+                )
+        except Exception:
+            logger.debug("[Orchestrator] Unable to emit round-isolation degraded status", exc_info=True)
+
     def _check_fairness_answer_lead_cap(self, agent_id: str) -> tuple[bool, str | None]:
         """Enforce max lead in answer revisions (delegates to FairnessGate)."""
         return self._fairness_gate.check_fairness_answer_lead_cap(agent_id)
@@ -6080,7 +5935,11 @@ async def _stream_agent_execution(
 
             # Re-write SUBAGENT.md dirs and MCP config JSON files each round so
             # the lazy scanner / deferred loader in the MCP server always finds
-            # them — workspace clears between rounds can remove .massgen/.
+            # them. NOTE (B5, 2026-06-04): the original "workspace clears remove
+            # .massgen/" rationale is now stale — clear_workspace() is disabled
+            # above and preserves .massgen/ regardless. The rewrite is kept (not
+            # guarded) pending verification that it is per-round idempotent for
+            # all backends; the cost is a few ms over ~7 subagent configs.
             if hasattr(self.config, "coordination_config") and getattr(self.config.coordination_config, "enable_subagents", False):
                 workspace_root = agent.backend.filesystem_manager.get_workspace_root()
                 self._write_subagent_type_dirs(workspace_root)
@@ -6214,7 +6073,12 @@ async def _stream_agent_execution(
                         f"[Orchestrator] Created per-round worktree for {agent_id}: {round_worktree_paths}",
                     )
             except Exception as e:
-                logger.warning(f"[Orchestrator] Failed to create per-round worktree for {agent_id}: {e}")
+                # D2: per-round worktree isolation failed. The agent keeps its own
+                # base workspace (agents always have separate cwds), so we degrade
+                # gracefully rather than aborting the round on a transient git error
+                # — but we now RECORD the degradation on AgentState and emit a
+                # visible signal instead of only writing a log line.
+                self._record_round_isolation_degraded(agent_id, e)
                 round_worktree_paths = None
 
         # Track outcome for span attributes (set in finally block)
diff --git a/massgen/orchestrator_collaborators/active_coordination_cleanup.py b/massgen/orchestrator_collaborators/active_coordination_cleanup.py
index f90ca0b92..4845753c0 100644
--- a/massgen/orchestrator_collaborators/active_coordination_cleanup.py
+++ b/massgen/orchestrator_collaborators/active_coordination_cleanup.py
@@ -35,6 +35,21 @@ async def cleanup(self) -> None:
                 logger.warning(f"[Orchestrator] Error stopping SubagentLaunchWatcher: {e}")
             orch._subagent_launch_watcher = None
 
+        # R4: cancel detached background trace-analyzer tasks BEFORE flushing, so a
+        # surviving task cannot append a result after the flush into a queue that
+        # will never be consumed (and so it doesn't outlive the hard timeout). The
+        # task's CancelledError path returns without writing, so awaiting the
+        # cancellation fully closes the window.
+        if getattr(orch, "_background_trace_tasks", None):
+            for _agent_id, trace_task in list(orch._background_trace_tasks.items()):
+                if not trace_task.done():
+                    trace_task.cancel()
+                    try:
+                        await trace_task
+                    except (asyncio.CancelledError, Exception):
+                        pass
+            orch._background_trace_tasks.clear()
+
         # Flush any pending subagent results that weren't delivered.
         orch._flush_pending_subagent_results()
 
diff --git a/massgen/orchestrator_collaborators/answer_text_normalizer.py b/massgen/orchestrator_collaborators/answer_text_normalizer.py
index e5d5279e1..e5557c757 100644
--- a/massgen/orchestrator_collaborators/answer_text_normalizer.py
+++ b/massgen/orchestrator_collaborators/answer_text_normalizer.py
@@ -105,15 +105,25 @@ def normalize_workspace_paths_in_answers(
                 other_workspace = str(
                     other_agent.backend.filesystem_manager.get_current_workspace(),
                 )
+                # C2: use loguru brace-style deferred formatting instead of eager
+                # f-strings. These logs interpolate the full answer body on every
+                # (answer x agent) pair; with f-strings Python builds the multi-KB
+                # string even when no DEBUG sink is attached. With brace args, loguru
+                # only formats when a handler actually accepts the record.
                 logger.debug(
-                    f"[Orchestrator._normalize_workspace_paths_in_answers] Replacing {other_workspace} in answer from {agent_id} with path {replace_path}. original answer: {normalized_answer}",
+                    "[Orchestrator._normalize_workspace_paths_in_answers] Replacing {} in answer from {} with path {}. original answer: {}",
+                    other_workspace,
+                    agent_id,
+                    replace_path,
+                    normalized_answer,
                 )
                 normalized_answer = normalized_answer.replace(
                     other_workspace,
                     replace_path,
                 )
                 logger.debug(
-                    f"[Orchestrator._normalize_workspace_paths_in_answers] Intermediate normalized answer: {normalized_answer}",
+                    "[Orchestrator._normalize_workspace_paths_in_answers] Intermediate normalized answer: {}",
+                    normalized_answer,
                 )
 
             normalized_answers[agent_id] = normalized_answer
diff --git a/massgen/orchestrator_collaborators/midstream_injection_hook_installer.py b/massgen/orchestrator_collaborators/midstream_injection_hook_installer.py
index c30d92d29..ce4eeae6e 100644
--- a/massgen/orchestrator_collaborators/midstream_injection_hook_installer.py
+++ b/massgen/orchestrator_collaborators/midstream_injection_hook_installer.py
@@ -20,7 +20,8 @@
 from pathlib import Path
 from typing import TYPE_CHECKING, Any
 
-from massgen.logger_config import logger
+from massgen.logger_config import get_event_emitter, logger
+from massgen.utils import ActionType
 
 if TYPE_CHECKING:
     from massgen.orchestrator import Orchestrator
@@ -143,6 +144,196 @@ def compute_plan_progress_stats(self, workspace_path: str) -> dict[str, Any] | N
             logger.debug(f"[Orchestrator] Could not compute plan progress: {e}")
             return None
 
+    async def build_midstream_injection(
+        self,
+        agent_id: str,
+        answers: dict[str, str],
+        *,
+        native: bool,
+    ) -> str | None:
+        """Unified mid-stream peer-answer injection callback (A1).
+
+        Replaces the two near-identical ``get_injection_content`` closures that
+        lived inline in ``_setup_hook_manager_for_agent`` (GeneralHookManager
+        path) and ``_setup_native_hooks_for_agent`` (native path). Keeping them
+        separate was a backend-parity hazard — a fix to one path silently skipped
+        the other.
+
+        ``native`` only affects log/track wording and the (non-native) debug
+        workspace-listing output. The side-effect sequence is canonical and
+        preserves the load-bearing invariant shared by both original closures:
+        ``update_agent_context_with_new_answers`` runs BEFORE
+        ``refresh_checklist_state_for_agent`` so ``available_agent_labels``
+        reflect the newly-injected labels. (The prior inter-path divergence in
+        the position of the ``restart_pending`` recompute and the emitter/track
+        calls was verified inert — the recompute reads only ``seen_answer_counts``,
+        which is set at the same relative position in both paths.)
+
+        Mutates the caller's ``answers`` dict in place so re-entrant callbacks do
+        not re-inject the same updates.
+        """
+        orch = self._orchestrator
+        label = "native " if native else ""
+        track_suffix = " (native)" if native else ""
+
+        # Skip injection if disabled (multi-agent refinement OFF mode).
+        if orch.config.disable_injection:
+            return None
+
+        if not orch._check_restart_pending(agent_id):
+            return None
+
+        # First-answer protection: don't inject before the agent's first answer.
+        if orch._should_defer_restart_for_first_answer(agent_id):
+            orch.agent_states[agent_id].restart_pending = False
+            return None
+
+        # In vote-only mode, force a full restart instead (mid-stream injection
+        # can't update the fixed vote tool schema).
+        if orch._is_vote_only_mode(agent_id):
+            return None
+
+        if orch._should_defer_peer_updates_until_restart(agent_id):
+            if orch._has_unseen_answer_updates(agent_id):
+                orch.agent_states[agent_id].restart_pending = True
+                logger.info(
+                    "[Orchestrator] Deferring %speer answer update injection until restart for %s",
+                    label,
+                    agent_id,
+                )
+            else:
+                orch.agent_states[agent_id].restart_pending = False
+            return None
+
+        # Get CURRENT answers (includes virtual agents in step mode).
+        current_answers = orch._get_current_answers_snapshot()
+        selected_answers, had_unseen_updates = orch._select_midstream_answer_updates(
+            agent_id,
+            current_answers,
+        )
+
+        if not selected_answers:
+            if had_unseen_updates:
+                # Keep restart pending when unseen updates still exist.
+                orch.agent_states[agent_id].restart_pending = True
+                cap = getattr(orch.config, "max_midstream_injections_per_round", 2)
+                logger.info(
+                    "[Orchestrator] Skipping %smid-stream injection for %s: per-round cap reached (%s)",
+                    label,
+                    agent_id,
+                    cap,
+                )
+            else:
+                # No unseen updates remain: this was a stale restart_pending flag.
+                orch.agent_states[agent_id].restart_pending = False
+            return None
+
+        # R1: capture peer revision counts as-of selection, before the
+        # snapshot-copy await below can let a peer publish a new revision that
+        # would otherwise be silently marked "seen".
+        captured_revision_counts = orch._capture_answer_revision_counts(list(selected_answers.keys()))
+
+        # TIMING CONSTRAINT: skip injection if too close to soft timeout.
+        if orch._should_skip_injection_due_to_timeout(agent_id):
+            return None
+
+        # Copy snapshots from new-answer agents to temp workspace BEFORE building
+        # the injection, so the workspace files are available to the agent.
+        logger.info(
+            "[Orchestrator] Copying snapshots for mid-stream injection to %s",
+            agent_id,
+        )
+        await orch._copy_all_snapshots_to_temp_workspace(agent_id)
+
+        # Build injection content (pass existing answers to detect updates vs new).
+        injection = orch._build_tool_result_injection(
+            agent_id,
+            selected_answers,
+            existing_answers=answers,
+        )
+
+        # Debug: log temp-workspace contents per injected agent (non-native path
+        # only, preserved from the original GeneralHookManager closure).
+        if not native:
+            viewing_agent = orch.agents.get(agent_id)
+            if viewing_agent and viewing_agent.backend.filesystem_manager:
+                temp_workspace_base = str(
+                    viewing_agent.backend.filesystem_manager.agent_temporary_workspace,
+                )
+                agent_mapping = orch.coordination_tracker.get_reverse_agent_mapping()
+                for aid in selected_answers.keys():
+                    anon_id = agent_mapping.get(aid, f"agent_{aid}")
+                    workspace_path = os.path.join(temp_workspace_base, anon_id)
+                    if os.path.exists(workspace_path):
+                        try:
+                            files = os.listdir(workspace_path)
+                            logger.debug(
+                                f"[Orchestrator] Injection workspace {workspace_path} contains: {files}",
+                            )
+                        except OSError as e:
+                            logger.debug(
+                                f"[Orchestrator] Could not list workspace {workspace_path}: {e}",
+                            )
+                    else:
+                        logger.debug(
+                            f"[Orchestrator] Injection workspace {workspace_path} does NOT exist!",
+                        )
+
+        # Increment injection counters.
+        orch.agent_states[agent_id].injection_count += 1
+        orch.agent_states[agent_id].midstream_injections_this_round += len(selected_answers)
+
+        # Update answers so future callbacks don't re-inject the same updates.
+        answers.update(selected_answers)
+
+        # Update known_answer_ids so vote validation knows the agent saw these.
+        orch.agent_states[agent_id].known_answer_ids.update(selected_answers.keys())
+        orch._register_injected_answer_updates(
+            agent_id,
+            list(selected_answers.keys()),
+            seen_counts=captured_revision_counts,
+        )
+        orch._mark_pending_checklist_recheck_labels(agent_id, list(selected_answers.keys()))
+
+        # Update agent context labels BEFORE refreshing checklist state so
+        # available_agent_labels reflects the newly-injected labels (e.g.
+        # agent1.2 replacing agent1.1). Load-bearing ordering — both original
+        # closures preserved it.
+        orch.coordination_tracker.update_agent_context_with_new_answers(
+            agent_id,
+            list(selected_answers.keys()),
+        )
+        orch._refresh_checklist_state_for_agent(agent_id)
+
+        # Keep restart pending if additional unseen revisions still remain.
+        orch.agent_states[agent_id].restart_pending = orch._has_unseen_answer_updates(agent_id)
+
+        logger.info(
+            "[Orchestrator] Mid-stream injection%s for %s: %d answer update(s)",
+            track_suffix,
+            agent_id,
+            len(selected_answers),
+        )
+        if not native:
+            preview = injection[:2000] + ("..." if len(injection) > 2000 else "")
+            logger.debug(f"[Orchestrator] Injection content (truncated):\n{preview}")
+
+        _inj_emitter = get_event_emitter()
+        if _inj_emitter:
+            _inj_emitter.emit_injection_received(
+                agent_id=agent_id,
+                source_agents=list(selected_answers.keys()),
+                injection_type="mid_stream",
+            )
+
+        orch.coordination_tracker.track_agent_action(
+            agent_id,
+            ActionType.UPDATE_INJECTED,
+            f"Mid-stream{track_suffix}: {len(selected_answers)} answer(s)",
+        )
+
+        return injection
+
     def build_tool_result_injection(
         self,
         agent_id: str,
diff --git a/massgen/orchestrator_collaborators/peer_answer_visibility_tracker.py b/massgen/orchestrator_collaborators/peer_answer_visibility_tracker.py
index 31ceff43f..f2bd08dea 100644
--- a/massgen/orchestrator_collaborators/peer_answer_visibility_tracker.py
+++ b/massgen/orchestrator_collaborators/peer_answer_visibility_tracker.py
@@ -83,14 +83,34 @@ def sync_decomposition_answer_visibility(self, agent_id: str) -> None:
 
         state.seen_answer_counts = current_counts
 
-    def mark_seen_answer_revisions(self, agent_id: str, source_agent_ids: list[str]) -> None:
-        """Mark current answer revisions from source agents as seen by ``agent_id``."""
+    def mark_seen_answer_revisions(
+        self,
+        agent_id: str,
+        source_agent_ids: list[str],
+        seen_counts: dict[str, int] | None = None,
+    ) -> None:
+        """Mark answer revisions from source agents as seen by ``agent_id``.
+
+        ``seen_counts`` (R1 fix): per-source revision counts *captured at the
+        moment the injected content was selected*. When provided, the source is
+        marked seen up to that captured count rather than the live count, so a
+        peer revision published during the intervening ``await`` (e.g. snapshot
+        copy) is NOT silently marked seen and remains injectable. The captured
+        count is clamped to the current count and never lowers an already-higher
+        seen count. When omitted, falls back to the legacy live read.
+        """
         orch = self._orchestrator
         state = orch.agent_states.get(agent_id)
         if not state:
             return
         for source_agent_id in source_agent_ids:
-            state.seen_answer_counts[source_agent_id] = self.get_agent_answer_revision_count(source_agent_id)
+            current = self.get_agent_answer_revision_count(source_agent_id)
+            if seen_counts is not None and source_agent_id in seen_counts:
+                count = min(int(seen_counts[source_agent_id]), current)
+            else:
+                count = current
+            prev = state.seen_answer_counts.get(source_agent_id, 0)
+            state.seen_answer_counts[source_agent_id] = max(prev, count)
 
     def get_latest_answer_revision_timestamp(self, source_agent_id: str) -> float:
         """Get timestamp of the latest answer revision for an agent."""
@@ -273,8 +293,15 @@ def register_injected_answer_updates(
         self,
         agent_id: str,
         source_agent_ids: list[str],
+        seen_counts: dict[str, int] | None = None,
     ) -> None:
-        """Apply per-agent state updates after mid-stream answer injection."""
+        """Apply per-agent state updates after mid-stream answer injection.
+
+        ``seen_counts`` is forwarded to :meth:`mark_seen_answer_revisions` (R1
+        fix): pass the revision counts captured when the injected content was
+        selected so a peer revision published during the intervening ``await``
+        is not marked seen. See that method for details.
+        """
         orch = self._orchestrator
         state = orch.agent_states.get(agent_id)
         if not state or not source_agent_ids:
@@ -288,4 +315,4 @@ def register_injected_answer_updates(
             )
             state.decomposition_answer_streak = 0
 
-        self.mark_seen_answer_revisions(agent_id, source_agent_ids)
+        self.mark_seen_answer_revisions(agent_id, source_agent_ids, seen_counts=seen_counts)
diff --git a/massgen/orchestrator_collaborators/subagent_lifecycle_coordinator.py b/massgen/orchestrator_collaborators/subagent_lifecycle_coordinator.py
index 3d7e8c2ed..062890bc3 100644
--- a/massgen/orchestrator_collaborators/subagent_lifecycle_coordinator.py
+++ b/massgen/orchestrator_collaborators/subagent_lifecycle_coordinator.py
@@ -193,6 +193,30 @@ def get_pending_subagent_results(
 
         return run_async_safely(self.get_pending_subagent_results_async(agent_id))
 
+    def consume_pending_subagent_results(
+        self,
+        agent_id: str,
+        consumed_subagent_ids: set[str],
+    ) -> None:
+        """Remove only the just-delivered subagent results, preserving late appends.
+
+        R2/R3 fix: the injection paths previously did
+        ``_pending_subagent_results.pop(agent_id, None)`` after an ``await`` window,
+        which discarded any result a background task appended during that window.
+        This removes only the entries whose ``subagent_id`` was actually consumed,
+        dropping the key only when nothing remains. A result that arrived
+        concurrently survives for the next injection cycle.
+        """
+        orch = self._orchestrator
+        existing = orch._pending_subagent_results.get(agent_id)
+        if existing is None:
+            return
+        remaining = [entry for entry in existing if entry[0] not in consumed_subagent_ids]
+        if remaining:
+            orch._pending_subagent_results[agent_id] = remaining
+        else:
+            orch._pending_subagent_results.pop(agent_id, None)
+
     # ------------------------------------------------------------------
     # Cancellation flows
     # ------------------------------------------------------------------
diff --git a/massgen/subagent/manager.py b/massgen/subagent/manager.py
index c7bbfa595..430d88368 100644
--- a/massgen/subagent/manager.py
+++ b/massgen/subagent/manager.py
@@ -4187,12 +4187,21 @@ async def cancel_all_subagents(self) -> int:
         self._active_processes.clear()
 
         # Also cancel any background tasks
+        cancelled_background_tasks: list = []
         for task_id, task in list(self._background_tasks.items()):
             if not task.done():
                 logger.warning(f"[SubagentManager] Cancelling background task {task_id}...")
                 task.cancel()
                 cancelled_count += 1
                 cancelled_subagent_ids.add(task_id)
+                cancelled_background_tasks.append(task)
+
+        # R5: await the cancellations so each task runs its CancelledError handler
+        # (cancelled-status preservation + idempotent self-pop) against the LIVE
+        # registry, before we clear it — instead of resuming after this method
+        # returns and operating on an already-cleared dict.
+        if cancelled_background_tasks:
+            await asyncio.gather(*cancelled_background_tasks, return_exceptions=True)
 
         self._background_tasks.clear()
 
diff --git a/massgen/tests/test_answer_normalizer_debug_guard.py b/massgen/tests/test_answer_normalizer_debug_guard.py
new file mode 100644
index 000000000..75a3305b2
--- /dev/null
+++ b/massgen/tests/test_answer_normalizer_debug_guard.py
@@ -0,0 +1,89 @@
+"""C2 regression: the workspace-path normalizer must not eagerly build debug
+strings when no DEBUG sink is attached.
+
+The two debug logs in ``normalize_workspace_paths_in_answers`` interpolate the
+full answer body on every (answer x agent) pair. They now use loguru brace-style
+deferred formatting, so loguru only renders the message when a handler actually
+accepts the record. These tests assert (a) no formatting work happens without a
+DEBUG sink, (b) the message renders correctly when a sink is attached, and
+(c) functional path-rewriting output is preserved.
+"""
+
+from __future__ import annotations
+
+import types
+
+from massgen.logger_config import logger
+from massgen.orchestrator_collaborators.answer_text_normalizer import (
+    AnswerTextNormalizer,
+)
+
+
+def _make_orch(workspace: str) -> types.SimpleNamespace:
+    fs = types.SimpleNamespace(
+        agent_temporary_workspace="/tmp/agentview",
+        get_current_workspace=lambda: workspace,
+    )
+    backend = types.SimpleNamespace(filesystem_manager=fs)
+    agent = types.SimpleNamespace(backend=backend)
+    orch = types.SimpleNamespace()
+    orch.agents = {"peer": agent, "viewer": agent}
+    orch.coordination_tracker = types.SimpleNamespace(
+        get_reverse_agent_mapping=lambda: {"peer": "agent_1", "viewer": "agent_2"},
+    )
+    return orch
+
+
+def test_c2_output_preserved_and_path_rewritten() -> None:
+    orch = _make_orch("/work/peer")
+    normalizer = AnswerTextNormalizer(orch)
+
+    out = normalizer.normalize_workspace_paths_in_answers(
+        {"peer": "see /work/peer/file.py"},
+        viewing_agent_id="viewer",
+    )
+
+    assert "/work/peer" not in out["peer"], "workspace path was not rewritten"
+
+
+def test_c2_message_renders_when_sink_attached() -> None:
+    """With a DEBUG sink, the deferred message renders with the substituted args."""
+    orch = _make_orch("/work/peer")
+    normalizer = AnswerTextNormalizer(orch)
+
+    captured: list[str] = []
+    sink_id = logger.add(lambda m: captured.append(str(m)), level="DEBUG")
+    try:
+        normalizer.normalize_workspace_paths_in_answers(
+            {"peer": "see /work/peer/file.py"},
+            viewing_agent_id="viewer",
+        )
+    finally:
+        logger.remove(sink_id)
+
+    joined = "\n".join(captured)
+    assert "Replacing /work/peer" in joined
+    assert "original answer: see /work/peer/file.py" in joined
+
+
+def test_c2_answer_body_not_formatted_without_debug_sink() -> None:
+    """Without a DEBUG sink, the answer object is never formatted (deferred)."""
+
+    class _Tripwire(str):
+        formats = 0
+
+        def __format__(self, spec: str) -> str:  # pragma: no cover - asserted via counter
+            type(self).formats += 1
+            return str.__format__(self, spec)
+
+    _Tripwire.formats = 0
+    orch = _make_orch("/work/peer")
+    normalizer = AnswerTextNormalizer(orch)
+
+    # No DEBUG sink attached: loguru must not format the (expensive) answer arg.
+    normalizer.normalize_workspace_paths_in_answers(
+        {"peer": _Tripwire("see /work/peer/file.py")},
+        viewing_agent_id="viewer",
+    )
+
+    assert _Tripwire.formats == 0, "answer body was formatted despite no DEBUG sink (eager logging)"
diff --git a/massgen/tests/test_concurrency_race_fixes.py b/massgen/tests/test_concurrency_race_fixes.py
new file mode 100644
index 000000000..33a149f81
--- /dev/null
+++ b/massgen/tests/test_concurrency_race_fixes.py
@@ -0,0 +1,406 @@
+"""Concurrency race-fix regression tests (Tranche 1 of next_version_eng_health_plan.md).
+
+Covers the two verified, lock-free races from the concurrency audit:
+
+R1 — Lost peer-answer revision: the mid-stream injection path selected peer
+     answer *content* before a yielding ``await`` (snapshot copy), then marked
+     the peer as "seen" by re-reading the **live** revision count afterward. If
+     the peer appended a new revision during the await, the agent was marked as
+     having seen a revision it was never shown -> that revision is never injected.
+     Fix: thread the revision counts captured *at selection time* through
+     ``register_injected_answer_updates`` / ``mark_seen_answer_revisions`` via an
+     optional ``seen_counts`` argument.
+
+R2/R3 — Lost background-subagent result: the injection paths did
+     read-snapshot -> ``await`` -> blind ``_pending_subagent_results.pop(agent_id)``.
+     A background task appending a fresh result during the await had it silently
+     discarded by the whole-key pop. Fix: ``consume_pending_subagent_results``
+     removes only the consumed subagent ids, preserving concurrent appends.
+
+These tests drive the REAL collaborator code (PeerAnswerVisibilityTracker,
+SubagentLifecycleCoordinator) against a lightweight fake orchestrator -- no
+external LLM calls, fully deterministic. The "simulation" tests reproduce the
+exact interleaving by appending to shared state between the collect and the
+consume/mark, the same window the real ``await`` opens.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import types
+
+import pytest
+
+from massgen.orchestrator import AgentState
+from massgen.orchestrator_collaborators.peer_answer_visibility_tracker import (
+    PeerAnswerVisibilityTracker,
+)
+from massgen.orchestrator_collaborators.subagent_lifecycle_coordinator import (
+    SubagentLifecycleCoordinator,
+)
+
+
+# --------------------------------------------------------------------------- #
+# Test doubles
+# --------------------------------------------------------------------------- #
+class _Revision:
+    """Minimal stand-in for a coordination answer revision."""
+
+    def __init__(self, content: str, timestamp: float, label: str | None = None) -> None:
+        self.content = content
+        self.timestamp = timestamp
+        self.label = label
+
+
+class _FakeCoordTracker:
+    def __init__(self) -> None:
+        self.answers_by_agent: dict[str, list[_Revision]] = {}
+
+
+def _make_peer_orch(agent_ids: list[str]) -> types.SimpleNamespace:
+    """A fake orchestrator exposing just what PeerAnswerVisibilityTracker reads."""
+    orch = types.SimpleNamespace()
+    orch.coordination_tracker = _FakeCoordTracker()
+    orch.agent_states = {aid: AgentState() for aid in agent_ids}
+    orch.agents = {aid: types.SimpleNamespace(backend=None) for aid in agent_ids}
+    orch.config = types.SimpleNamespace(max_midstream_injections_per_round=2)
+    orch._step_mode = None
+    orch._step_inputs = None
+    orch._is_decomposition_mode = lambda: False
+    orch._is_fairness_enabled = lambda: False
+    return orch
+
+
+def _make_sub_orch() -> types.SimpleNamespace:
+    orch = types.SimpleNamespace()
+    orch._pending_subagent_results = {}
+    orch._injected_subagents = {}
+    orch.agents = {"A": types.SimpleNamespace()}
+    return orch
+
+
+# --------------------------------------------------------------------------- #
+# R1 — peer-answer visibility race
+# --------------------------------------------------------------------------- #
+def test_r1_mark_seen_uses_captured_counts_not_live() -> None:
+    """The agent must be marked seen up to what it was SHOWN, not the live count.
+
+    Reproduces the race window: capture counts at selection, a peer appends a
+    new revision (the await window), then we register the injection.
+    """
+    orch = _make_peer_orch(["A", "B"])
+    tracker = PeerAnswerVisibilityTracker(orch)
+
+    # B has 2 revisions; A is shown content as of count==2.
+    orch.coordination_tracker.answers_by_agent["B"] = [
+        _Revision("v1", 1.0),
+        _Revision("v2", 2.0),
+    ]
+    orch.agent_states["B"].answer = "v2"
+    captured = {"B": tracker.get_agent_answer_revision_count("B")}
+    assert captured["B"] == 2
+
+    # --- the await window: B publishes revision 3 concurrently ---
+    orch.coordination_tracker.answers_by_agent["B"].append(_Revision("v3", 3.0))
+    orch.agent_states["B"].answer = "v3"
+
+    # Register the injection using the counts captured at selection time.
+    tracker.register_injected_answer_updates("A", ["B"], seen_counts=captured)
+
+    # A was only shown rev2 -> must be marked seen==2, NOT the live 3.
+    assert orch.agent_states["A"].seen_answer_counts["B"] == 2
+    # rev3 remains unseen, so it can still be injected / trigger a restart.
+    assert tracker.has_unseen_answer_updates("A") is True
+
+
+def test_r1_without_captured_counts_falls_back_to_live() -> None:
+    """Backward-compat: omitting seen_counts preserves the legacy live-read behavior.
+
+    Existing callers/tests that call register/mark without seen_counts must keep
+    working unchanged.
+    """
+    orch = _make_peer_orch(["A", "B"])
+    tracker = PeerAnswerVisibilityTracker(orch)
+    orch.coordination_tracker.answers_by_agent["B"] = [
+        _Revision("v1", 1.0),
+        _Revision("v2", 2.0),
+        _Revision("v3", 3.0),
+    ]
+    orch.agent_states["B"].answer = "v3"
+
+    tracker.register_injected_answer_updates("A", ["B"])  # no seen_counts
+
+    assert orch.agent_states["A"].seen_answer_counts["B"] == 3
+    assert tracker.has_unseen_answer_updates("A") is False
+
+
+def test_r1_mark_seen_never_regresses_existing_count() -> None:
+    """A captured count must never lower an already-higher seen count."""
+    orch = _make_peer_orch(["A", "B"])
+    tracker = PeerAnswerVisibilityTracker(orch)
+    orch.coordination_tracker.answers_by_agent["B"] = [
+        _Revision("v1", 1.0),
+        _Revision("v2", 2.0),
+        _Revision("v3", 3.0),
+    ]
+    orch.agent_states["A"].seen_answer_counts["B"] = 5  # already ahead
+
+    tracker.mark_seen_answer_revisions("A", ["B"], seen_counts={"B": 2})
+
+    assert orch.agent_states["A"].seen_answer_counts["B"] == 5
+
+
+def test_r1_captured_count_clamped_to_current() -> None:
+    """A captured count larger than the live list is clamped (defensive)."""
+    orch = _make_peer_orch(["A", "B"])
+    tracker = PeerAnswerVisibilityTracker(orch)
+    orch.coordination_tracker.answers_by_agent["B"] = [_Revision("v1", 1.0)]
+
+    tracker.mark_seen_answer_revisions("A", ["B"], seen_counts={"B": 99})
+
+    assert orch.agent_states["A"].seen_answer_counts["B"] == 1
+
+
+# --------------------------------------------------------------------------- #
+# R2 / R3 — background-subagent result consume race
+# --------------------------------------------------------------------------- #
+def test_r2_consume_removes_only_consumed_ids() -> None:
+    orch = _make_sub_orch()
+    orch._pending_subagent_results["A"] = [("s1", "r1"), ("s2", "r2")]
+    coord = SubagentLifecycleCoordinator(orch)
+
+    coord.consume_pending_subagent_results("A", {"s1"})
+
+    assert orch._pending_subagent_results["A"] == [("s2", "r2")]
+
+
+def test_r2_consume_drops_key_when_fully_consumed() -> None:
+    orch = _make_sub_orch()
+    orch._pending_subagent_results["A"] = [("s1", "r1"), ("s2", "r2")]
+    coord = SubagentLifecycleCoordinator(orch)
+
+    coord.consume_pending_subagent_results("A", {"s1", "s2"})
+
+    assert "A" not in orch._pending_subagent_results
+
+
+def test_r2_consume_missing_key_is_noop() -> None:
+    orch = _make_sub_orch()
+    coord = SubagentLifecycleCoordinator(orch)
+    coord.consume_pending_subagent_results("A", {"s1"})  # must not raise
+    assert orch._pending_subagent_results == {}
+
+
+def test_r2_consume_preserves_concurrent_append() -> None:
+    """Simulation of the race window with a manual snapshot/consume."""
+    orch = _make_sub_orch()
+    orch._pending_subagent_results["A"] = [("s1", "r1")]
+    coord = SubagentLifecycleCoordinator(orch)
+
+    collected_ids = {sid for sid, _ in orch._pending_subagent_results["A"]}
+    # --- await window: a background task appends a fresh result ---
+    orch._pending_subagent_results["A"].append(("s2", "r2"))
+
+    coord.consume_pending_subagent_results("A", collected_ids)
+
+    assert orch._pending_subagent_results["A"] == [("s2", "r2")]
+
+
+@pytest.mark.asyncio
+async def test_r2_collect_then_consume_preserves_late_append(monkeypatch) -> None:
+    """End-to-end simulation through the REAL collect + consume code path.
+
+    Drives ``collect_pending_subagent_results_async`` (the real consumer read),
+    appends a fresh result during the simulated await window, then consumes only
+    the collected ids -- proving the late append survives.
+    """
+    orch = _make_sub_orch()
+    orch._pending_subagent_results["A"] = [("s1", "r1")]
+    coord = SubagentLifecycleCoordinator(orch)
+
+    async def _no_mcp_poll(agent_id: str):
+        return []
+
+    monkeypatch.setattr(coord, "get_pending_subagent_results_async", _no_mcp_poll)
+
+    collected = await coord.collect_pending_subagent_results_async("A")
+    assert [sid for sid, _ in collected] == ["s1"]
+
+    # --- await window: background trace-analyzer task appends a result ---
+    orch._pending_subagent_results["A"].append(("s2", "r2"))
+
+    coord.consume_pending_subagent_results("A", {sid for sid, _ in collected})
+
+    assert [sid for sid, _ in orch._pending_subagent_results["A"]] == ["s2"]
+
+
+# --------------------------------------------------------------------------- #
+# R4 — detached trace-analyzer tasks must be cancelled by timeout cleanup
+# --------------------------------------------------------------------------- #
+def _make_cleanup_orch() -> types.SimpleNamespace:
+    orch = types.SimpleNamespace()
+    orch._subagent_launch_watcher = None
+    orch._flush_pending_subagent_results = lambda: None
+    orch._active_tasks = {}
+    orch.agents = {}
+    orch._active_streams = {}
+    orch.is_orchestrator_timeout = True
+    orch.coordination_tracker = types.SimpleNamespace(track_agent_action=lambda *a, **k: None)
+    orch._background_trace_tasks = {}
+    return orch
+
+
+@pytest.mark.asyncio
+async def test_r4_cleanup_cancels_background_trace_tasks() -> None:
+    """Timeout cleanup must cancel detached trace tasks, not leak them past timeout."""
+    from massgen.orchestrator_collaborators.active_coordination_cleanup import (
+        ActiveCoordinationCleanup,
+    )
+
+    orch = _make_cleanup_orch()
+
+    async def _never_finishes():
+        await asyncio.sleep(60)
+
+    task = asyncio.create_task(_never_finishes())
+    await asyncio.sleep(0)  # let the task start
+    orch._background_trace_tasks = {"A": task}
+
+    await ActiveCoordinationCleanup(orch).cleanup()
+
+    assert task.cancelled() or task.done()
+    assert orch._background_trace_tasks == {}
+
+
+@pytest.mark.asyncio
+async def test_r4_cleanup_skips_already_done_trace_tasks() -> None:
+    from massgen.orchestrator_collaborators.active_coordination_cleanup import (
+        ActiveCoordinationCleanup,
+    )
+
+    orch = _make_cleanup_orch()
+
+    async def _quick():
+        return None
+
+    task = asyncio.create_task(_quick())
+    await task  # already done
+    orch._background_trace_tasks = {"A": task}
+
+    await ActiveCoordinationCleanup(orch).cleanup()  # must not raise
+
+    assert orch._background_trace_tasks == {}
+
+
+# --------------------------------------------------------------------------- #
+# R5 — cancel_all_subagents must await cancellations before clearing the registry
+# --------------------------------------------------------------------------- #
+@pytest.mark.asyncio
+async def test_r5_cancel_all_awaits_background_tasks_before_clear() -> None:
+    """Cancelled background tasks run their CancelledError handler before return.
+
+    Without awaiting, the method clears the registry and returns while the
+    cancelled coroutine has not yet resumed -- its cleanup runs later against a
+    cleared dict. The fix gathers the cancellations first.
+    """
+    from massgen.subagent.manager import SubagentManager
+
+    mgr = SubagentManager.__new__(SubagentManager)
+    mgr._active_processes = {}
+    mgr._subagents = {}
+
+    handler_ran = {"value": False}
+
+    async def _bg():
+        try:
+            await asyncio.sleep(60)
+        except asyncio.CancelledError:
+            handler_ran["value"] = True
+            raise
+
+    task = asyncio.create_task(_bg())
+    await asyncio.sleep(0)  # let it start
+    mgr._background_tasks = {"s1": task}
+
+    count = await mgr.cancel_all_subagents()
+
+    assert handler_ran["value"] is True, "cancellation was not awaited before return"
+    assert task.done()
+    assert mgr._background_tasks == {}
+    assert count >= 1
+
+
+# --------------------------------------------------------------------------- #
+# D2 — per-round worktree isolation failure is recorded, not silently swallowed
+# --------------------------------------------------------------------------- #
+def test_d2_default_agent_state_not_degraded() -> None:
+    state = AgentState()
+    assert state.round_isolation_degraded is False
+    assert state.round_isolation_error is None
+
+
+def test_d2_records_round_isolation_degradation(mock_orchestrator) -> None:
+    orch = mock_orchestrator(num_agents=2)
+
+    orch._record_round_isolation_degraded("agent_a", RuntimeError("git worktree add failed"))
+
+    degraded = orch.agent_states["agent_a"]
+    assert degraded.round_isolation_degraded is True
+    assert "RuntimeError" in degraded.round_isolation_error
+    assert "git worktree add failed" in degraded.round_isolation_error
+    # untouched agents stay clean
+    assert orch.agent_states["agent_b"].round_isolation_degraded is False
+
+
+def test_d2_record_is_safe_for_unknown_agent(mock_orchestrator) -> None:
+    orch = mock_orchestrator(num_agents=1)
+    # Must not raise even if the agent id is not in agent_states.
+    orch._record_round_isolation_degraded("nonexistent", ValueError("boom"))
+
+
+# --------------------------------------------------------------------------- #
+# D3 — post-record changedoc enrichment must not kill a valid-answer agent
+# --------------------------------------------------------------------------- #
+class _Answer:
+    def __init__(self, label: str) -> None:
+        self.label = label
+        self.changedoc = None
+
+
+def test_d3_changedoc_attached_happy_path(mock_orchestrator, monkeypatch) -> None:
+    orch = mock_orchestrator(num_agents=1)
+    agent = orch.agents["agent_a"]
+    agent.backend.filesystem_manager = types.SimpleNamespace(cwd="/tmp/ws")
+    monkeypatch.setattr(orch, "_is_changedoc_enabled", lambda: True)
+    ans = _Answer("agent1.1")
+    orch.coordination_tracker.answers_by_agent["agent_a"] = [ans]
+
+    import massgen.changedoc as cd
+
+    monkeypatch.setattr(cd, "read_changedoc_from_workspace", lambda p: "changes by [SELF]")
+
+    orch._attach_changedoc_to_latest_answer("agent_a", agent)
+
+    assert ans.changedoc == "changes by agent1.1"
+
+
+def test_d3_changedoc_failure_does_not_raise_and_preserves_answer(mock_orchestrator, monkeypatch) -> None:
+    orch = mock_orchestrator(num_agents=1)
+    agent = orch.agents["agent_a"]
+    agent.backend.filesystem_manager = types.SimpleNamespace(cwd="/tmp/ws")
+    monkeypatch.setattr(orch, "_is_changedoc_enabled", lambda: True)
+    ans = _Answer("agent1.1")
+    orch.coordination_tracker.answers_by_agent["agent_a"] = [ans]
+
+    import massgen.changedoc as cd
+
+    def _boom(_path):
+        raise OSError("disk gone")
+
+    monkeypatch.setattr(cd, "read_changedoc_from_workspace", _boom)
+
+    # Must not raise — the agent already submitted a valid answer.
+    orch._attach_changedoc_to_latest_answer("agent_a", agent)
+
+    assert ans.changedoc is None
+    assert orch.agent_states["agent_a"].is_killed is False
diff --git a/massgen/tests/test_grok_backend.py b/massgen/tests/test_grok_backend.py
index c6c39592c..a88dc5b8d 100644
--- a/massgen/tests/test_grok_backend.py
+++ b/massgen/tests/test_grok_backend.py
@@ -1,156 +1,86 @@
 #!/usr/bin/env python3
+"""Tests for the Grok backend.
+
+E2 fix: this file previously held three async functions that ``print()``-ed and
+returned ``True/False`` with no assertions. Under ``asyncio_mode=auto`` they were
+collected and passed silently (the return value was consumed by pytest-asyncio),
+giving keyless duplicate coverage with no regression-detection value.
+
+Now:
+  - the metadata/token/cost surface is a real **offline** unit test (no key, no
+    network, no cost), so it runs in the default suite;
+  - the streaming and agent paths are proper ``live_api`` tests that skip without
+    ``XAI_API_KEY`` and assert on real responses (they do NOT run by default, so
+    no external cost is incurred unless explicitly enabled).
 """
-Test script for Grok backend integration with architecture.
-Tests basic functionality, tool integration, and streaming.
-"""
-
-import asyncio
-import os
-import sys
-from pathlib import Path
 
-# Add project root to path
-project_root = Path(__file__).parent.parent.parent.parent
-sys.path.insert(0, str(project_root))
+from __future__ import annotations
 
-from massgen.backend.grok import GrokBackend  # noqa: E402
-from massgen.chat_agent import SingleAgent  # noqa: E402
-
-
-async def test_grok_basic():
-    """Test basic Grok backend functionality."""
-    print("🧪 Testing Grok Backend - Basic Functionality")
+import os
 
-    # Check if API key is available
-    api_key = os.getenv("XAI_API_KEY")
-    if not api_key:
-        print("❌ XAI_API_KEY not found in environment variables")
-        print("⚠️  Set XAI_API_KEY to test Grok backend")
-        return False
+import pytest
 
-    try:
-        backend = GrokBackend(api_key=api_key)
+from massgen.backend.grok import GrokBackend
+from massgen.chat_agent import SingleAgent
 
-        # Test basic info
-        print(f"✅ Provider: {backend.get_provider_name()}")
-        print(f"✅ Supported tools: {backend.get_supported_builtin_tools()}")
 
-        # Test token estimation
-        test_text = "Hello world, this is a test message"
-        tokens = backend.estimate_tokens(test_text)
-        print(f"✅ Token estimation: {tokens} tokens for '{test_text}'")
+def test_grok_backend_metadata_offline() -> None:
+    """Provider name, builtin tools, token estimation, and cost calc (no network)."""
+    backend = GrokBackend(api_key="dummy-key-offline")
 
-        # Test cost calculation
-        cost = backend.calculate_cost(100, 50, "grok-3-mini")
-        print(f"✅ Cost calculation: ${cost:.6f} for 100 input + 50 output tokens")
+    assert backend.get_provider_name() == "Grok"
 
-        return True
+    tools = backend.get_supported_builtin_tools()
+    assert isinstance(tools, list) and tools, "expected at least one builtin tool"
 
-    except Exception as e:
-        print(f"❌ Basic test failed: {e}")
-        return False
+    tokens = backend.estimate_tokens("Hello world, this is a test message")
+    assert isinstance(tokens, int) and tokens > 0
 
+    cost = backend.calculate_cost(100, 50, "grok-3-mini")
+    assert isinstance(cost, float) and cost > 0.0
 
-async def test_grok_streaming():
-    """Test Grok streaming without tools."""
-    print("\n🧪 Testing Grok Backend - Streaming")
 
+@pytest.mark.live_api
+@pytest.mark.asyncio
+async def test_grok_streaming_live() -> None:
+    """Streaming returns non-empty content with no error chunk (requires XAI_API_KEY)."""
     api_key = os.getenv("XAI_API_KEY")
     if not api_key:
-        print("❌ XAI_API_KEY not found - skipping streaming test")
-        return False
-
-    try:
-        backend = GrokBackend(api_key=api_key)
-
-        messages = [
-            {
-                "role": "user",
-                "content": "Say hello and explain what you are in one sentence.",
-            },
-        ]
-
-        print("📤 Sending request to Grok...")
-        response_content = ""
-
-        async for chunk in backend.stream_with_tools(messages, tools=[], model="grok-3-mini"):
-            if chunk.type == "content" and chunk.content:
-                response_content += chunk.content
-                print(chunk.content, end="", flush=True)
-            elif chunk.type == "error":
-                print(f"\n❌ Error: {chunk.error}")
-                return False
+        pytest.skip("XAI_API_KEY not set")
 
-        print(f"\n✅ Streaming test completed. Response length: {len(response_content)} chars")
-        return True
+    backend = GrokBackend(api_key=api_key)
+    messages = [{"role": "user", "content": "Say hello in one short sentence."}]
 
-    except Exception as e:
-        print(f"❌ Streaming test failed: {e}")
-        return False
+    response_content = ""
+    async for chunk in backend.stream_with_tools(messages, tools=[], model="grok-3-mini"):
+        if chunk.type == "content" and chunk.content:
+            response_content += chunk.content
+        elif chunk.type == "error":
+            pytest.fail(f"Grok streaming returned an error chunk: {chunk.error}")
 
+    assert response_content.strip()
 
-async def test_grok_with_agent():
-    """Test Grok backend through SingleAgent integration."""
-    print("\n🧪 Testing Grok Backend - SingleAgent Integration")
 
+@pytest.mark.live_api
+@pytest.mark.asyncio
+async def test_grok_with_agent_live() -> None:
+    """SingleAgent over Grok returns content with no error chunk (requires XAI_API_KEY)."""
     api_key = os.getenv("XAI_API_KEY")
     if not api_key:
-        print("❌ XAI_API_KEY not found - skipping agent test")
-        return False
-
-    try:
-        # Create Grok backend and agent
-        backend = GrokBackend(api_key=api_key)
-        agent = SingleAgent(
-            backend=backend,
-            system_message="You are a helpful AI assistant.",
-            agent_id="test_grok_agent",
-        )
-
-        print("📤 Testing agent response...")
-        response_content = ""
-
-        # Test agent with a simple message
-        messages = [{"role": "user", "content": "What is 2+2? Answer briefly."}]
-        async for chunk in agent.chat(messages):
-            if chunk.type == "content" and chunk.content:
-                response_content += chunk.content
-                print(chunk.content, end="", flush=True)
-            elif chunk.type == "error":
-                print(f"\n❌ Agent error: {chunk.error}")
-                return False
-
-        print(f"\n✅ Agent test completed. Response: '{response_content.strip()}'")
-        return True
-
-    except Exception as e:
-        print(f"❌ Agent test failed: {e}")
-        return False
-
-
-async def main():
-    """Run all Grok backend tests."""
-    print("🚀 MassGen - Grok Backend Testing")
-    print("=" * 50)
-
-    results = []
-
-    # Run tests
-    results.append(await test_grok_basic())
-    results.append(await test_grok_streaming())
-    results.append(await test_grok_with_agent())
-
-    # Summary
-    print("\n" + "=" * 50)
-    print("📊 Test Results:")
-    print(f"✅ Passed: {sum(results)}")
-    print(f"❌ Failed: {len(results) - sum(results)}")
-
-    if all(results):
-        print("🎉 All Grok backend tests passed!")
-    else:
-        print("⚠️  Some tests failed - check XAI_API_KEY and network connection")
-
-
-if __name__ == "__main__":
-    asyncio.run(main())
+        pytest.skip("XAI_API_KEY not set")
+
+    backend = GrokBackend(api_key=api_key)
+    agent = SingleAgent(
+        backend=backend,
+        system_message="You are a helpful AI assistant.",
+        agent_id="test_grok_agent",
+    )
+
+    response_content = ""
+    async for chunk in agent.chat([{"role": "user", "content": "What is 2+2? Answer briefly."}]):
+        if chunk.type == "content" and chunk.content:
+            response_content += chunk.content
+        elif chunk.type == "error":
+            pytest.fail(f"Grok agent returned an error chunk: {chunk.error}")
+
+    assert response_content.strip()
diff --git a/massgen/tests/test_message_context_building.py b/massgen/tests/test_message_context_building.py
index b37df9017..f2288afea 100644
--- a/massgen/tests/test_message_context_building.py
+++ b/massgen/tests/test_message_context_building.py
@@ -1,274 +1,125 @@
 #!/usr/bin/env python3
-"""
-Test script to examine how conversation context is built for LLM input.
-Shows the exact message templates and context structure without making API calls.
-"""
-
-import sys
-from pathlib import Path
-from typing import Any
-
-# Add project root to path
-project_root = Path(__file__).parent.parent.parent.parent
-sys.path.insert(0, str(project_root))
+"""Tests for conversation-context construction in MessageTemplates.
 
-from massgen.message_templates import MessageTemplates  # noqa: E402
+E1 fix: this file previously contained four collected ``test_*`` functions that
+only ``print()`` computed booleans and returned ``None`` -- pytest passed them
+unconditionally, giving false coverage on a core path. They are now real
+assertions, with expectations verified against the actual rendered output:
 
+  - the raw agent-id key is NOT echoed into the message (summaries are
+    relabeled), so we assert on the summary *content* instead;
+  - the ``User:`` history-label count equals the number of prior user turns;
+  - the conversation-history section appears only when history is non-empty.
 
-def print_message_structure(title: str, conversation: dict[str, Any]):
-    """Print the structure of a conversation message in a readable format."""
-    print(f"\n{'='*80}")
-    print(f"🔍 {title}")
-    print(f"{'='*80}")
-
-    # System message
-    print("📋 SYSTEM MESSAGE:")
-    print("-" * 40)
-    system_msg = conversation["system_message"]
-    print(system_msg)
-
-    # User message
-    print("\n📨 USER MESSAGE:")
-    print("-" * 40)
-    user_msg = conversation["user_message"]
-    print(user_msg)
-
-    # Tools
-    print("\n🔧 TOOLS PROVIDED:")
-    print("-" * 40)
-    tools = conversation.get("tools", [])
-    for i, tool in enumerate(tools, 1):
-        tool_name = tool.get("function", {}).get("name", "unknown")
-        tool_desc = tool.get("function", {}).get("description", "No description")
-        print(f"{i}. {tool_name}: {tool_desc}")
-
-    print("\n📊 STATISTICS:")
-    print(f"   System message length: {len(system_msg)} chars")
-    print(f"   User message length: {len(user_msg)} chars")
-    print(f"   Tools provided: {len(tools)}")
-    print(f"   Total context size: {len(system_msg) + len(user_msg)} chars")
+No API calls -- exercises MessageTemplates.build_conversation_with_context only.
+"""
 
+from __future__ import annotations
 
-def test_turn1_context():
-    """Test context building for the first turn (no history)."""
-    print("🔷 TURN 1 CONTEXT BUILDING")
-    print("Scenario: User asks initial question, no conversation history")
+from massgen.message_templates import MessageTemplates
 
-    templates = MessageTemplates()
 
-    # Build conversation for first turn
-    conversation = templates.build_conversation_with_context(
+def test_turn1_context_no_history() -> None:
+    """First turn: no history section, empty-answers section present."""
+    conversation = MessageTemplates().build_conversation_with_context(
         current_task="What are the main benefits of renewable energy?",
-        conversation_history=[],  # No history on first turn
-        agent_summaries=None,  # No agent answers yet
+        conversation_history=[],
+        agent_summaries=None,
         valid_agent_ids=None,
     )
-
-    print_message_structure("Turn 1: Initial Question", conversation)
-
-    # Verify structure
     user_msg = conversation["user_message"]
-    has_history = "CONVERSATION_HISTORY" in user_msg
-    has_original = "ORIGINAL MESSAGE" in user_msg
-    has_answers = "CURRENT ANSWERS" in user_msg and "no answers available yet" in user_msg
-
-    print("\n✅ VALIDATION:")
-    print(f"   Contains conversation history section: {has_history}")
-    print(f"   Contains original message section: {has_original}")
-    print(f"   Contains empty answers section: {has_answers}")
-    print(f"   System message mentions context: {'conversation' in conversation['system_message'].lower()}")
 
+    assert "CONVERSATION_HISTORY" not in user_msg
+    assert "ORIGINAL MESSAGE" in user_msg
+    assert "What are the main benefits of renewable energy?" in user_msg
+    assert "CURRENT ANSWERS" in user_msg
+    assert "no answers available yet" in user_msg
 
-def test_turn2_context():
-    """Test context building for the second turn (with history)."""
-    print("\n🔷 TURN 2 CONTEXT BUILDING")
-    print("Scenario: User asks follow-up, with previous exchange in history")
 
-    templates = MessageTemplates()
-
-    # Simulate conversation history from Turn 1
+def test_turn2_context_with_history_and_answers() -> None:
+    """Second turn: history section present, prior turn rendered, summary content shown."""
     conversation_history = [
         {"role": "user", "content": "What are the main benefits of renewable energy?"},
-        {
-            "role": "assistant",
-            "content": (
-                "Renewable energy offers several key benefits including environmental "
-                "sustainability, economic advantages, and energy security. It reduces "
-                "greenhouse gas emissions, creates jobs, and decreases dependence on fossil fuel imports."
-            ),
-        },
+        {"role": "assistant", "content": "Renewable energy reduces emissions and creates jobs."},
     ]
-
-    # Build conversation for second turn with history
-    conversation = templates.build_conversation_with_context(
+    conversation = MessageTemplates().build_conversation_with_context(
         current_task="What about the challenges and limitations?",
         conversation_history=conversation_history,
-        agent_summaries={"researcher": "Key benefits include environmental and economic advantages."},
+        agent_summaries={"researcher": "RESEARCHER_SUMMARY: environmental and economic advantages."},
         valid_agent_ids=["researcher"],
     )
-
-    print_message_structure("Turn 2: Follow-up with History", conversation)
-
-    # Verify structure
     user_msg = conversation["user_message"]
-    has_history = "CONVERSATION_HISTORY" in user_msg and "User: What are the main benefits" in user_msg
-    has_original = "ORIGINAL MESSAGE" in user_msg and "challenges and limitations" in user_msg
-    has_answers = "CURRENT ANSWERS" in user_msg and "researcher" in user_msg
 
-    print("\n✅ VALIDATION:")
-    print(f"   Contains conversation history: {has_history}")
-    print(f"   Contains current question: {has_original}")
-    print(f"   Contains agent answers: {has_answers}")
-    print(f"   System message is context-aware: {'conversation' in conversation['system_message'].lower()}")
+    assert "CONVERSATION_HISTORY" in user_msg
+    assert "What are the main benefits" in user_msg  # prior user turn rendered
+    assert "ORIGINAL MESSAGE" in user_msg
+    assert "challenges and limitations" in user_msg  # current task
+    assert "CURRENT ANSWERS" in user_msg
+    # Summary *content* is rendered (the raw agent id is relabeled, not echoed).
+    assert "RESEARCHER_SUMMARY" in user_msg
+    assert user_msg.count("User:") == 1  # exactly one prior user turn
 
 
-def test_turn3_context():
-    """Test context building for the third turn (extended history)."""
-    print("\n🔷 TURN 3 CONTEXT BUILDING")
-    print("Scenario: User asks third question, with extended conversation history")
-
-    templates = MessageTemplates()
-
-    # Simulate extended conversation history
+def test_turn3_context_extended_history() -> None:
+    """Third turn: two prior user turns and two distinct agent summaries."""
     conversation_history = [
         {"role": "user", "content": "What are the main benefits of renewable energy?"},
-        {
-            "role": "assistant",
-            "content": "Renewable energy offers environmental, economic, and energy security benefits.",
-        },
+        {"role": "assistant", "content": "Environmental, economic, and energy-security benefits."},
         {"role": "user", "content": "What about the challenges and limitations?"},
-        {
-            "role": "assistant",
-            "content": "Main challenges include high upfront costs, intermittency issues, and infrastructure requirements.",
-        },
+        {"role": "assistant", "content": "High upfront costs, intermittency, infrastructure."},
     ]
-
-    # Build conversation for third turn with extended history
-    conversation = templates.build_conversation_with_context(
+    conversation = MessageTemplates().build_conversation_with_context(
         current_task="How can governments support the transition?",
         conversation_history=conversation_history,
         agent_summaries={
-            "researcher": "Benefits include environmental and economic advantages.",
-            "analyst": "Challenges include costs, intermittency, and infrastructure needs.",
+            "researcher": "RESEARCHER_SUMMARY: benefits.",
+            "analyst": "ANALYST_SUMMARY: challenges.",
         },
         valid_agent_ids=["researcher", "analyst"],
     )
-
-    print_message_structure("Turn 3: Extended Conversation", conversation)
-
-    # Verify structure
     user_msg = conversation["user_message"]
-    has_full_history = "CONVERSATION_HISTORY" in user_msg and user_msg.count("User:") >= 2
-    has_original = "ORIGINAL MESSAGE" in user_msg and "governments support" in user_msg
-    has_multiple_answers = "CURRENT ANSWERS" in user_msg and "researcher" in user_msg and "analyst" in user_msg
-
-    print("\n✅ VALIDATION:")
-    print(f"   Contains full conversation history: {has_full_history}")
-    print(f"   Contains current question: {has_original}")
-    print(f"   Contains multiple agent answers: {has_multiple_answers}")
-    print(f"   History shows progression: {user_msg.count('User:') >= 2}")
 
+    assert "CONVERSATION_HISTORY" in user_msg
+    assert "ORIGINAL MESSAGE" in user_msg
+    assert "governments support" in user_msg
+    assert "RESEARCHER_SUMMARY" in user_msg
+    assert "ANALYST_SUMMARY" in user_msg
+    assert user_msg.count("User:") == 2  # both prior user turns rendered
 
-def test_context_comparison():
-    """Compare context building across different turns."""
-    print("\n🔍 CONTEXT COMPARISON ACROSS TURNS")
-    print("=" * 80)
 
+def test_context_grows_with_history() -> None:
+    """User-message size grows monotonically as history accumulates."""
     templates = MessageTemplates()
 
-    # Turn 1: No history
     conv1 = templates.build_conversation_with_context(
         current_task="What is solar energy?",
         conversation_history=[],
         agent_summaries=None,
     )
-
-    # Turn 2: With history
     history = [
         {"role": "user", "content": "What is solar energy?"},
-        {
-            "role": "assistant",
-            "content": "Solar energy is power derived from sunlight.",
-        },
+        {"role": "assistant", "content": "Solar energy is power derived from sunlight."},
     ]
     conv2 = templates.build_conversation_with_context(
         current_task="How efficient is it?",
         conversation_history=history,
-        agent_summaries={"expert": "Solar energy harnesses sunlight for power generation."},
+        agent_summaries={"expert": "Solar harnesses sunlight."},
     )
-
-    # Turn 3: Extended history
-    extended_history = [
-        {"role": "user", "content": "What is solar energy?"},
-        {
-            "role": "assistant",
-            "content": "Solar energy is power derived from sunlight.",
-        },
+    extended = history + [
         {"role": "user", "content": "How efficient is it?"},
-        {
-            "role": "assistant",
-            "content": "Modern solar panels achieve 15-22% efficiency.",
-        },
+        {"role": "assistant", "content": "Modern panels achieve 15-22% efficiency."},
     ]
     conv3 = templates.build_conversation_with_context(
         current_task="What are the costs?",
-        conversation_history=extended_history,
-        agent_summaries={
-            "expert": "Solar energy harnesses sunlight for power generation.",
-            "engineer": "Modern panels achieve 15-22% efficiency.",
-        },
+        conversation_history=extended,
+        agent_summaries={"expert": "Solar harnesses sunlight.", "engineer": "15-22% efficiency."},
     )
 
-    print("📊 CONTEXT SIZE PROGRESSION:")
-    print(f"   Turn 1 (no history):     {len(conv1['user_message']):,} chars")
-    print(f"   Turn 2 (with history):   {len(conv2['user_message']):,} chars")
-    print(f"   Turn 3 (extended):       {len(conv3['user_message']):,} chars")
-
-    print("\n📈 CONTEXT ELEMENTS:")
-    elements = ["CONVERSATION_HISTORY", "ORIGINAL MESSAGE", "CURRENT ANSWERS"]
-
-    for i, (conv, turn) in enumerate([(conv1, "Turn 1"), (conv2, "Turn 2"), (conv3, "Turn 3")], 1):
-        user_msg = conv["user_message"]
-        print(f"\n   {turn}:")
-        for element in elements:
-            present = element in user_msg
-            print(f"     {element}: {'✅' if present else '❌'}")
-
-        # Count conversation exchanges
-        if "CONVERSATION_HISTORY" in user_msg:
-            exchange_count = user_msg.count("User:")
-            print(f"     Previous exchanges: {exchange_count}")
-
-
-def main():
-    """Run all context building tests."""
-    print("🚀 MassGen - Message Context Building Analysis")
-    print("=" * 80)
-    print("This test examines how conversation context is structured")
-    print("for LLM input across multiple conversation turns.")
-    print()
-
-    try:
-        # Test each turn's context building
-        test_turn1_context()
-        test_turn2_context()
-        test_turn3_context()
-        test_context_comparison()
-
-        print("\n🎉 ALL CONTEXT BUILDING TESTS COMPLETED")
-        print("=" * 80)
-        print("✅ Message templates properly build conversation context")
-        print("✅ Context grows appropriately with conversation history")
-        print("✅ All required sections are included in each turn")
-        print("🔍 Review the detailed context structures above to understand")
-        print("   exactly what information is provided to agents at each turn.")
-
-    except Exception as e:
-        print(f"❌ Context building test failed: {e}")
-        import traceback
-
-        traceback.print_exc()
-
+    size1 = len(conv1["user_message"])
+    size2 = len(conv2["user_message"])
+    size3 = len(conv3["user_message"])
+    assert size1 < size2 < size3
 
-if __name__ == "__main__":
-    main()
+    assert "CONVERSATION_HISTORY" not in conv1["user_message"]
+    assert "CONVERSATION_HISTORY" in conv2["user_message"]
+    assert "CONVERSATION_HISTORY" in conv3["user_message"]
diff --git a/massgen/tests/test_midstream_injection_unified.py b/massgen/tests/test_midstream_injection_unified.py
new file mode 100644
index 000000000..9de6b956b
--- /dev/null
+++ b/massgen/tests/test_midstream_injection_unified.py
@@ -0,0 +1,165 @@
+"""A1 regression: the two near-identical mid-stream injection closures
+(GeneralHookManager path + native path) are unified into a single
+``MidStreamInjectionHookInstaller.build_midstream_injection(..., native=)``.
+
+This closes the backend-parity hazard where a fix to one closure silently
+skipped the other. These tests drive the REAL unified collaborator method
+against a recording fake orchestrator (no LLM calls) and assert:
+
+  - the load-bearing invariant: ``update_agent_context_with_new_answers`` runs
+    BEFORE ``refresh_checklist_state_for_agent`` (so available_agent_labels
+    reflect newly-injected labels) -- for BOTH native and non-native paths;
+  - the captured revision counts (R1) are threaded into register;
+  - state mutation (injection_count, midstream_injections_this_round,
+    known_answer_ids, answers dict) is identical across paths;
+  - the early-exit guards (disable_injection, vote-only, cap-reached) behave.
+"""
+
+from __future__ import annotations
+
+import types
+
+import pytest
+
+from massgen.orchestrator import AgentState
+from massgen.orchestrator_collaborators.midstream_injection_hook_installer import (
+    MidStreamInjectionHookInstaller,
+)
+
+
+class _RecordingOrch:
+    """Fake orchestrator capturing the order + arguments of injection side effects."""
+
+    def __init__(self, *, selected=None, has_unseen_after=False, vote_only=False, disable=False, cap_reached=False):
+        self.calls: list[tuple] = []
+        self._selected = {} if selected is None else dict(selected)
+        self._had_unseen = bool(selected) or cap_reached
+        self._cap_reached = cap_reached
+        self._has_unseen_after = has_unseen_after
+        self._vote_only = vote_only
+
+        self.config = types.SimpleNamespace(disable_injection=disable, max_midstream_injections_per_round=2)
+        self.agent_states = {"A": AgentState(restart_pending=True)}
+        self.agents = {"A": types.SimpleNamespace(backend=types.SimpleNamespace(filesystem_manager=None))}
+        self.coordination_tracker = types.SimpleNamespace(
+            get_reverse_agent_mapping=lambda: {},
+            update_agent_context_with_new_answers=lambda aid, srcs: self.calls.append(("update_context", aid, tuple(srcs))),
+            track_agent_action=lambda aid, action, msg: self.calls.append(("track", aid, msg)),
+        )
+
+    # --- guard predicates ---
+    def _check_restart_pending(self, agent_id):
+        return self.agent_states[agent_id].restart_pending
+
+    def _should_defer_restart_for_first_answer(self, agent_id):
+        return False
+
+    def _is_vote_only_mode(self, agent_id):
+        return self._vote_only
+
+    def _should_defer_peer_updates_until_restart(self, agent_id):
+        return False
+
+    def _has_unseen_answer_updates(self, agent_id):
+        return self._has_unseen_after
+
+    def _should_skip_injection_due_to_timeout(self, agent_id):
+        return False
+
+    # --- selection ---
+    def _get_current_answers_snapshot(self):
+        return dict(self._selected) if self._selected else {}
+
+    def _select_midstream_answer_updates(self, agent_id, current_answers):
+        if self._cap_reached:
+            return ({}, True)
+        return (dict(self._selected), bool(self._selected))
+
+    def _capture_answer_revision_counts(self, source_ids):
+        return {sid: 7 for sid in source_ids}
+
+    async def _copy_all_snapshots_to_temp_workspace(self, agent_id):
+        self.calls.append(("copy_snapshots", agent_id))
+
+    def _build_tool_result_injection(self, agent_id, selected, existing_answers=None):
+        self.calls.append(("build_injection", agent_id, tuple(selected)))
+        return "INJECTION_PAYLOAD"
+
+    # --- post-record side effects ---
+    def _register_injected_answer_updates(self, agent_id, source_ids, seen_counts=None):
+        self.calls.append(("register", agent_id, tuple(source_ids), tuple(sorted((seen_counts or {}).items()))))
+
+    def _mark_pending_checklist_recheck_labels(self, agent_id, source_ids):
+        self.calls.append(("mark_pending", agent_id, tuple(source_ids)))
+
+    def _refresh_checklist_state_for_agent(self, agent_id):
+        self.calls.append(("refresh_checklist", agent_id))
+
+    def _order(self) -> list[str]:
+        return [c[0] for c in self.calls]
+
+
+@pytest.mark.asyncio
+@pytest.mark.parametrize("native", [False, True])
+async def test_a1_unified_injection_happy_path(native) -> None:
+    orch = _RecordingOrch(selected={"B": "peer answer"})
+    installer = MidStreamInjectionHookInstaller(orch)
+    answers: dict[str, str] = {}
+
+    result = await installer.build_midstream_injection("A", answers, native=native)
+
+    assert result == "INJECTION_PAYLOAD"
+    # answers dict mutated so the same content isn't re-injected
+    assert answers == {"B": "peer answer"}
+    state = orch.agent_states["A"]
+    assert state.injection_count == 1
+    assert state.midstream_injections_this_round == 1
+    assert "B" in state.known_answer_ids
+
+    order = orch._order()
+    # Load-bearing invariant (both paths): context update BEFORE checklist refresh.
+    assert order.index("update_context") < order.index("refresh_checklist")
+    # R1: captured counts threaded into register.
+    register_call = next(c for c in orch.calls if c[0] == "register")
+    assert register_call[3] == (("B", 7),)
+    # snapshot copy happens before building the injection
+    assert order.index("copy_snapshots") < order.index("build_injection")
+
+
+@pytest.mark.asyncio
+async def test_a1_native_and_general_have_same_effect_order() -> None:
+    """The unified method yields the identical side-effect sequence regardless of path."""
+    orch_general = _RecordingOrch(selected={"B": "x"})
+    orch_native = _RecordingOrch(selected={"B": "x"})
+    await MidStreamInjectionHookInstaller(orch_general).build_midstream_injection("A", {}, native=False)
+    await MidStreamInjectionHookInstaller(orch_native).build_midstream_injection("A", {}, native=True)
+
+    assert orch_general._order() == orch_native._order()
+
+
+@pytest.mark.asyncio
+@pytest.mark.parametrize("native", [False, True])
+async def test_a1_disable_injection_short_circuits(native) -> None:
+    orch = _RecordingOrch(selected={"B": "x"}, disable=True)
+    result = await MidStreamInjectionHookInstaller(orch).build_midstream_injection("A", {}, native=native)
+    assert result is None
+    assert orch.calls == []  # nothing happened
+
+
+@pytest.mark.asyncio
+@pytest.mark.parametrize("native", [False, True])
+async def test_a1_vote_only_mode_skips_injection(native) -> None:
+    orch = _RecordingOrch(selected={"B": "x"}, vote_only=True)
+    result = await MidStreamInjectionHookInstaller(orch).build_midstream_injection("A", {}, native=native)
+    assert result is None
+    assert "build_injection" not in orch._order()
+
+
+@pytest.mark.asyncio
+@pytest.mark.parametrize("native", [False, True])
+async def test_a1_cap_reached_keeps_restart_pending(native) -> None:
+    orch = _RecordingOrch(cap_reached=True)
+    result = await MidStreamInjectionHookInstaller(orch).build_midstream_injection("A", {}, native=native)
+    assert result is None
+    assert orch.agent_states["A"].restart_pending is True
+    assert "build_injection" not in orch._order()
diff --git a/massgen/tests/test_snapshot_copy_offload.py b/massgen/tests/test_snapshot_copy_offload.py
new file mode 100644
index 000000000..de9e55357
--- /dev/null
+++ b/massgen/tests/test_snapshot_copy_offload.py
@@ -0,0 +1,162 @@
+"""B1 regression tests: snapshot copy is offloaded off the event loop.
+
+The audit found ``FilesystemManager.copy_snapshots_to_temp_workspace`` was an
+``async def`` performing purely *synchronous* blocking I/O (rmtree/copytree/scrub)
+directly on the asyncio event loop. While one agent's snapshots were copied, no
+other agent's stream could be consumed -- the orchestrator's parallelism was
+silently serialized.
+
+These tests prove:
+  1. The blocking work now runs on a worker thread (not the main/event-loop
+     thread) -- i.e. it is offloaded via ``asyncio.to_thread``.
+  2. The event loop stays responsive: another coroutine makes progress while the
+     copy is in flight.
+  3. Functional behavior is preserved byte-for-byte (anon-id dirs, excluded
+     framework metadata dirs, content copied).
+
+No external/LLM calls; uses real temp directories.
+"""
+
+from __future__ import annotations
+
+import asyncio
+import threading
+from pathlib import Path
+
+import pytest
+
+from massgen.filesystem_manager._filesystem_manager import FilesystemManager
+
+
+def _make_fm(temp_workspace: Path, monkeypatch) -> FilesystemManager:
+    """Build a minimal FilesystemManager exercising only the copy path."""
+    fm = FilesystemManager.__new__(FilesystemManager)
+    fm.agent_temporary_workspace = temp_workspace
+    # The media-ledger normalization needs unrelated state; no-op it for isolation.
+    monkeypatch.setattr(fm, "_normalize_media_call_ledger_paths", lambda **kwargs: None)
+    return fm
+
+
+def _seed_snapshot(root: Path, name: str) -> Path:
+    snap = root / name
+    (snap / "src").mkdir(parents=True)
+    (snap / "src" / "main.py").write_text("print('hi')\n")
+    (snap / "answer.md").write_text("# answer\n")
+    # framework metadata that must be excluded from the temp copy
+    (snap / ".git").mkdir()
+    (snap / ".git" / "HEAD").write_text("ref: refs/heads/main\n")
+    (snap / ".massgen").mkdir()
+    (snap / ".massgen" / "state.json").write_text("{}")
+    return snap
+
+
+@pytest.mark.asyncio
+async def test_copy_runs_off_event_loop_thread(tmp_path, monkeypatch) -> None:
+    """The blocking copy executes on a worker thread, not the main thread."""
+    src_root = tmp_path / "snaps"
+    src_root.mkdir()
+    snap = _seed_snapshot(src_root, "agentA")
+
+    temp_ws = tmp_path / "temp_ws"
+    fm = _make_fm(temp_ws, monkeypatch)
+
+    recorded: dict[str, threading.Thread] = {}
+    import massgen.filesystem_manager._filesystem_manager as fsm
+
+    real_copytree = fsm.shutil.copytree
+
+    def _spy_copytree(*args, **kwargs):
+        recorded["thread"] = threading.current_thread()
+        return real_copytree(*args, **kwargs)
+
+    monkeypatch.setattr(fsm.shutil, "copytree", _spy_copytree)
+
+    result = await fm.copy_snapshots_to_temp_workspace({"agentA": snap}, {"agentA": "agent_1"})
+
+    assert result == temp_ws
+    assert "thread" in recorded, "copytree never ran"
+    assert recorded["thread"] is not threading.main_thread(), "copy ran on the event-loop thread (not offloaded)"
+
+
+@pytest.mark.asyncio
+async def test_event_loop_stays_responsive_during_copy(tmp_path, monkeypatch) -> None:
+    """A concurrent coroutine makes progress while the copy is blocking a worker thread."""
+    src_root = tmp_path / "snaps"
+    src_root.mkdir()
+    snap = _seed_snapshot(src_root, "agentA")
+    fm = _make_fm(tmp_path / "temp_ws", monkeypatch)
+
+    import massgen.filesystem_manager._filesystem_manager as fsm
+
+    real_copytree = fsm.shutil.copytree
+    copy_started = threading.Event()
+    release_copy = threading.Event()
+
+    def _slow_copytree(*args, **kwargs):
+        copy_started.set()
+        # Block the worker thread; the event loop must keep running.
+        release_copy.wait(timeout=5)
+        return real_copytree(*args, **kwargs)
+
+    monkeypatch.setattr(fsm.shutil, "copytree", _slow_copytree)
+
+    ticks = 0
+
+    async def _ticker():
+        nonlocal ticks
+        # Wait until the copy is actually mid-flight on the worker thread...
+        while not copy_started.is_set():
+            await asyncio.sleep(0.001)
+        # ...then prove the loop still schedules us several times while it blocks.
+        for _ in range(5):
+            ticks += 1
+            await asyncio.sleep(0.001)
+        release_copy.set()
+
+    copy_task = asyncio.create_task(fm.copy_snapshots_to_temp_workspace({"agentA": snap}, {"agentA": "agent_1"}))
+    await asyncio.gather(copy_task, _ticker())
+
+    assert ticks == 5, "event loop was blocked during the snapshot copy"
+
+
+@pytest.mark.asyncio
+async def test_copy_preserves_content_and_excludes_metadata(tmp_path, monkeypatch) -> None:
+    """Anon-id destination, real content copied, framework metadata excluded."""
+    src_root = tmp_path / "snaps"
+    src_root.mkdir()
+    snap = _seed_snapshot(src_root, "agentA")
+    temp_ws = tmp_path / "temp_ws"
+    fm = _make_fm(temp_ws, monkeypatch)
+
+    await fm.copy_snapshots_to_temp_workspace({"agentA": snap}, {"agentA": "agent_1"})
+
+    dest = temp_ws / "agent_1"
+    assert (dest / "src" / "main.py").read_text() == "print('hi')\n"
+    assert (dest / "answer.md").exists()
+    assert not (dest / ".git").exists(), ".git must be excluded from temp copy"
+    assert not (dest / ".massgen").exists(), ".massgen must be excluded from temp copy"
+
+
+@pytest.mark.asyncio
+async def test_copy_clears_stale_temp_workspace(tmp_path, monkeypatch) -> None:
+    """A stale file from a prior round is removed before the new copy."""
+    src_root = tmp_path / "snaps"
+    src_root.mkdir()
+    snap = _seed_snapshot(src_root, "agentA")
+    temp_ws = tmp_path / "temp_ws"
+    temp_ws.mkdir()
+    (temp_ws / "stale.txt").write_text("old round")
+
+    fm = _make_fm(temp_ws, monkeypatch)
+    await fm.copy_snapshots_to_temp_workspace({"agentA": snap}, {"agentA": "agent_1"})
+
+    assert not (temp_ws / "stale.txt").exists()
+    assert (temp_ws / "agent_1" / "answer.md").exists()
+
+
+@pytest.mark.asyncio
+async def test_copy_returns_none_without_temp_workspace(monkeypatch) -> None:
+    fm = FilesystemManager.__new__(FilesystemManager)
+    fm.agent_temporary_workspace = None
+    result = await fm.copy_snapshots_to_temp_workspace({}, {})
+    assert result is None

From 51a6d0f103007baa548b3b1f82c87d43d01dc87a Mon Sep 17 00:00:00 2001
From: ncrispino <ncrispino@wustl.edu>
Date: Thu, 4 Jun 2026 15:05:07 -0700
Subject: [PATCH 2/4] refactor: finish MidStreamInjectionHookInstaller
 extraction (Step 29)

Move the 5 remaining hook-installation methods out of orchestrator.py into the
MidStreamInjectionHookInstaller collaborator, completing the roadmap's deferred
Step 29. A1 (prior commit) was the callback-unification prerequisite; with the
two injection closures already unified, these now extract cleanly:

- _setup_hook_manager_for_agent (dispatcher)
- _setup_native_hooks_for_agent
- _setup_codex_mcp_hooks
- _setup_codex_hybrid_hooks
- _register_round_timeout_hooks

Each becomes a thin orchestrator delegator. Cross-method calls between the moved
methods route through orch._<delegator> to preserve test monkeypatch-safety.
_codex_mcp_hook_agents is still written on the orchestrator (a test reads it there).

orchestrator.py 8,422 -> 7,910 lines. Removed the now-unused hook imports.

Validated: hooks/restart-and-external-tools/broadcast-subagents integration suites,
the 37-test characterization safety net, and the injection/decomposition/novelty
suites all green. Only pre-existing test_subagent_round_timeouts FileNotFound
failures remain (unrelated, confirmed identical on baseline).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 .../orchestrator_refactor_roadmap.md          |   4 +-
 massgen/orchestrator.py                       | 542 +-----------------
 .../midstream_injection_hook_installer.py     | 536 +++++++++++++++++
 3 files changed, 548 insertions(+), 534 deletions(-)

diff --git a/docs/dev_notes/orchestrator_refactor_roadmap.md b/docs/dev_notes/orchestrator_refactor_roadmap.md
index 3ccefa3a4..decd06a65 100644
--- a/docs/dev_notes/orchestrator_refactor_roadmap.md
+++ b/docs/dev_notes/orchestrator_refactor_roadmap.md
@@ -63,7 +63,7 @@
 - [x] Step 28 (FinalPresentationRunner-5 methods, 1302 lines) — **DONE & verified green**. orchestrator.py → 15,021. 1 regression fixed: inner `self.get_final_presentation(...)` bypassed test monkeypatches on `orchestrator.get_final_presentation` → repointed to `orch.get_final_presentation(...)`.
 - [x] Step 23 (PeerAnswerVisibilityTracker-13 methods) — **DONE & verified green** (first batch with zero verifier issues). orchestrator.py → 14,866. Dual-writer field `pending_checklist_recheck_labels` mutated via orch back-ref so ChecklistGateManager later sees the same live set.
 - [x] Step 24 (ChecklistGateManager-11 methods, 1252 lines) — **DONE & verified green** (largest single extraction; 219 tests passed across checklist/criteria/round_evaluator suites). 1 regression fixed: collaborator's `resolve_effective_checklist_criteria` called `self.get_active_criteria(...)` directly → bypassed test monkeypatches on `orchestrator._get_active_criteria` → repointed to `orch._get_active_criteria(...)`.
-- [x] Step 29 (MidStreamInjectionHookInstaller — partial: 6 of 18 pure helpers extracted, 310 lines). The 12 hook-installation methods with duplicated callback closures across 3 backend paths (`_setup_hook_manager_for_agent`, `_setup_codex_mcp_hooks`, `_setup_codex_hybrid_hooks`, `_setup_native_hooks_for_agent`, `_register_round_timeout_hooks`, etc.) remain on Orchestrator — they need a callback-unification pass first (behavior-changing, out of scope for pure extraction).
+- [x] Step 29 (MidStreamInjectionHookInstaller) — **COMPLETE** (2026-06-04). First A1 unified the two duplicated `get_injection_content` closures into `build_midstream_injection(..., native=)` (the callback-unification pass the prior note said was needed), then the 5 hook-installation methods (`_setup_hook_manager_for_agent`, `_setup_codex_mcp_hooks`, `_setup_codex_hybrid_hooks`, `_setup_native_hooks_for_agent`, `_register_round_timeout_hooks`) were moved to the collaborator with thin orchestrator delegators. Cross-method calls route through `orch._delegator` for monkeypatch-safety. orchestrator.py 8,561 → 7,910 over the A1+extraction work. Validated by the hooks/restart/broadcast integration suites + characterization.
 - [x] Step 27 (AgentOrchestrationSetup) — **DONE** (corrected 2026-06-04 by eng-health audit). The collaborator now lives at `massgen/orchestrator_collaborators/agent_orchestration_setup.py`; only a thin per-agent delegator loop remains inline in `__init__`. (The earlier "skipped" status was stale — the `__init__`-rewiring was completed.)
 
 ## Release status — SHIPPED (50% milestone)
@@ -86,7 +86,7 @@ After completing the original plan, identified and extracted 4 more cohesive clu
 ## Follow-up release work
 
 - **AgentOrchestrationSetup**: DONE — extracted to its own collaborator; only a thin `__init__` delegator loop remains (optional future hoist).
-- **MidStreamInjectionHookInstaller**: collaborator exists (`midstream_injection_hook_installer.py`, holds the pure helpers). The remaining work is the duplicated `get_injection_content` closures across the 3 backend paths — a callback-unification pass (audit item A1), tracked in `next_version_eng_health_plan.md`.
+- **MidStreamInjectionHookInstaller**: DONE — closures unified (A1) and all 5 hook-install methods moved into the collaborator (2026-06-04).
 - **MidStreamInjectionHookInstaller remaining 12 methods**: unify the duplicated `get_injection_content` closures across the 3 backend paths (behavior-changing, needs its own validation), THEN extract.
 - These are not blocking — Orchestrator is now 38% smaller and the remaining bulk is the 4 hook-install paths + the streaming-loop core.
 
diff --git a/massgen/orchestrator.py b/massgen/orchestrator.py
index f15161145..8785f30f9 100644
--- a/massgen/orchestrator.py
+++ b/massgen/orchestrator.py
@@ -59,14 +59,7 @@
 from .mcp_tools.hooks import (
     BackgroundToolCompleteHook,
     GeneralHookManager,
-    HighPriorityTaskReminderHook,
-    HookType,
     HumanInputHook,
-    MediaCallLedgerHook,
-    MidStreamInjectionHook,
-    PythonCallableHook,
-    RoundTimeoutPostHook,
-    RoundTimeoutPreHook,
     RoundTimeoutState,
     SubagentCompleteHook,
 )
@@ -3482,171 +3475,8 @@ def _setup_hook_manager_for_agent(
         agent: ChatAgent,
         answers: dict[str, str],
     ) -> None:
-        """Set up hooks for agent - uses native adapter for Claude Code, GeneralHookManager for others.
-
-        This routes hook setup based on backend capabilities:
-        - Backends with native hook support (Claude Code): Use NativeHookAdapter
-        - Standard backends: Use GeneralHookManager
-
-        Both paths set up the same hooks:
-        1. MidStreamInjectionHook - injects answers from other agents into tool results
-        2. HighPriorityTaskReminderHook - reminds to document high-priority task completions (round mode only)
-
-        Args:
-            agent_id: The agent identifier
-            agent: The ChatAgent instance
-            answers: Dict of existing answers when agent started (used to detect new answers)
-        """
-        # Runtime human input must work for all backends, including those that
-        # don't support hook registration (hookless fallback / Codex path).
-        self._ensure_runtime_human_input_hook_initialized()
-        self._ensure_runtime_inbox_poller_initialized()
-
-        backend = getattr(agent, "backend", None)
-        backend_provider = backend.get_provider_name() if backend and hasattr(backend, "get_provider_name") else ""
-
-        # Codex uses a hybrid path: native Bash hooks plus MCP/file-based payload delivery.
-        if (
-            backend_provider == "codex"
-            and hasattr(agent.backend, "supports_native_hooks")
-            and agent.backend.supports_native_hooks()
-            and hasattr(agent.backend, "supports_mcp_server_hooks")
-            and agent.backend.supports_mcp_server_hooks()
-        ):
-            self._setup_codex_hybrid_hooks(agent_id, agent, answers)
-            return
-
-        # Check if backend supports native hooks (e.g., Claude Code)
-        if hasattr(agent.backend, "supports_native_hooks") and agent.backend.supports_native_hooks():
-            self._setup_native_hooks_for_agent(agent_id, agent, answers)
-            return
-
-        # Check if backend supports MCP server-level hooks (e.g., Codex)
-        if hasattr(agent.backend, "supports_mcp_server_hooks") and agent.backend.supports_mcp_server_hooks():
-            self._setup_codex_mcp_hooks(agent_id, agent, answers)
-            return
-
-        # Fall back to GeneralHookManager for standard backends
-        if not hasattr(agent.backend, "set_general_hook_manager"):
-            return
-
-        # Create hook manager
-        manager = GeneralHookManager()
-
-        # Create mid-stream injection hook with closure-based callback
-        mid_stream_hook = MidStreamInjectionHook()
-
-        # Define the injection callback (captures agent_id and answers).
-        # A1: both the GeneralHookManager and native paths route through the
-        # single unified installer method so they can no longer drift.
-        async def get_injection_content() -> str | None:
-            return await self._build_midstream_injection(agent_id, answers, native=False)
-
-        # Set callback on hook
-        mid_stream_hook.set_callback(get_injection_content)
-
-        # Register mid-stream injection hook first (maintains current behavior order)
-        manager.register_global_hook(HookType.POST_TOOL_USE, mid_stream_hook)
-
-        # Register high-priority task reminder hook (enabled round-wise when capture is enabled)
-        if self._is_round_learning_capture_enabled():
-            reminder_hook = HighPriorityTaskReminderHook()
-            manager.register_global_hook(HookType.POST_TOOL_USE, reminder_hook)
-
-        # Register media call ledger hook (read_media/generate_media provenance capture)
-        manager.register_global_hook(HookType.POST_TOOL_USE, MediaCallLedgerHook())
-
-        # Register human input hook (shared across all agents)
-        manager.register_global_hook(HookType.POST_TOOL_USE, self._human_input_hook)
-
-        # Register subagent completion hook for background result injection
-        if self._background_subagents_enabled:
-            subagent_hook = SubagentCompleteHook(
-                injection_strategy=self._background_subagent_injection_strategy,
-            )
-
-            # Create a closure that captures agent_id for pending results retrieval
-            def make_pending_getter(aid: str):
-                return lambda: self._get_pending_subagent_results_async(aid)
-
-            subagent_hook.set_pending_results_getter(make_pending_getter(agent_id))
-            manager.register_global_hook(HookType.POST_TOOL_USE, subagent_hook)
-            logger.debug(f"[Orchestrator] Registered SubagentCompleteHook for {agent_id}")
-
-            # Wire background tool delegate so list/status/result/cancel route to subagents
-            if hasattr(agent.backend, "register_background_delegate"):
-                from massgen.subagent.background_delegate import (
-                    SubagentBackgroundDelegate,
-                )
-
-                def _make_call_tool(aid: str):
-                    return lambda tool_name, params: self._call_subagent_mcp_tool_async(
-                        aid,
-                        tool_name,
-                        params,
-                    )
-
-                delegate = SubagentBackgroundDelegate(
-                    call_tool=_make_call_tool(agent_id),
-                    agent_id=agent_id,
-                )
-                agent.backend.register_background_delegate(delegate)
-                logger.debug(f"[Orchestrator] Registered SubagentBackgroundDelegate for {agent_id}")
-
-        # Register background tool completion hook for async tool result injection
-        if hasattr(agent.backend, "get_pending_background_tool_results"):
-            background_tool_hook = BackgroundToolCompleteHook()
-            background_tool_hook.set_completed_jobs_getter(
-                agent.backend.get_pending_background_tool_results,
-            )
-            manager.register_global_hook(HookType.POST_TOOL_USE, background_tool_hook)
-            logger.debug(
-                f"[Orchestrator] Registered BackgroundToolCompleteHook for {agent_id}",
-            )
-        # Register per-round timeout hooks if configured
-        self._register_round_timeout_hooks(agent_id, manager)
-
-        # Register user-configured hooks from agent backend config
-        if hasattr(agent.backend, "config") and agent.backend.config:
-            agent_hooks = agent.backend.config.get("hooks")
-            if agent_hooks:
-                manager.register_hooks_from_config(agent_hooks, agent_id=agent_id)
-                logger.debug(
-                    f"[Orchestrator] Registered user-configured hooks for {agent_id}",
-                )
-
-        # Set manager on backend
-        agent.backend.set_general_hook_manager(manager)
-        if hasattr(agent.backend, "set_background_wait_interrupt_provider"):
-
-            async def _wait_interrupt_provider(
-                requested_agent_id: str,
-                *,
-                _agent_id: str = agent_id,
-            ) -> dict[str, Any] | None:
-                target_agent_id = requested_agent_id or _agent_id
-                if hasattr(self, "cancellation_manager") and self.cancellation_manager and self.cancellation_manager.is_cancelled:
-                    return {
-                        "interrupt_reason": "turn_cancelled",
-                        "injected_content": None,
-                    }
-
-                runtime_sections = await self._collect_no_hook_runtime_fallback_sections(
-                    target_agent_id,
-                )
-                if not runtime_sections:
-                    return None
-                return {
-                    "interrupt_reason": "runtime_injection_available",
-                    "injected_content": "\n".join(runtime_sections),
-                }
-
-            agent.backend.set_background_wait_interrupt_provider(
-                _wait_interrupt_provider,
-            )
-        logger.debug(
-            f"[Orchestrator] Set up hook manager for {agent_id} with mid-stream and reminder hooks",
-        )
+        """Route hook setup by backend capability (delegates to MidStreamInjectionHookInstaller)."""
+        self._midstream_injection_hook_installer.setup_hook_manager_for_agent(agent_id, agent, answers)
 
     def _setup_codex_mcp_hooks(
         self,
@@ -3654,54 +3484,8 @@ def _setup_codex_mcp_hooks(
         agent: ChatAgent,
         answers: dict[str, str],
     ) -> None:
-        """Set up MCP server-level hook delivery for Codex backends.
-
-        Instead of registering hooks on a GeneralHookManager, this stores a
-        reference so the streaming loop can call _flush_codex_hook_payloads()
-        to write injection files that the MCP middleware consumes.
-        """
-        # Mark this agent as using MCP server hooks
-        if not hasattr(self, "_codex_mcp_hook_agents"):
-            self._codex_mcp_hook_agents: dict[str, dict[str, Any]] = {}
-
-        self._codex_mcp_hook_agents[agent_id] = {
-            "agent": agent,
-            "answers": answers,
-        }
-
-        # Set up the background wait interrupt provider (reuse existing pattern)
-        if hasattr(agent.backend, "set_background_wait_interrupt_provider"):
-
-            async def _wait_interrupt_provider(
-                requested_agent_id: str,
-                *,
-                _agent_id: str = agent_id,
-            ) -> dict[str, Any] | None:
-                target_agent_id = requested_agent_id or _agent_id
-                if hasattr(self, "cancellation_manager") and self.cancellation_manager and self.cancellation_manager.is_cancelled:
-                    return {
-                        "interrupt_reason": "turn_cancelled",
-                        "injected_content": None,
-                    }
-
-                runtime_sections = await self._collect_no_hook_runtime_fallback_sections(
-                    target_agent_id,
-                )
-                if not runtime_sections:
-                    return None
-                return {
-                    "interrupt_reason": "runtime_injection_available",
-                    "injected_content": "\n".join(runtime_sections),
-                }
-
-            agent.backend.set_background_wait_interrupt_provider(
-                _wait_interrupt_provider,
-            )
-
-        logger.info(
-            "[Orchestrator] Set up MCP server-level hook delivery for %s",
-            agent_id,
-        )
+        """Set up Codex MCP server-level hook delivery (delegates to MidStreamInjectionHookInstaller)."""
+        self._midstream_injection_hook_installer.setup_codex_mcp_hooks(agent_id, agent, answers)
 
     def _setup_codex_hybrid_hooks(
         self,
@@ -3709,53 +3493,8 @@ def _setup_codex_hybrid_hooks(
         agent: ChatAgent,
         answers: dict[str, str],
     ) -> None:
-        """Set up Codex's hybrid delivery path.
-
-        Codex native hooks currently cover Bash-only ``PreToolUse`` and
-        ``PostToolUse``. MassGen runtime payload delivery still flows through the
-        shared ``hook_post_tool_use.json`` file and the MCP/file carry-forward
-        path, so we register a lightweight native Bash bridge and keep the
-        existing Codex MCP setup in place.
-        """
-        adapter = agent.backend.get_native_hook_adapter()
-        if not adapter:
-            logger.warning(
-                "[Orchestrator] Codex backend reported native hooks but no adapter was available for %s",
-                agent_id,
-            )
-            self._setup_codex_mcp_hooks(agent_id, agent, answers)
-            return
-
-        manager = GeneralHookManager()
-        manager.register_global_hook(
-            HookType.POST_TOOL_USE,
-            PythonCallableHook(
-                name="codex_post_tool_bridge",
-                handler=lambda _event: None,
-                matcher="Bash",
-            ),
-        )
-
-        native_config = adapter.build_native_hooks_config(
-            manager,
-            agent_id=agent_id,
-        )
-        agent.backend.set_native_hooks_config(native_config)
-        # Codex's native hook surface is Bash-only, but the TUI and manual
-        # wrap-up flow still need real timeout hook objects in agent state.
-        # Register those separately so request_answer_now() and timeout status
-        # work on the hybrid path without changing the native hooks config.
-        timeout_manager = GeneralHookManager()
-        self._register_round_timeout_hooks(agent_id, timeout_manager)
-        self._setup_codex_mcp_hooks(agent_id, agent, answers)
-
-        hooks = native_config.get("hooks", {}) if isinstance(native_config, dict) else {}
-        logger.info(
-            "[Orchestrator] Set up Codex hybrid hooks for %s: PreToolUse=%d, PostToolUse=%d",
-            agent_id,
-            len(hooks.get("PreToolUse", [])),
-            len(hooks.get("PostToolUse", [])),
-        )
+        """Set up Codex's hybrid delivery path (delegates to MidStreamInjectionHookInstaller)."""
+        self._midstream_injection_hook_installer.setup_codex_hybrid_hooks(agent_id, agent, answers)
 
     async def _collect_round_timeout_runtime_sections(
         self,
@@ -4470,97 +4209,8 @@ def _register_round_timeout_hooks(
         agent_id: str,
         manager: GeneralHookManager,
     ) -> None:
-        """Register per-round timeout hooks if configured.
-
-        This creates two hooks:
-        1. RoundTimeoutPostHook (soft timeout) - Injects warning message after tool calls
-        2. RoundTimeoutPreHook (hard timeout) - Blocks non-terminal tools after grace period
-
-        The hooks are stored in agent_states so they can be reset when a new round starts.
-
-        Args:
-            agent_id: The agent identifier
-            manager: The GeneralHookManager to register hooks with
-        """
-        # Get timeout config
-        timeout_config = self.config.timeout_config
-        initial_timeout = timeout_config.initial_round_timeout_seconds
-        subsequent_timeout = timeout_config.subsequent_round_timeout_seconds
-        grace_seconds = timeout_config.round_timeout_grace_seconds
-
-        # Skip if no round timeouts configured
-        if initial_timeout is None and subsequent_timeout is None:
-            return
-
-        logger.info(
-            f"[Orchestrator] Registering round timeout hooks for {agent_id}: " f"initial={initial_timeout}s, subsequent={subsequent_timeout}s, grace={grace_seconds}s",
-        )
-
-        # Create closures that read from agent state
-        def get_round_start_time() -> float:
-            """Get the current round start time from agent state."""
-            start_time = self.agent_states[agent_id].round_start_time
-            if start_time is None:
-                # Fallback to current time if not set (shouldn't happen)
-                logger.warning(
-                    f"[Orchestrator] round_start_time is None for {agent_id}, using current time as fallback",
-                )
-                return time.time()
-            return start_time
-
-        def get_agent_round() -> int:
-            """Get the current round number from coordination tracker."""
-            return self.coordination_tracker.get_agent_round(agent_id)
-
-        # Create shared state for coordinating soft -> hard timeout progression
-        # This ensures hard timeout only fires AFTER soft timeout has been injected
-        timeout_state = RoundTimeoutState()
-
-        # Get two-tier workspace setting from coordination config
-        # Suppressed when write_mode is active (write_mode replaces the old two-tier structure)
-        coordination_config = getattr(self.config, "coordination_config", None)
-        write_mode = getattr(coordination_config, "write_mode", None) if coordination_config else None
-        use_two_tier_workspace = False
-        if not (write_mode and write_mode != "legacy"):
-            use_two_tier_workspace = bool(
-                getattr(coordination_config, "use_two_tier_workspace", False),
-            )
-
-        # Create soft timeout hook (POST_TOOL_USE - injects warning)
-        post_hook = RoundTimeoutPostHook(
-            name=f"round_timeout_soft_{agent_id}",
-            get_round_start_time=get_round_start_time,
-            get_agent_round=get_agent_round,
-            initial_timeout_seconds=initial_timeout,
-            subsequent_timeout_seconds=subsequent_timeout,
-            grace_seconds=grace_seconds,
-            agent_id=agent_id,
-            shared_state=timeout_state,
-            use_two_tier_workspace=use_two_tier_workspace,
-        )
-
-        # Create hard timeout hook (PRE_TOOL_USE - blocks non-terminal tools)
-        pre_hook = RoundTimeoutPreHook(
-            name=f"round_timeout_hard_{agent_id}",
-            get_round_start_time=get_round_start_time,
-            get_agent_round=get_agent_round,
-            initial_timeout_seconds=initial_timeout,
-            subsequent_timeout_seconds=subsequent_timeout,
-            grace_seconds=grace_seconds,
-            agent_id=agent_id,
-            shared_state=timeout_state,
-        )
-
-        # Register hooks
-        manager.register_global_hook(HookType.POST_TOOL_USE, post_hook)
-        manager.register_global_hook(HookType.PRE_TOOL_USE, pre_hook)
-
-        # Store hook references so we can reset them on new rounds
-        self.agent_states[agent_id].round_timeout_hooks = (post_hook, pre_hook)
-        # Store the shared state so we can check force_terminate in the orchestrator loop
-        self.agent_states[agent_id].round_timeout_state = timeout_state
-
-        logger.debug(f"[Orchestrator] Registered round timeout hooks for {agent_id}")
+        """Register per-round timeout hooks (delegates to MidStreamInjectionHookInstaller)."""
+        self._midstream_injection_hook_installer.register_round_timeout_hooks(agent_id, manager)
 
     def _setup_native_hooks_for_agent(
         self,
@@ -4568,180 +4218,8 @@ def _setup_native_hooks_for_agent(
         agent: ChatAgent,
         answers: dict[str, str],
     ) -> None:
-        """Set up native hooks for backends that support them (e.g., Claude Code).
-
-        This converts MassGen hooks to the backend's native format using the
-        NativeHookAdapter interface. The hooks are then executed natively by
-        the backend rather than through MassGen's GeneralHookManager.
-
-        Args:
-            agent_id: The agent identifier
-            agent: The ChatAgent instance
-            answers: Dict of existing answers when agent started (used to detect new answers)
-        """
-        # Get the native hook adapter from the backend
-        adapter = agent.backend.get_native_hook_adapter()
-        if not adapter:
-            logger.warning(
-                f"[Orchestrator] Backend supports native hooks but adapter unavailable for {agent_id}",
-            )
-            return
-
-        # Create a GeneralHookManager to hold MassGen hooks
-        # (We'll convert these to native format)
-        manager = GeneralHookManager()
-
-        # Create mid-stream injection hook with closure-based callback
-        mid_stream_hook = MidStreamInjectionHook()
-
-        # Define the injection callback (A1: unified with the GeneralHookManager
-        # path via the single installer method; native=True only changes log
-        # wording and the non-native debug listing).
-        async def get_injection_content() -> str | None:
-            return await self._build_midstream_injection(agent_id, answers, native=True)
-
-        # Set callback on hook
-        mid_stream_hook.set_callback(get_injection_content)
-
-        # Register mid-stream injection hook
-        manager.register_global_hook(HookType.POST_TOOL_USE, mid_stream_hook)
-
-        # Register high-priority task reminder hook (enabled round-wise when capture is enabled)
-        if self._is_round_learning_capture_enabled():
-            reminder_hook = HighPriorityTaskReminderHook()
-            manager.register_global_hook(HookType.POST_TOOL_USE, reminder_hook)
-
-        # Register media call ledger hook (read_media/generate_media provenance capture)
-        manager.register_global_hook(HookType.POST_TOOL_USE, MediaCallLedgerHook())
-
-        # Register human input hook (shared across all agents)
-        self._ensure_runtime_human_input_hook_initialized()
-        manager.register_global_hook(HookType.POST_TOOL_USE, self._human_input_hook)
-
-        # Register subagent completion hook for background result injection
-        if self._background_subagents_enabled:
-            subagent_hook = SubagentCompleteHook(
-                injection_strategy=self._background_subagent_injection_strategy,
-            )
-
-            # Create a closure that captures agent_id for pending results retrieval
-            def make_pending_getter(aid: str):
-                return lambda: self._get_pending_subagent_results_async(aid)
-
-            subagent_hook.set_pending_results_getter(make_pending_getter(agent_id))
-            manager.register_global_hook(HookType.POST_TOOL_USE, subagent_hook)
-            logger.debug(f"[Orchestrator] Registered SubagentCompleteHook (native) for {agent_id}")
-
-            # Wire background tool delegate so list/status/result/cancel route to subagents
-            if hasattr(agent.backend, "register_background_delegate"):
-                from massgen.subagent.background_delegate import (
-                    SubagentBackgroundDelegate,
-                )
-
-                def _make_call_tool(aid: str):
-                    return lambda tool_name, params: self._call_subagent_mcp_tool_async(
-                        aid,
-                        tool_name,
-                        params,
-                    )
-
-                delegate = SubagentBackgroundDelegate(
-                    call_tool=_make_call_tool(agent_id),
-                    agent_id=agent_id,
-                )
-                agent.backend.register_background_delegate(delegate)
-                logger.debug(f"[Orchestrator] Registered SubagentBackgroundDelegate (native) for {agent_id}")
-
-        # Register background tool completion hook for async tool result injection
-        if hasattr(agent.backend, "get_pending_background_tool_results"):
-            background_tool_hook = BackgroundToolCompleteHook()
-            background_tool_hook.set_completed_jobs_getter(
-                agent.backend.get_pending_background_tool_results,
-            )
-            manager.register_global_hook(HookType.POST_TOOL_USE, background_tool_hook)
-            logger.debug(
-                f"[Orchestrator] Registered BackgroundToolCompleteHook (native) for {agent_id}",
-            )
-        # Register per-round timeout hooks if configured
-        self._register_round_timeout_hooks(agent_id, manager)
-
-        # Register user-configured hooks from agent backend config
-        agent_hooks = agent.backend.config.get("hooks")
-        if agent_hooks:
-            manager.register_hooks_from_config(agent_hooks, agent_id=agent_id)
-
-        # Register PathPermissionManagerHook for PRE_TOOL_USE validation.
-        # Native backends like Copilot need MassGen-level path validation.
-        # Claude Code already handles permissions via add_dirs, so skip it.
-        backend_provider = agent.backend.get_provider_name() if hasattr(agent.backend, "get_provider_name") else ""
-        if backend_provider != "claude_code":
-            _fm = getattr(agent.backend, "filesystem_manager", None)
-            if _fm:
-                _ppm = getattr(_fm, "path_permission_manager", None)
-                if _ppm:
-                    from massgen.filesystem_manager import PathPermissionManagerHook
-
-                    ppm_hook = PathPermissionManagerHook(_ppm)
-                    manager.register_global_hook(HookType.PRE_TOOL_USE, ppm_hook)
-                    logger.debug(
-                        f"[Orchestrator] Registered PathPermissionManagerHook (PRE_TOOL_USE) for {agent_id}",
-                    )
-
-        # Create context factory for hooks
-        def context_factory() -> dict[str, Any]:
-            workspace_path = None
-            filesystem_manager = getattr(agent.backend, "filesystem_manager", None)
-            if filesystem_manager and hasattr(filesystem_manager, "get_current_workspace"):
-                try:
-                    workspace_path = str(filesystem_manager.get_current_workspace())
-                except Exception:
-                    workspace_path = None
-            return {
-                "session_id": getattr(self, "session_id", ""),
-                "orchestrator_id": getattr(self, "orchestrator_id", ""),
-                "agent_id": agent_id,
-                "workspace_path": workspace_path,
-            }
-
-        # Convert to native format using adapter
-        native_config = adapter.build_native_hooks_config(
-            manager,
-            agent_id=agent_id,
-            context_factory=context_factory,
-        )
-
-        # Set native hooks config on backend
-        agent.backend.set_native_hooks_config(native_config)
-        if hasattr(agent.backend, "set_background_wait_interrupt_provider"):
-
-            async def _wait_interrupt_provider(
-                requested_agent_id: str,
-                *,
-                _agent_id: str = agent_id,
-            ) -> dict[str, Any] | None:
-                target_agent_id = requested_agent_id or _agent_id
-                if hasattr(self, "cancellation_manager") and self.cancellation_manager and self.cancellation_manager.is_cancelled:
-                    return {
-                        "interrupt_reason": "turn_cancelled",
-                        "injected_content": None,
-                    }
-
-                runtime_sections = await self._collect_no_hook_runtime_fallback_sections(
-                    target_agent_id,
-                )
-                if not runtime_sections:
-                    return None
-                return {
-                    "interrupt_reason": "runtime_injection_available",
-                    "injected_content": "\n".join(runtime_sections),
-                }
-
-            agent.backend.set_background_wait_interrupt_provider(
-                _wait_interrupt_provider,
-            )
-        logger.info(
-            f"[Orchestrator] Set up native hooks for {agent_id}: " f"PreToolUse={len(native_config.get('PreToolUse', []))}, " f"PostToolUse={len(native_config.get('PostToolUse', []))} hooks",
-        )
+        """Set up native hooks (delegates to MidStreamInjectionHookInstaller)."""
+        self._midstream_injection_hook_installer.setup_native_hooks_for_agent(agent_id, agent, answers)
 
     @classmethod
     def _coerce_answer_content_to_text(cls, content: Any) -> str:
diff --git a/massgen/orchestrator_collaborators/midstream_injection_hook_installer.py b/massgen/orchestrator_collaborators/midstream_injection_hook_installer.py
index ce4eeae6e..1a6d722f4 100644
--- a/massgen/orchestrator_collaborators/midstream_injection_hook_installer.py
+++ b/massgen/orchestrator_collaborators/midstream_injection_hook_installer.py
@@ -16,11 +16,25 @@
 
 import json
 import os
+import time
 from collections.abc import AsyncGenerator
 from pathlib import Path
 from typing import TYPE_CHECKING, Any
 
 from massgen.logger_config import get_event_emitter, logger
+from massgen.mcp_tools.hooks import (
+    BackgroundToolCompleteHook,
+    GeneralHookManager,
+    HighPriorityTaskReminderHook,
+    HookType,
+    MediaCallLedgerHook,
+    MidStreamInjectionHook,
+    PythonCallableHook,
+    RoundTimeoutPostHook,
+    RoundTimeoutPreHook,
+    RoundTimeoutState,
+    SubagentCompleteHook,
+)
 from massgen.utils import ActionType
 
 if TYPE_CHECKING:
@@ -144,6 +158,528 @@ def compute_plan_progress_stats(self, workspace_path: str) -> dict[str, Any] | N
             logger.debug(f"[Orchestrator] Could not compute plan progress: {e}")
             return None
 
+    def setup_hook_manager_for_agent(
+        self,
+        agent_id: str,
+        agent: Any,
+        answers: dict[str, str],
+    ) -> None:
+        """Route hook setup by backend capability (native / codex / GeneralHookManager).
+
+        Dispatcher: Claude-Code-style native backends use the native adapter,
+        Codex uses its hybrid or MCP path, everything else uses GeneralHookManager.
+        """
+        orch = self._orchestrator
+
+        # Runtime human input must work for all backends, including those that
+        # don't support hook registration (hookless fallback / Codex path).
+        orch._ensure_runtime_human_input_hook_initialized()
+        orch._ensure_runtime_inbox_poller_initialized()
+
+        backend = getattr(agent, "backend", None)
+        backend_provider = backend.get_provider_name() if backend and hasattr(backend, "get_provider_name") else ""
+
+        # Codex uses a hybrid path: native Bash hooks plus MCP/file payload delivery.
+        if (
+            backend_provider == "codex"
+            and hasattr(agent.backend, "supports_native_hooks")
+            and agent.backend.supports_native_hooks()
+            and hasattr(agent.backend, "supports_mcp_server_hooks")
+            and agent.backend.supports_mcp_server_hooks()
+        ):
+            orch._setup_codex_hybrid_hooks(agent_id, agent, answers)
+            return
+
+        # Native hooks (e.g., Claude Code)
+        if hasattr(agent.backend, "supports_native_hooks") and agent.backend.supports_native_hooks():
+            orch._setup_native_hooks_for_agent(agent_id, agent, answers)
+            return
+
+        # MCP server-level hooks (e.g., Codex)
+        if hasattr(agent.backend, "supports_mcp_server_hooks") and agent.backend.supports_mcp_server_hooks():
+            orch._setup_codex_mcp_hooks(agent_id, agent, answers)
+            return
+
+        # Fall back to GeneralHookManager for standard backends
+        if not hasattr(agent.backend, "set_general_hook_manager"):
+            return
+
+        manager = GeneralHookManager()
+        mid_stream_hook = MidStreamInjectionHook()
+
+        # A1: both the GeneralHookManager and native paths route through the
+        # single unified installer method so they can no longer drift.
+        async def get_injection_content() -> str | None:
+            return await orch._build_midstream_injection(agent_id, answers, native=False)
+
+        mid_stream_hook.set_callback(get_injection_content)
+
+        # Register mid-stream injection hook first (maintains current behavior order)
+        manager.register_global_hook(HookType.POST_TOOL_USE, mid_stream_hook)
+
+        if orch._is_round_learning_capture_enabled():
+            reminder_hook = HighPriorityTaskReminderHook()
+            manager.register_global_hook(HookType.POST_TOOL_USE, reminder_hook)
+
+        manager.register_global_hook(HookType.POST_TOOL_USE, MediaCallLedgerHook())
+
+        # Register human input hook (shared across all agents)
+        manager.register_global_hook(HookType.POST_TOOL_USE, orch._human_input_hook)
+
+        # Register subagent completion hook for background result injection
+        if orch._background_subagents_enabled:
+            subagent_hook = SubagentCompleteHook(
+                injection_strategy=orch._background_subagent_injection_strategy,
+            )
+
+            def make_pending_getter(aid: str):
+                return lambda: orch._get_pending_subagent_results_async(aid)
+
+            subagent_hook.set_pending_results_getter(make_pending_getter(agent_id))
+            manager.register_global_hook(HookType.POST_TOOL_USE, subagent_hook)
+            logger.debug(f"[Orchestrator] Registered SubagentCompleteHook for {agent_id}")
+
+            # Wire background tool delegate so list/status/result/cancel route to subagents
+            if hasattr(agent.backend, "register_background_delegate"):
+                from massgen.subagent.background_delegate import (
+                    SubagentBackgroundDelegate,
+                )
+
+                def _make_call_tool(aid: str):
+                    return lambda tool_name, params: orch._call_subagent_mcp_tool_async(
+                        aid,
+                        tool_name,
+                        params,
+                    )
+
+                delegate = SubagentBackgroundDelegate(
+                    call_tool=_make_call_tool(agent_id),
+                    agent_id=agent_id,
+                )
+                agent.backend.register_background_delegate(delegate)
+                logger.debug(f"[Orchestrator] Registered SubagentBackgroundDelegate for {agent_id}")
+
+        # Register background tool completion hook for async tool result injection
+        if hasattr(agent.backend, "get_pending_background_tool_results"):
+            background_tool_hook = BackgroundToolCompleteHook()
+            background_tool_hook.set_completed_jobs_getter(
+                agent.backend.get_pending_background_tool_results,
+            )
+            manager.register_global_hook(HookType.POST_TOOL_USE, background_tool_hook)
+            logger.debug(
+                f"[Orchestrator] Registered BackgroundToolCompleteHook for {agent_id}",
+            )
+        # Register per-round timeout hooks if configured
+        orch._register_round_timeout_hooks(agent_id, manager)
+
+        # Register user-configured hooks from agent backend config
+        if hasattr(agent.backend, "config") and agent.backend.config:
+            agent_hooks = agent.backend.config.get("hooks")
+            if agent_hooks:
+                manager.register_hooks_from_config(agent_hooks, agent_id=agent_id)
+                logger.debug(
+                    f"[Orchestrator] Registered user-configured hooks for {agent_id}",
+                )
+
+        # Set manager on backend
+        agent.backend.set_general_hook_manager(manager)
+        if hasattr(agent.backend, "set_background_wait_interrupt_provider"):
+
+            async def _wait_interrupt_provider(
+                requested_agent_id: str,
+                *,
+                _agent_id: str = agent_id,
+            ) -> dict[str, Any] | None:
+                target_agent_id = requested_agent_id or _agent_id
+                if hasattr(orch, "cancellation_manager") and orch.cancellation_manager and orch.cancellation_manager.is_cancelled:
+                    return {
+                        "interrupt_reason": "turn_cancelled",
+                        "injected_content": None,
+                    }
+
+                runtime_sections = await orch._collect_no_hook_runtime_fallback_sections(
+                    target_agent_id,
+                )
+                if not runtime_sections:
+                    return None
+                return {
+                    "interrupt_reason": "runtime_injection_available",
+                    "injected_content": "\n".join(runtime_sections),
+                }
+
+            agent.backend.set_background_wait_interrupt_provider(
+                _wait_interrupt_provider,
+            )
+        logger.debug(
+            f"[Orchestrator] Set up hook manager for {agent_id} with mid-stream and reminder hooks",
+        )
+
+    def setup_codex_mcp_hooks(
+        self,
+        agent_id: str,
+        agent: Any,
+        answers: dict[str, str],
+    ) -> None:
+        """Set up MCP server-level hook delivery for Codex backends.
+
+        Instead of registering hooks on a GeneralHookManager, this stores a
+        reference (on the orchestrator) so the streaming loop can call
+        ``_flush_codex_hook_payloads()`` to write injection files the MCP
+        middleware consumes.
+        """
+        orch = self._orchestrator
+
+        # Mark this agent as using MCP server hooks (stored on the orchestrator so
+        # the streaming loop and tests observe it there).
+        if not hasattr(orch, "_codex_mcp_hook_agents"):
+            orch._codex_mcp_hook_agents = {}
+
+        orch._codex_mcp_hook_agents[agent_id] = {
+            "agent": agent,
+            "answers": answers,
+        }
+
+        # Set up the background wait interrupt provider (reuse existing pattern)
+        if hasattr(agent.backend, "set_background_wait_interrupt_provider"):
+
+            async def _wait_interrupt_provider(
+                requested_agent_id: str,
+                *,
+                _agent_id: str = agent_id,
+            ) -> dict[str, Any] | None:
+                target_agent_id = requested_agent_id or _agent_id
+                if hasattr(orch, "cancellation_manager") and orch.cancellation_manager and orch.cancellation_manager.is_cancelled:
+                    return {
+                        "interrupt_reason": "turn_cancelled",
+                        "injected_content": None,
+                    }
+
+                runtime_sections = await orch._collect_no_hook_runtime_fallback_sections(
+                    target_agent_id,
+                )
+                if not runtime_sections:
+                    return None
+                return {
+                    "interrupt_reason": "runtime_injection_available",
+                    "injected_content": "\n".join(runtime_sections),
+                }
+
+            agent.backend.set_background_wait_interrupt_provider(
+                _wait_interrupt_provider,
+            )
+
+        logger.info(
+            "[Orchestrator] Set up MCP server-level hook delivery for %s",
+            agent_id,
+        )
+
+    def setup_codex_hybrid_hooks(
+        self,
+        agent_id: str,
+        agent: Any,
+        answers: dict[str, str],
+    ) -> None:
+        """Set up Codex's hybrid delivery path (native Bash bridge + MCP/file payloads)."""
+        orch = self._orchestrator
+
+        adapter = agent.backend.get_native_hook_adapter()
+        if not adapter:
+            logger.warning(
+                "[Orchestrator] Codex backend reported native hooks but no adapter was available for %s",
+                agent_id,
+            )
+            orch._setup_codex_mcp_hooks(agent_id, agent, answers)
+            return
+
+        manager = GeneralHookManager()
+        manager.register_global_hook(
+            HookType.POST_TOOL_USE,
+            PythonCallableHook(
+                name="codex_post_tool_bridge",
+                handler=lambda _event: None,
+                matcher="Bash",
+            ),
+        )
+
+        native_config = adapter.build_native_hooks_config(
+            manager,
+            agent_id=agent_id,
+        )
+        agent.backend.set_native_hooks_config(native_config)
+        # Codex's native hook surface is Bash-only, but the TUI and manual
+        # wrap-up flow still need real timeout hook objects in agent state.
+        timeout_manager = GeneralHookManager()
+        orch._register_round_timeout_hooks(agent_id, timeout_manager)
+        orch._setup_codex_mcp_hooks(agent_id, agent, answers)
+
+        hooks = native_config.get("hooks", {}) if isinstance(native_config, dict) else {}
+        logger.info(
+            "[Orchestrator] Set up Codex hybrid hooks for %s: PreToolUse=%d, PostToolUse=%d",
+            agent_id,
+            len(hooks.get("PreToolUse", [])),
+            len(hooks.get("PostToolUse", [])),
+        )
+
+    def setup_native_hooks_for_agent(
+        self,
+        agent_id: str,
+        agent: Any,
+        answers: dict[str, str],
+    ) -> None:
+        """Set up native hooks for backends that support them (e.g., Claude Code).
+
+        Converts MassGen hooks to the backend's native format via the
+        NativeHookAdapter; the backend then executes them natively rather than
+        through MassGen's GeneralHookManager.
+        """
+        orch = self._orchestrator
+
+        adapter = agent.backend.get_native_hook_adapter()
+        if not adapter:
+            logger.warning(
+                f"[Orchestrator] Backend supports native hooks but adapter unavailable for {agent_id}",
+            )
+            return
+
+        # Create a GeneralHookManager to hold MassGen hooks (converted to native).
+        manager = GeneralHookManager()
+
+        mid_stream_hook = MidStreamInjectionHook()
+
+        # A1: unified with the GeneralHookManager path via the single installer
+        # method; native=True only changes log wording and the debug listing.
+        async def get_injection_content() -> str | None:
+            return await orch._build_midstream_injection(agent_id, answers, native=True)
+
+        mid_stream_hook.set_callback(get_injection_content)
+        manager.register_global_hook(HookType.POST_TOOL_USE, mid_stream_hook)
+
+        if orch._is_round_learning_capture_enabled():
+            reminder_hook = HighPriorityTaskReminderHook()
+            manager.register_global_hook(HookType.POST_TOOL_USE, reminder_hook)
+
+        manager.register_global_hook(HookType.POST_TOOL_USE, MediaCallLedgerHook())
+
+        # Register human input hook (shared across all agents)
+        orch._ensure_runtime_human_input_hook_initialized()
+        manager.register_global_hook(HookType.POST_TOOL_USE, orch._human_input_hook)
+
+        # Register subagent completion hook for background result injection
+        if orch._background_subagents_enabled:
+            subagent_hook = SubagentCompleteHook(
+                injection_strategy=orch._background_subagent_injection_strategy,
+            )
+
+            def make_pending_getter(aid: str):
+                return lambda: orch._get_pending_subagent_results_async(aid)
+
+            subagent_hook.set_pending_results_getter(make_pending_getter(agent_id))
+            manager.register_global_hook(HookType.POST_TOOL_USE, subagent_hook)
+            logger.debug(f"[Orchestrator] Registered SubagentCompleteHook (native) for {agent_id}")
+
+            # Wire background tool delegate so list/status/result/cancel route to subagents
+            if hasattr(agent.backend, "register_background_delegate"):
+                from massgen.subagent.background_delegate import (
+                    SubagentBackgroundDelegate,
+                )
+
+                def _make_call_tool(aid: str):
+                    return lambda tool_name, params: orch._call_subagent_mcp_tool_async(
+                        aid,
+                        tool_name,
+                        params,
+                    )
+
+                delegate = SubagentBackgroundDelegate(
+                    call_tool=_make_call_tool(agent_id),
+                    agent_id=agent_id,
+                )
+                agent.backend.register_background_delegate(delegate)
+                logger.debug(f"[Orchestrator] Registered SubagentBackgroundDelegate (native) for {agent_id}")
+
+        # Register background tool completion hook for async tool result injection
+        if hasattr(agent.backend, "get_pending_background_tool_results"):
+            background_tool_hook = BackgroundToolCompleteHook()
+            background_tool_hook.set_completed_jobs_getter(
+                agent.backend.get_pending_background_tool_results,
+            )
+            manager.register_global_hook(HookType.POST_TOOL_USE, background_tool_hook)
+            logger.debug(
+                f"[Orchestrator] Registered BackgroundToolCompleteHook (native) for {agent_id}",
+            )
+        # Register per-round timeout hooks if configured
+        orch._register_round_timeout_hooks(agent_id, manager)
+
+        # Register user-configured hooks from agent backend config
+        agent_hooks = agent.backend.config.get("hooks")
+        if agent_hooks:
+            manager.register_hooks_from_config(agent_hooks, agent_id=agent_id)
+
+        # Register PathPermissionManagerHook for PRE_TOOL_USE validation.
+        # Native backends like Copilot need MassGen-level path validation.
+        # Claude Code already handles permissions via add_dirs, so skip it.
+        backend_provider = agent.backend.get_provider_name() if hasattr(agent.backend, "get_provider_name") else ""
+        if backend_provider != "claude_code":
+            _fm = getattr(agent.backend, "filesystem_manager", None)
+            if _fm:
+                _ppm = getattr(_fm, "path_permission_manager", None)
+                if _ppm:
+                    from massgen.filesystem_manager import PathPermissionManagerHook
+
+                    ppm_hook = PathPermissionManagerHook(_ppm)
+                    manager.register_global_hook(HookType.PRE_TOOL_USE, ppm_hook)
+                    logger.debug(
+                        f"[Orchestrator] Registered PathPermissionManagerHook (PRE_TOOL_USE) for {agent_id}",
+                    )
+
+        # Create context factory for hooks
+        def context_factory() -> dict[str, Any]:
+            workspace_path = None
+            filesystem_manager = getattr(agent.backend, "filesystem_manager", None)
+            if filesystem_manager and hasattr(filesystem_manager, "get_current_workspace"):
+                try:
+                    workspace_path = str(filesystem_manager.get_current_workspace())
+                except Exception:
+                    workspace_path = None
+            return {
+                "session_id": getattr(orch, "session_id", ""),
+                "orchestrator_id": getattr(orch, "orchestrator_id", ""),
+                "agent_id": agent_id,
+                "workspace_path": workspace_path,
+            }
+
+        # Convert to native format using adapter
+        native_config = adapter.build_native_hooks_config(
+            manager,
+            agent_id=agent_id,
+            context_factory=context_factory,
+        )
+
+        # Set native hooks config on backend
+        agent.backend.set_native_hooks_config(native_config)
+        if hasattr(agent.backend, "set_background_wait_interrupt_provider"):
+
+            async def _wait_interrupt_provider(
+                requested_agent_id: str,
+                *,
+                _agent_id: str = agent_id,
+            ) -> dict[str, Any] | None:
+                target_agent_id = requested_agent_id or _agent_id
+                if hasattr(orch, "cancellation_manager") and orch.cancellation_manager and orch.cancellation_manager.is_cancelled:
+                    return {
+                        "interrupt_reason": "turn_cancelled",
+                        "injected_content": None,
+                    }
+
+                runtime_sections = await orch._collect_no_hook_runtime_fallback_sections(
+                    target_agent_id,
+                )
+                if not runtime_sections:
+                    return None
+                return {
+                    "interrupt_reason": "runtime_injection_available",
+                    "injected_content": "\n".join(runtime_sections),
+                }
+
+            agent.backend.set_background_wait_interrupt_provider(
+                _wait_interrupt_provider,
+            )
+        logger.info(
+            f"[Orchestrator] Set up native hooks for {agent_id}: " f"PreToolUse={len(native_config.get('PreToolUse', []))}, " f"PostToolUse={len(native_config.get('PostToolUse', []))} hooks",
+        )
+
+    def register_round_timeout_hooks(
+        self,
+        agent_id: str,
+        manager: GeneralHookManager,
+    ) -> None:
+        """Register per-round timeout hooks if configured.
+
+        Creates a soft RoundTimeoutPostHook (injects a warning after tool calls)
+        and a hard RoundTimeoutPreHook (blocks non-terminal tools after the grace
+        period), storing both on the agent state for per-round reset.
+        """
+        orch = self._orchestrator
+
+        timeout_config = orch.config.timeout_config
+        initial_timeout = timeout_config.initial_round_timeout_seconds
+        subsequent_timeout = timeout_config.subsequent_round_timeout_seconds
+        grace_seconds = timeout_config.round_timeout_grace_seconds
+
+        # Skip if no round timeouts configured
+        if initial_timeout is None and subsequent_timeout is None:
+            return
+
+        logger.info(
+            f"[Orchestrator] Registering round timeout hooks for {agent_id}: " f"initial={initial_timeout}s, subsequent={subsequent_timeout}s, grace={grace_seconds}s",
+        )
+
+        # Create closures that read from agent state
+        def get_round_start_time() -> float:
+            """Get the current round start time from agent state."""
+            start_time = orch.agent_states[agent_id].round_start_time
+            if start_time is None:
+                # Fallback to current time if not set (shouldn't happen)
+                logger.warning(
+                    f"[Orchestrator] round_start_time is None for {agent_id}, using current time as fallback",
+                )
+                return time.time()
+            return start_time
+
+        def get_agent_round() -> int:
+            """Get the current round number from coordination tracker."""
+            return orch.coordination_tracker.get_agent_round(agent_id)
+
+        # Create shared state for coordinating soft -> hard timeout progression
+        # This ensures hard timeout only fires AFTER soft timeout has been injected
+        timeout_state = RoundTimeoutState()
+
+        # Get two-tier workspace setting from coordination config
+        # Suppressed when write_mode is active (write_mode replaces the old two-tier structure)
+        coordination_config = getattr(orch.config, "coordination_config", None)
+        write_mode = getattr(coordination_config, "write_mode", None) if coordination_config else None
+        use_two_tier_workspace = False
+        if not (write_mode and write_mode != "legacy"):
+            use_two_tier_workspace = bool(
+                getattr(coordination_config, "use_two_tier_workspace", False),
+            )
+
+        # Create soft timeout hook (POST_TOOL_USE - injects warning)
+        post_hook = RoundTimeoutPostHook(
+            name=f"round_timeout_soft_{agent_id}",
+            get_round_start_time=get_round_start_time,
+            get_agent_round=get_agent_round,
+            initial_timeout_seconds=initial_timeout,
+            subsequent_timeout_seconds=subsequent_timeout,
+            grace_seconds=grace_seconds,
+            agent_id=agent_id,
+            shared_state=timeout_state,
+            use_two_tier_workspace=use_two_tier_workspace,
+        )
+
+        # Create hard timeout hook (PRE_TOOL_USE - blocks non-terminal tools)
+        pre_hook = RoundTimeoutPreHook(
+            name=f"round_timeout_hard_{agent_id}",
+            get_round_start_time=get_round_start_time,
+            get_agent_round=get_agent_round,
+            initial_timeout_seconds=initial_timeout,
+            subsequent_timeout_seconds=subsequent_timeout,
+            grace_seconds=grace_seconds,
+            agent_id=agent_id,
+            shared_state=timeout_state,
+        )
+
+        # Register hooks
+        manager.register_global_hook(HookType.POST_TOOL_USE, post_hook)
+        manager.register_global_hook(HookType.PRE_TOOL_USE, pre_hook)
+
+        # Store hook references so we can reset them on new rounds
+        orch.agent_states[agent_id].round_timeout_hooks = (post_hook, pre_hook)
+        # Store the shared state so we can check force_terminate in the orchestrator loop
+        orch.agent_states[agent_id].round_timeout_state = timeout_state
+
+        logger.debug(f"[Orchestrator] Registered round timeout hooks for {agent_id}")
+
     async def build_midstream_injection(
         self,
         agent_id: str,

From 4a10843c04c3365c42eea5deb0d624e53431e1c3 Mon Sep 17 00:00:00 2001
From: ncrispino <ncrispino@wustl.edu>
Date: Fri, 5 Jun 2026 09:17:38 -0700
Subject: [PATCH 3/4] fix+docs: B1 snapshot read-during-write race (versioned
 snapshots) + v0.1.94 release docs

Fixes the read-during-write race that the B1 event-loop offload exposed: the
offloaded peer-context copytree could overlap an owner's in-place rmtree+rebuild
of the same snapshot dir. Snapshots are now immutable and versioned --
save_snapshot (and the interrupted-turn save) publish <base>/.versions/<id>/v<N>
and atomically repoint the <base>/<id> symlink; readers acquire/refcount the
current version for their copy's duration. GC never deletes a pinned or in-flight
version. New SnapshotVersionStore coordinates publish/acquire/release/GC.

Also includes the earlier review cleanups on this branch:
- D2: emit_status was called with an invalid status= kwarg whose TypeError was
  swallowed, so worktree-isolation degradation never surfaced -- now fixed.
- A1: the triplicated _wait_interrupt_provider closure consolidated into one
  _install_wait_interrupt_provider helper (backend-parity drift removed).
- Interrupted-turn save no longer rmtree's the (now symlinked) snapshot path.

All under TDD with red-verified regression tests:
- test_snapshot_version_store.py (concurrent-publish-during-read, concurrent-
  publisher GC, refcount-protects-from-GC, symlink fallback)
- test_snapshot_versioned_save.py (versioned save, interrupted-over-symlink,
  orchestrator-reader pin wiring)
- test_wait_interrupt_provider.py (consolidated provider contract)
- test_concurrency_race_fixes.py (D2 emit signature)

Release documentation for v0.1.94 "Parallelism Hardening" (engineering health):
CHANGELOG, README (Latest Features / Recent Achievements / TOC), ROADMAP,
docs/source/index.rst, architecture module doc, and the docs/announcements/
rotation (archive v0.1.93, rewrite current-release.md, swap github-release file).
Version bumped to 0.1.94. Updated the release-documenter SKILL.md to document the
announcements rotation + version bump (previously undocumented).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 CHANGELOG.md                                  |  27 ++
 README.md                                     |  50 ++--
 README_PYPI.md                                |  50 ++--
 ROADMAP.md                                    |  23 +-
 docs/announcements/archive/v0.1.93.md         |  74 +++++
 docs/announcements/current-release.md         |  50 ++--
 docs/announcements/github-release-v0.1.93.md  |  38 ---
 docs/announcements/github-release-v0.1.94.md  |  35 +++
 .../dev_notes/next_version_eng_health_plan.md |   7 +-
 docs/modules/architecture.md                  |   4 +
 docs/source/index.rst                         |   4 +
 massgen/__init__.py                           |   2 +-
 .../filesystem_manager/_filesystem_manager.py |  58 ++--
 .../_snapshot_version_store.py                | 230 ++++++++++++++++
 massgen/orchestrator.py                       |   3 +-
 .../midstream_injection_hook_installer.py     | 121 +++------
 .../snapshot_manager.py                       |  89 +++++--
 .../massgen-release-documenter/SKILL.md       |  61 +++--
 massgen/tests/test_concurrency_race_fixes.py  |  28 ++
 massgen/tests/test_snapshot_version_store.py  | 213 +++++++++++++++
 massgen/tests/test_snapshot_versioned_save.py | 252 ++++++++++++++++++
 massgen/tests/test_wait_interrupt_provider.py | 103 +++++++
 22 files changed, 1249 insertions(+), 273 deletions(-)
 create mode 100644 docs/announcements/archive/v0.1.93.md
 delete mode 100644 docs/announcements/github-release-v0.1.93.md
 create mode 100644 docs/announcements/github-release-v0.1.94.md
 create mode 100644 massgen/filesystem_manager/_snapshot_version_store.py
 create mode 100644 massgen/tests/test_snapshot_version_store.py
 create mode 100644 massgen/tests/test_snapshot_versioned_save.py
 create mode 100644 massgen/tests/test_wait_interrupt_provider.py

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 7edc865d4..71681ab85 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,8 +7,35 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [0.1.94] - 2026-06-05
+
+### Theme: Parallelism Hardening (Engineering Health)
+
+Strengthen the orchestrator's parallel execution: move blocking snapshot work off the event loop so agents keep streaming concurrently, close the latent concurrency races that the previously-serialized copy had kept hidden, and finish the last refactor blocker. No per-backend functionality changes (parity principle). All items landed under TDD (tests written first, confirmed red, then green) with cost-free simulation (mock backends / real collaborator code, no LLM calls).
+
+### Changed
+- **Immutable, versioned snapshot storage**: each agent's snapshot path `<base>/<agent_id>` is now a symlink to an immutable version directory under `<base>/.versions/<agent_id>/v<N>`. `save_snapshot` (and the interrupted-turn partial save) publish a fresh version and atomically repoint the symlink rather than rewriting in place; the peer-context copy `acquire`s (refcounts) the current version for the duration of its offloaded copy. The symlink is transparent to all other readers, so the on-disk layout consumers see is unchanged. Coordinated by the new `SnapshotVersionStore` (`massgen/filesystem_manager/_snapshot_version_store.py`). On platforms without symlink support it falls back to a direct copy.
+- **Snapshot copy moved off the event loop (B1)**: `FilesystemManager.copy_snapshots_to_temp_workspace` now runs its blocking `rmtree`/`copytree`/scrub on a worker thread via `asyncio.to_thread`, so one agent's snapshot copy no longer stalls every other agent's streaming.
+- **Unified mid-stream injection (A1)**: the two ~150-line `get_injection_content` closures collapsed into a single `MidStreamInjectionHookInstaller.build_midstream_injection(..., native=)`; both hook-setup paths delegate, preserving the `update_context → refresh_checklist` side-effect order for both paths. The triplicated background-wait interrupt provider was likewise consolidated into one `_install_wait_interrupt_provider`.
+
+### Fixed
+- **Snapshot read-during-write race (B1 hardening)**: offloading the snapshot copy (above) removed the implicit event-loop serialization that kept a peer's `copytree` from overlapping an owner's in-place `rmtree`+rebuild of the same directory, which could surface `FileNotFoundError` or a torn snapshot. The versioned-snapshot scheme makes the read source immutable for the copy's duration, eliminating the race (including a concurrent-publisher GC edge case).
+- **R1 — lost peer-answer revision**: the mid-stream injection path marked a peer "seen" by re-reading the live revision count after a yielding `await`, dropping a revision appended during the window. Revision counts captured at selection time are now threaded through `mark_seen_answer_revisions` / `register_injected_answer_updates`.
+- **R2/R3 — lost background-subagent result**: a blind `pop(agent_id)` after the injection `await` discarded results appended during the window; consumption now removes only the consumed subagent ids.
+- **R4 — leaked trace tasks on cleanup**: detached background trace-analyzer tasks are now cancelled before the pending-result flush.
+- **R5 — cancel-without-await teardown**: `cancel_all_subagents` now awaits the cancellations so each task runs its `CancelledError` handler against the live registry before it is cleared.
+- **D2 — worktree-isolation degradation never surfaced**: `_record_round_isolation_degraded` called `emit_status(status=…)`, which is not a valid parameter, so the `TypeError` was silently swallowed and the visible signal never fired; it now calls `emit_status(message=…, level="warning", agent_id=…)`.
+- **D3 — changedoc enrichment made non-fatal** so a post-record failure cannot kill a valid-answer agent.
+- **Interrupted-turn save over a published snapshot**: the partial save did `shutil.rmtree` on the (now symlinked) snapshot path, raising and silently dropping the snapshot; it now publishes a new version through the store.
+
+### Tests
+- New race/regression suites: `test_concurrency_race_fixes.py` (R1–R5, D2/D3), `test_snapshot_version_store.py` and `test_snapshot_versioned_save.py` (versioned snapshots incl. concurrent-publish-during-read and concurrent-publisher GC), `test_snapshot_copy_offload.py` (off-loop copy), `test_midstream_injection_unified.py` (cross-path effect-order equality), and `test_wait_interrupt_provider.py` (consolidated interrupt-provider contract).
+
 ## Recent Releases
 
+**v0.1.94 (June 5, 2026)** - Parallelism Hardening (Engineering Health)
+Strengthens the orchestrator's parallel execution: moves the snapshot copy off the event loop so agents keep streaming concurrently — backed by immutable versioned snapshots that keep the off-loop copy safe — and closes latent concurrency races (lost peer-answer revisions, lost background-subagent results, leaked trace tasks, cancel-without-await teardown). Also unifies the mid-stream injection paths and surfaces worktree-isolation degradation. No per-backend functionality changes.
+
 **v0.1.93 (June 3, 2026)** - CLI Package Decomposition & Pydantic Config Migration
 Splits the monolithic `cli.py` into a focused `massgen/cli/` package, migrates the configuration classes to pydantic dataclasses with `Literal`-typed modes validated at construction, removes ~8.7k lines of dead legacy code, and hardens the test-signal and type-checking tooling (coverage gate, no-assert guard, `uv.lock` enforcement, and an incremental mypy ratchet). Internal-quality release with no runtime behavior changes.
 
diff --git a/README.md b/README.md
index b0c8d39b2..e9f4152ad 100644
--- a/README.md
+++ b/README.md
@@ -69,7 +69,7 @@ This project started with the "threads of thought" and "iterative refinement" id
 <details open>
 <summary><h3>🆕 Latest Features</h3></summary>
 
-- [v0.1.93 Features](#-latest-features-v0193)
+- [v0.1.94 Features](#-latest-features-v0194)
 </details>
 
 <details open>
@@ -122,15 +122,15 @@ This project started with the "threads of thought" and "iterative refinement" id
 <details open>
 <summary><h3>🗺️ Roadmap</h3></summary>
 
-- [Recent Achievements (v0.1.93)](#recent-achievements-v0193)
-- [Previous Achievements (v0.0.3 - v0.1.92)](#previous-achievements-v003---v0192)
+- [Recent Achievements (v0.1.94)](#recent-achievements-v0194)
+- [Previous Achievements (v0.0.3 - v0.1.93)](#previous-achievements-v003---v0193)
 - [Key Future Enhancements](#key-future-enhancements)
   - Bug Fixes & Backend Improvements
   - Advanced Agent Collaboration
   - Expanded Model, Tool & Agent Integrations
   - Improved Performance & Scalability
   - Enhanced Developer Experience
-- [v0.1.93 Roadmap](#v0193-roadmap)
+- [v0.1.95 Roadmap](#v0195-roadmap)
 </details>
 
 <details open>
@@ -155,18 +155,18 @@ This project started with the "threads of thought" and "iterative refinement" id
 
 ---
 
-## 🆕 Latest Features (v0.1.93)
+## 🆕 Latest Features (v0.1.94)
 
-**🎉 Released: June 3, 2026**
+**🎉 Released: June 5, 2026**
 
-**What's New in v0.1.93** (internal-quality release — no runtime behavior changes):
-- **🧩 CLI Package Decomposition** - The monolithic 12k-line `cli.py` is split into a focused `massgen/cli/` package while preserving the public import surface.
-- **🛡️ Pydantic Config Migration** - Configuration classes now validate field types on construction, with `Literal`-typed modes as a single source of truth the validator derives from.
-- **🧹 Dead Code Removal & Tooling** - Removed ~8.7k lines of unreferenced legacy code, fixed the coverage gate, and re-enabled type checking via an incremental mypy ratchet.
+**What's New in v0.1.94** (Parallelism Hardening — engineering-health release, no per-backend functionality changes):
+- **⚡ Snapshot Copy Off the Event Loop** - The peer-context snapshot copy now runs its blocking filesystem work on a worker thread, so one agent's copy no longer stalls every other agent's streaming.
+- **🔒 Immutable, Versioned Snapshots** - Snapshots are published as immutable versions with an atomically-repointed symlink; readers pin the current version, eliminating the read-during-write race the off-loop copy would otherwise expose.
+- **🧵 Concurrency Correctness Fixes** - Lost peer-answer revisions, lost background-subagent results, leaked trace tasks, and a cancel-without-await teardown are all fixed; worktree-isolation degradation is now surfaced.
 
-**Install v0.1.93:**
+**Install v0.1.94:**
 ```bash
-pip install massgen==0.1.93
+pip install massgen==0.1.94
 ```
 
 → [See full release history and examples](massgen/configs/README.md#release-history--examples)
@@ -1241,19 +1241,21 @@ MassGen is currently in its foundational stage, with a focus on parallel, asynch
 
 ⚠️ **Early Stage Notice:** As MassGen is in active development, please expect upcoming breaking architecture changes as we continue to refine and improve the system.
 
-### Recent Achievements (v0.1.93)
+### Recent Achievements (v0.1.94)
 
-**🎉 Released: June 3, 2026**
+**🎉 Released: June 5, 2026**
 
-#### CLI Package Decomposition & Pydantic Config Migration
-- **CLI Package Decomposition**: The monolithic `cli.py` (12,206 lines) was split into an 18-module `massgen/cli/` package with a facade that preserves the public import surface; the ~886-line Textual per-turn handler was extracted into a dependency-injected function
-- **Pydantic Config Migration**: Config classes migrated to `pydantic.dataclasses` (type validation on construction) with `Literal`-typed modes in `massgen/config_modes.py` that the validator derives from — closing a real validator-drift bug
-- **Single-Source Exclusion Lists**: The two hand-duplicated "excluded params" lists now derive from one frozenset, locked by a regression test
-- **Dead Code Removal**: Deleted ~8,700 lines of unreferenced legacy `v1`/`prototype` code that was shipping in the wheel
-- **Tooling**: Fixed the broken coverage gate, enabled a no-assert test guard, enforced `uv.lock` in CI, and re-enabled type checking via an incremental mypy ratchet
-- **Fixes**: Concurrent-run log isolation (MAS-274), a config default regression, and logged (not silent) backend tool-arg parsing
+#### Parallelism Hardening (Engineering Health)
+- **Snapshot Copy Off the Event Loop**: `FilesystemManager.copy_snapshots_to_temp_workspace` now offloads its blocking `rmtree`/`copytree`/scrub to a worker thread via `asyncio.to_thread`, so one agent's snapshot copy no longer serializes every other agent's streaming
+- **Immutable, Versioned Snapshots**: snapshots publish to `<base>/.versions/<id>/v<N>` with an atomically-repointed symlink; readers `acquire`/refcount the current version for the duration of their copy (new `SnapshotVersionStore`), eliminating the read-during-write race the off-loop copy would otherwise expose
+- **Concurrency Correctness**: fixed lost peer-answer revisions (R1), lost background-subagent results (R2/R3), leaked trace tasks on cleanup (R4), and a cancel-without-await teardown (R5)
+- **Worktree-Isolation Degradation Surfaced (D2)**: a swallowed `TypeError` (invalid `emit_status(status=…)` kwarg) had silenced the signal entirely
+- **Unified Mid-Stream Injection (A1)**: the two ~150-line per-backend injection closures collapsed into one shared implementation; the background-wait interrupt provider was deduplicated, removing backend-parity drift
+- **No per-backend functionality changes** — all landed under TDD with cost-free simulation
 
-### Previous Achievements (v0.0.3 - v0.1.92)
+### Previous Achievements (v0.0.3 - v0.1.93)
+
+✅ **CLI Package Decomposition & Pydantic Config Migration (v0.1.93)**: Split the monolithic 12k-line `cli.py` into an 18-module `massgen/cli/` package, migrated config classes to `pydantic.dataclasses` with `Literal`-typed modes the validator derives from, consolidated provider-exclusion lists, removed ~8.7k lines of dead legacy code from the wheel, and hardened the test-signal/type-checking tooling. Internal-quality release, no runtime behavior changes.
 
 ✅ **Orchestrator Collaborator Refactor & Parallel Search MCP (v0.1.92)**: Reduced `orchestrator.py` from 21,599 to 8,574 lines by extracting 49 lazy collaborators, split Textual display helpers into sibling modules, added characterization coverage, and introduced a Parallel Web Search MCP registry entry plus example config.
 
@@ -1584,9 +1586,9 @@ MassGen is currently in its foundational stage, with a focus on parallel, asynch
 
 We welcome community contributions to achieve these goals.
 
-### v0.1.94 Roadmap
+### v0.1.95 Roadmap
 
-Version 0.1.94 picks up the image/video edit work deferred from v0.1.86-v0.1.93 and continues multimodal provider-parity work:
+Version 0.1.95 picks up the image/video edit work deferred from v0.1.86-v0.1.94 and continues multimodal provider-parity work:
 
 #### Planned Features
 - **Image/Video Edit Capabilities** ([#959](https://github.com/massgen/MassGen/issues/959)): Image and video editing across providers with multi-turn editing workflows via continuation IDs
diff --git a/README_PYPI.md b/README_PYPI.md
index c82fc6570..2c31ddbc3 100644
--- a/README_PYPI.md
+++ b/README_PYPI.md
@@ -68,7 +68,7 @@ This project started with the "threads of thought" and "iterative refinement" id
 <details open>
 <summary><h3>🆕 Latest Features</h3></summary>
 
-- [v0.1.93 Features](#-latest-features-v0193)
+- [v0.1.94 Features](#-latest-features-v0194)
 </details>
 
 <details open>
@@ -121,15 +121,15 @@ This project started with the "threads of thought" and "iterative refinement" id
 <details open>
 <summary><h3>🗺️ Roadmap</h3></summary>
 
-- [Recent Achievements (v0.1.93)](#recent-achievements-v0193)
-- [Previous Achievements (v0.0.3 - v0.1.92)](#previous-achievements-v003---v0192)
+- [Recent Achievements (v0.1.94)](#recent-achievements-v0194)
+- [Previous Achievements (v0.0.3 - v0.1.93)](#previous-achievements-v003---v0193)
 - [Key Future Enhancements](#key-future-enhancements)
   - Bug Fixes & Backend Improvements
   - Advanced Agent Collaboration
   - Expanded Model, Tool & Agent Integrations
   - Improved Performance & Scalability
   - Enhanced Developer Experience
-- [v0.1.93 Roadmap](#v0193-roadmap)
+- [v0.1.95 Roadmap](#v0195-roadmap)
 </details>
 
 <details open>
@@ -154,18 +154,18 @@ This project started with the "threads of thought" and "iterative refinement" id
 
 ---
 
-## 🆕 Latest Features (v0.1.93)
+## 🆕 Latest Features (v0.1.94)
 
-**🎉 Released: June 3, 2026**
+**🎉 Released: June 5, 2026**
 
-**What's New in v0.1.93** (internal-quality release — no runtime behavior changes):
-- **🧩 CLI Package Decomposition** - The monolithic 12k-line `cli.py` is split into a focused `massgen/cli/` package while preserving the public import surface.
-- **🛡️ Pydantic Config Migration** - Configuration classes now validate field types on construction, with `Literal`-typed modes as a single source of truth the validator derives from.
-- **🧹 Dead Code Removal & Tooling** - Removed ~8.7k lines of unreferenced legacy code, fixed the coverage gate, and re-enabled type checking via an incremental mypy ratchet.
+**What's New in v0.1.94** (Parallelism Hardening — engineering-health release, no per-backend functionality changes):
+- **⚡ Snapshot Copy Off the Event Loop** - The peer-context snapshot copy now runs its blocking filesystem work on a worker thread, so one agent's copy no longer stalls every other agent's streaming.
+- **🔒 Immutable, Versioned Snapshots** - Snapshots are published as immutable versions with an atomically-repointed symlink; readers pin the current version, eliminating the read-during-write race the off-loop copy would otherwise expose.
+- **🧵 Concurrency Correctness Fixes** - Lost peer-answer revisions, lost background-subagent results, leaked trace tasks, and a cancel-without-await teardown are all fixed; worktree-isolation degradation is now surfaced.
 
-**Install v0.1.93:**
+**Install v0.1.94:**
 ```bash
-pip install massgen==0.1.93
+pip install massgen==0.1.94
 ```
 
 → [See full release history and examples](massgen/configs/README.md#release-history--examples)
@@ -1240,19 +1240,21 @@ MassGen is currently in its foundational stage, with a focus on parallel, asynch
 
 ⚠️ **Early Stage Notice:** As MassGen is in active development, please expect upcoming breaking architecture changes as we continue to refine and improve the system.
 
-### Recent Achievements (v0.1.93)
+### Recent Achievements (v0.1.94)
 
-**🎉 Released: June 3, 2026**
+**🎉 Released: June 5, 2026**
 
-#### CLI Package Decomposition & Pydantic Config Migration
-- **CLI Package Decomposition**: The monolithic `cli.py` (12,206 lines) was split into an 18-module `massgen/cli/` package with a facade that preserves the public import surface; the ~886-line Textual per-turn handler was extracted into a dependency-injected function
-- **Pydantic Config Migration**: Config classes migrated to `pydantic.dataclasses` (type validation on construction) with `Literal`-typed modes in `massgen/config_modes.py` that the validator derives from — closing a real validator-drift bug
-- **Single-Source Exclusion Lists**: The two hand-duplicated "excluded params" lists now derive from one frozenset, locked by a regression test
-- **Dead Code Removal**: Deleted ~8,700 lines of unreferenced legacy `v1`/`prototype` code that was shipping in the wheel
-- **Tooling**: Fixed the broken coverage gate, enabled a no-assert test guard, enforced `uv.lock` in CI, and re-enabled type checking via an incremental mypy ratchet
-- **Fixes**: Concurrent-run log isolation (MAS-274), a config default regression, and logged (not silent) backend tool-arg parsing
+#### Parallelism Hardening (Engineering Health)
+- **Snapshot Copy Off the Event Loop**: `FilesystemManager.copy_snapshots_to_temp_workspace` now offloads its blocking `rmtree`/`copytree`/scrub to a worker thread via `asyncio.to_thread`, so one agent's snapshot copy no longer serializes every other agent's streaming
+- **Immutable, Versioned Snapshots**: snapshots publish to `<base>/.versions/<id>/v<N>` with an atomically-repointed symlink; readers `acquire`/refcount the current version for the duration of their copy (new `SnapshotVersionStore`), eliminating the read-during-write race the off-loop copy would otherwise expose
+- **Concurrency Correctness**: fixed lost peer-answer revisions (R1), lost background-subagent results (R2/R3), leaked trace tasks on cleanup (R4), and a cancel-without-await teardown (R5)
+- **Worktree-Isolation Degradation Surfaced (D2)**: a swallowed `TypeError` (invalid `emit_status(status=…)` kwarg) had silenced the signal entirely
+- **Unified Mid-Stream Injection (A1)**: the two ~150-line per-backend injection closures collapsed into one shared implementation; the background-wait interrupt provider was deduplicated, removing backend-parity drift
+- **No per-backend functionality changes** — all landed under TDD with cost-free simulation
 
-### Previous Achievements (v0.0.3 - v0.1.92)
+### Previous Achievements (v0.0.3 - v0.1.93)
+
+✅ **CLI Package Decomposition & Pydantic Config Migration (v0.1.93)**: Split the monolithic 12k-line `cli.py` into an 18-module `massgen/cli/` package, migrated config classes to `pydantic.dataclasses` with `Literal`-typed modes the validator derives from, consolidated provider-exclusion lists, removed ~8.7k lines of dead legacy code from the wheel, and hardened the test-signal/type-checking tooling. Internal-quality release, no runtime behavior changes.
 
 ✅ **Orchestrator Collaborator Refactor & Parallel Search MCP (v0.1.92)**: Reduced `orchestrator.py` from 21,599 to 8,574 lines by extracting 49 lazy collaborators, split Textual display helpers into sibling modules, added characterization coverage, and introduced a Parallel Web Search MCP registry entry plus example config.
 
@@ -1583,9 +1585,9 @@ MassGen is currently in its foundational stage, with a focus on parallel, asynch
 
 We welcome community contributions to achieve these goals.
 
-### v0.1.94 Roadmap
+### v0.1.95 Roadmap
 
-Version 0.1.94 picks up the image/video edit work deferred from v0.1.86-v0.1.93 and continues multimodal provider-parity work:
+Version 0.1.95 picks up the image/video edit work deferred from v0.1.86-v0.1.94 and continues multimodal provider-parity work:
 
 #### Planned Features
 - **Image/Video Edit Capabilities** ([#959](https://github.com/massgen/MassGen/issues/959)): Image and video editing across providers with multi-turn editing workflows via continuation IDs
diff --git a/ROADMAP.md b/ROADMAP.md
index 682f7f67f..806900fe9 100644
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -1,10 +1,10 @@
 # MassGen Roadmap
 
-**Current Version:** v0.1.93
+**Current Version:** v0.1.94
 
 **Release Schedule:** Mondays, Wednesdays, Fridays @ 9am PT
 
-**Last Updated:** June 3, 2026
+**Last Updated:** June 5, 2026
 
 This roadmap outlines MassGen's development priorities for upcoming releases. Each release focuses on specific capabilities with real-world use cases.
 
@@ -42,12 +42,29 @@ Want to contribute or collaborate on a specific track? Reach out to the track ow
 
 | Release | Target | Feature | Owner | Use Case |
 |---------|--------|---------|-------|----------|
-| **v0.1.94** | TBD | Image/Video Edit Capabilities | @ncrispino | Check and support img/video editing capabilities — deferred from v0.1.86-v0.1.93 ([#959](https://github.com/massgen/MassGen/issues/959)) |
+| **v0.1.95** | TBD | Image/Video Edit Capabilities | @ncrispino | Check and support img/video editing capabilities — deferred from v0.1.86-v0.1.94 ([#959](https://github.com/massgen/MassGen/issues/959)) |
 
 *All releases ship on MWF @ 9am PT when ready*
 
 ---
 
+## ✅ v0.1.94 - Parallelism Hardening (Engineering Health) (Completed)
+
+**Released:** June 5, 2026
+
+### Features
+- **Snapshot Copy Off the Event Loop**: `FilesystemManager.copy_snapshots_to_temp_workspace` offloads its blocking `rmtree`/`copytree`/scrub to a worker thread via `asyncio.to_thread`, so one agent's snapshot copy no longer serializes every other agent's streaming
+- **Immutable, Versioned Snapshots**: snapshots publish to `<base>/.versions/<id>/v<N>` with an atomically-repointed symlink; readers `acquire`/refcount the current version for the duration of their copy (new `SnapshotVersionStore`), eliminating the read-during-write race the off-loop copy would otherwise expose
+- **Concurrency Correctness**: fixed lost peer-answer revisions (R1), lost background-subagent results (R2/R3), leaked background trace tasks on cleanup (R4), and a cancel-without-await teardown (R5)
+- **Worktree-Isolation Degradation Surfaced (D2)**: an invalid `emit_status(status=…)` kwarg had its `TypeError` swallowed, silencing the signal entirely
+- **Unified Mid-Stream Injection (A1)**: the two ~150-line per-backend `get_injection_content` closures collapsed into one `build_midstream_injection(..., native=)`; the background-wait interrupt provider was deduplicated, removing backend-parity drift
+
+### Notes
+- Engineering-health release: no per-backend functionality changes (parity principle); all items landed under TDD with cost-free simulation.
+- Image/Video Edit Capabilities ([#959](https://github.com/massgen/MassGen/issues/959)) remain deferred to v0.1.95.
+
+---
+
 ## ✅ v0.1.93 - CLI Package Decomposition & Pydantic Config Migration (Completed)
 
 **Released:** June 3, 2026
diff --git a/docs/announcements/archive/v0.1.93.md b/docs/announcements/archive/v0.1.93.md
new file mode 100644
index 000000000..55dd35f20
--- /dev/null
+++ b/docs/announcements/archive/v0.1.93.md
@@ -0,0 +1,74 @@
+# MassGen v0.1.93 Release Announcement
+
+<!--
+This is the current release announcement. Copy this + feature-highlights.md to LinkedIn/X.
+After posting, update the social links below.
+-->
+
+## Release Summary
+
+We're excited to release MassGen v0.1.93 — CLI Package Decomposition & Pydantic Config Migration! 🚀 This internal-quality release keeps runtime behavior stable while tightening the layers developers touch most: the 12k-line CLI is now a focused `massgen/cli/` package, config dataclasses validate at construction time, provider-exclusion lists share one source of truth, dead legacy code is gone from the wheel, and CI/type-checking catches issues earlier.
+
+## Install
+
+```bash
+pip install massgen==0.1.93
+```
+
+## Links
+
+- **Release notes:** https://github.com/massgen/MassGen/releases/tag/v0.1.93
+- **X post:** [TO BE ADDED AFTER POSTING]
+- **LinkedIn post:** [TO BE ADDED AFTER POSTING]
+
+## Posting Notes
+
+- **Suggested image:** Use a screenshot of the v0.1.93 release notes.
+
+---
+
+## Full Announcement (for LinkedIn)
+
+Copy everything below this line, then append content from `feature-highlights.md`:
+
+---
+
+We're excited to release MassGen v0.1.93 — CLI Package Decomposition & Pydantic Config Migration! 🚀 This internal-quality release keeps runtime behavior stable while tightening the layers developers touch most: the 12k-line CLI is now a focused `massgen/cli/` package, config dataclasses validate at construction time, provider-exclusion lists share one source of truth, dead legacy code is gone from the wheel, and CI/type-checking catches issues earlier.
+
+**Key Improvements:**
+
+🧩 **CLI Package Decomposition**:
+- `massgen/cli.py` was split into an 18-module `massgen/cli/` package
+- The facade keeps `from massgen.cli import ...` and `massgen.cli...` imports working
+- The Textual per-turn handler was extracted into a dependency-injected function
+
+🛡️ **Pydantic Config Validation**:
+- Core config classes now validate field types on construction
+- Mode fields use `Literal` types in `massgen/config_modes.py`
+- `config_validator` derives valid mode sets from those typed definitions instead of maintaining drift-prone duplicates
+
+🔧 **Correctness Fixes**:
+- Concurrent in-process Textual runs keep their own logging/snapshot session
+- `CoordinationConfig.from_dict()` now drops absent `None` values so field defaults apply
+- Response backend tool-argument parsing now logs malformed payloads instead of silently converting them to `{}`
+
+🧪 **Test Signal & Typing**:
+- Coverage config now points at the real package
+- No-assert pytest returns are treated as errors
+- CI enforces `uv.lock` with `uv sync --frozen`
+- An incremental mypy island runs as a blocking pre-commit/CI gate
+
+🧹 **Dead Code Removal**:
+- Removed unreferenced legacy `massgen/v1` and `massgen/prototype` code from the shipped wheel
+
+**Install:**
+
+```bash
+pip install massgen==0.1.93
+```
+
+Release notes: https://github.com/massgen/MassGen/releases/tag/v0.1.93
+
+Feature highlights:
+
+<!-- Paste feature-highlights.md content here -->
diff --git a/docs/announcements/current-release.md b/docs/announcements/current-release.md
index 55dd35f20..3ad62b66e 100644
--- a/docs/announcements/current-release.md
+++ b/docs/announcements/current-release.md
@@ -1,4 +1,4 @@
-# MassGen v0.1.93 Release Announcement
+# MassGen v0.1.94 Release Announcement (Parallelism Hardening)
 
 <!--
 This is the current release announcement. Copy this + feature-highlights.md to LinkedIn/X.
@@ -7,23 +7,23 @@ After posting, update the social links below.
 
 ## Release Summary
 
-We're excited to release MassGen v0.1.93 — CLI Package Decomposition & Pydantic Config Migration! 🚀 This internal-quality release keeps runtime behavior stable while tightening the layers developers touch most: the 12k-line CLI is now a focused `massgen/cli/` package, config dataclasses validate at construction time, provider-exclusion lists share one source of truth, dead legacy code is gone from the wheel, and CI/type-checking catches issues earlier.
+We're excited to release MassGen v0.1.94 — Parallelism Hardening (Engineering Health)! 🚀 This release strengthens the orchestrator's parallel execution: we moved the per-round snapshot copy off the event loop so agents keep streaming concurrently, backed it with an immutable, versioned snapshot store that keeps the off-loop copy safe, and closed several latent concurrency races. No per-backend functionality changes — pure parallelism correctness and reliability.
 
 ## Install
 
 ```bash
-pip install massgen==0.1.93
+pip install massgen==0.1.94
 ```
 
 ## Links
 
-- **Release notes:** https://github.com/massgen/MassGen/releases/tag/v0.1.93
+- **Release notes:** https://github.com/massgen/MassGen/releases/tag/v0.1.94
 - **X post:** [TO BE ADDED AFTER POSTING]
 - **LinkedIn post:** [TO BE ADDED AFTER POSTING]
 
 ## Posting Notes
 
-- **Suggested image:** Use a screenshot of the v0.1.93 release notes.
+- **Suggested image:** Use a screenshot of the v0.1.94 release notes.
 
 ---
 
@@ -33,41 +33,35 @@ Copy everything below this line, then append content from `feature-highlights.md
 
 ---
 
-We're excited to release MassGen v0.1.93 — CLI Package Decomposition & Pydantic Config Migration! 🚀 This internal-quality release keeps runtime behavior stable while tightening the layers developers touch most: the 12k-line CLI is now a focused `massgen/cli/` package, config dataclasses validate at construction time, provider-exclusion lists share one source of truth, dead legacy code is gone from the wheel, and CI/type-checking catches issues earlier.
+We're excited to release MassGen v0.1.94 — Parallelism Hardening (Engineering Health)! 🚀 This release strengthens the orchestrator's parallel execution: it moves blocking snapshot work off the event loop so agents keep streaming concurrently, backs it with immutable versioned snapshots that keep the off-loop copy safe, and closes latent concurrency races. No per-backend functionality changes.
 
 **Key Improvements:**
 
-🧩 **CLI Package Decomposition**:
-- `massgen/cli.py` was split into an 18-module `massgen/cli/` package
-- The facade keeps `from massgen.cli import ...` and `massgen.cli...` imports working
-- The Textual per-turn handler was extracted into a dependency-injected function
+⚡ **Snapshot copy off the event loop**:
+- The peer-context snapshot copy now runs its blocking `rmtree`/`copytree`/scrub on a worker thread via `asyncio.to_thread`
+- One agent's snapshot copy no longer stalls every other agent's streaming
 
-🛡️ **Pydantic Config Validation**:
-- Core config classes now validate field types on construction
-- Mode fields use `Literal` types in `massgen/config_modes.py`
-- `config_validator` derives valid mode sets from those typed definitions instead of maintaining drift-prone duplicates
+🔒 **Immutable, versioned snapshots**:
+- Each agent's snapshot path is now a symlink to an immutable `.versions/<id>/v<N>` directory
+- Writers publish a new version and atomically repoint the symlink; readers pin (refcount) the current version for the duration of their copy
+- Eliminates the read-during-write race the off-loop copy would otherwise expose — no `FileNotFoundError`, no torn snapshots
 
-🔧 **Correctness Fixes**:
-- Concurrent in-process Textual runs keep their own logging/snapshot session
-- `CoordinationConfig.from_dict()` now drops absent `None` values so field defaults apply
-- Response backend tool-argument parsing now logs malformed payloads instead of silently converting them to `{}`
+🧵 **Concurrency correctness fixes**:
+- Lost peer-answer revisions across the injection `await` window — fixed (revision counts captured at selection time)
+- Lost background-subagent results from a blind queue `pop` — fixed (consume only the consumed ids)
+- Leaked background trace tasks on cleanup, and a cancel-without-await teardown — fixed
+- Worktree-isolation degradation is now surfaced (a swallowed `TypeError` previously hid it)
 
-🧪 **Test Signal & Typing**:
-- Coverage config now points at the real package
-- No-assert pytest returns are treated as errors
-- CI enforces `uv.lock` with `uv sync --frozen`
-- An incremental mypy island runs as a blocking pre-commit/CI gate
-
-🧹 **Dead Code Removal**:
-- Removed unreferenced legacy `massgen/v1` and `massgen/prototype` code from the shipped wheel
+🧩 **Unified mid-stream injection**:
+- The two large per-backend injection closures collapsed into one shared implementation; the background-wait interrupt provider was likewise deduplicated, removing backend-parity drift
 
 **Install:**
 
 ```bash
-pip install massgen==0.1.93
+pip install massgen==0.1.94
 ```
 
-Release notes: https://github.com/massgen/MassGen/releases/tag/v0.1.93
+Release notes: https://github.com/massgen/MassGen/releases/tag/v0.1.94
 
 Feature highlights:
 
diff --git a/docs/announcements/github-release-v0.1.93.md b/docs/announcements/github-release-v0.1.93.md
deleted file mode 100644
index 3cb06bad3..000000000
--- a/docs/announcements/github-release-v0.1.93.md
+++ /dev/null
@@ -1,38 +0,0 @@
-# 🚀 Release Highlights — v0.1.93 (2026-06-03)
-
-v0.1.93 is an internal-quality release: no intended runtime behavior changes, but a smaller CLI surface, typed config validation, cleaner package contents, and stronger test/type-checking gates.
-
-### 🧩 CLI Package Decomposition
-- `massgen/cli.py` was split into an 18-module `massgen/cli/` package
-- The public import surface is preserved through the package facade
-- The Textual per-turn handler was extracted into a dependency-injected function
-- CLI helper and run-loop characterization tests cover the new seams
-
-### 🛡️ Pydantic Config Migration
-- `AgentConfig`, `CoordinationConfig`, `TimeoutConfig`, `StepModeConfig`, and related nested config classes now validate field types on construction
-- Mode fields use `Literal` definitions in `massgen/config_modes.py`
-- `config_validator` derives valid mode sets from the typed definitions, reducing validator drift
-- `pydantic>=2.0` is now a declared dependency
-
-### 🔧 Correctness Fixes
-- Textual concurrent-run logging now preserves per-run session context
-- `CoordinationConfig.from_dict()` no longer lets absent YAML keys override defaults with `None`
-- Response backend tool-argument parsing now logs malformed payloads instead of silently dropping them
-- Backend/API parameter exclusion lists now derive from a single source
-
-### 🧪 Test Signal & Typing
-- Coverage configuration now points at the real package
-- `pytest.PytestReturnNotNoneWarning` is treated as an error
-- CI enforces `uv.lock` via `uv sync --frozen`
-- `scripts/mypy_island.sh` adds a blocking incremental mypy gate
-- All bundled configs were validated after the migration
-
-### 🧹 Dead Code Removal
-- Removed unreferenced legacy `massgen/v1` and `massgen/prototype` code from the wheel
-
----
-
-### 📖 Install
-```bash
-pip install massgen==0.1.93
-```
diff --git a/docs/announcements/github-release-v0.1.94.md b/docs/announcements/github-release-v0.1.94.md
new file mode 100644
index 000000000..9d392da27
--- /dev/null
+++ b/docs/announcements/github-release-v0.1.94.md
@@ -0,0 +1,35 @@
+# 🚀 Release Highlights — v0.1.94 (2026-06-05)
+
+v0.1.94 — Parallelism Hardening (Engineering Health) — strengthens the orchestrator's parallel execution. It moves the snapshot copy off the event loop so agents keep streaming concurrently, backs it with immutable versioned snapshots that keep the off-loop copy safe, and closes latent concurrency races. No per-backend functionality changes (parity principle).
+
+### ⚡ Snapshot Copy Off the Event Loop
+- `FilesystemManager.copy_snapshots_to_temp_workspace` now runs its blocking `rmtree`/`copytree`/scrub on a worker thread via `asyncio.to_thread`
+- One agent's snapshot copy no longer stalls every other agent's streaming
+
+### 🔒 Immutable, Versioned Snapshots
+- Each agent's snapshot path `<base>/<agent_id>` is now a symlink to an immutable `<base>/.versions/<agent_id>/v<N>` directory
+- `save_snapshot` (and the interrupted-turn partial save) publish a fresh version and atomically repoint the symlink instead of rewriting in place
+- The peer-context copy `acquire`s (refcounts) the current version for the duration of its copy; GC never deletes a pinned or in-flight version
+- Eliminates the read-during-write race the off-loop copy would otherwise expose — coordinated by the new `SnapshotVersionStore`
+
+### 🧵 Concurrency Correctness Fixes
+- **R1** — lost peer-answer revision across the injection `await` window (counts now captured at selection time)
+- **R2/R3** — lost background-subagent result from a blind queue `pop` (consume only the consumed ids)
+- **R4** — leaked background trace-analyzer tasks on cleanup (cancelled before the flush)
+- **R5** — cancel-without-await teardown (`cancel_all_subagents` now awaits cancellations against the live registry)
+- **D2** — worktree-isolation degradation never surfaced because `emit_status` was called with an invalid `status=` kwarg whose `TypeError` was swallowed
+- **D3** — changedoc enrichment made non-fatal
+
+### 🧩 Unified Mid-Stream Injection
+- The two ~150-line per-backend `get_injection_content` closures collapsed into one `build_midstream_injection(..., native=)`, preserving the `update_context → refresh_checklist` side-effect order on both paths
+- The triplicated background-wait interrupt provider consolidated into one helper
+
+### 🧪 Tests
+- New race/regression suites driven under TDD with cost-free simulation: `test_concurrency_race_fixes.py`, `test_snapshot_version_store.py`, `test_snapshot_versioned_save.py`, `test_snapshot_copy_offload.py`, `test_midstream_injection_unified.py`, `test_wait_interrupt_provider.py`
+
+---
+
+### 📖 Install
+```bash
+pip install massgen==0.1.94
+```
diff --git a/docs/dev_notes/next_version_eng_health_plan.md b/docs/dev_notes/next_version_eng_health_plan.md
index 8ef1609d8..bb299ba50 100644
--- a/docs/dev_notes/next_version_eng_health_plan.md
+++ b/docs/dev_notes/next_version_eng_health_plan.md
@@ -1,6 +1,6 @@
 # Next Version — Engineering Health Plan
 
-**Theme: "Real Parallelism"** — make the orchestrator's N-agents-in-parallel promise actually hold, fix the correctness races that the current (accidental) serialization masks, and finish the one remaining refactor blocker. No per-backend functionality changes (parity principle).
+**Theme: Parallelism Hardening** — strengthen the orchestrator's parallel execution: move blocking work off the event loop so agents keep streaming concurrently, close the latent concurrency races that the currently-serialized copy keeps hidden, and finish the one remaining refactor blocker. No per-backend functionality changes (parity principle).
 
 **Date:** 2026-06-04
 
@@ -18,9 +18,10 @@ stashed.
 | R2/R3 lost subagent result (blind pop) | ✅ done | same (R2 ×5) |
 | R4 leaked trace tasks on cleanup | ✅ done | same (R4 ×2) |
 | R5 cancel-without-await teardown | ✅ done | same (R5 ×1) |
-| D2 surface worktree-isolation degradation | ✅ done | same (D2 ×3) |
+| D2 surface worktree-isolation degradation | ✅ done + **emit bug fixed** | same (D2 ×3) + `test_d2_emits_status_with_valid_signature` |
 | D3 changedoc enrichment non-fatal | ✅ done (scoped to changedoc) | same (D3 ×2) |
-| B1 snapshot copy off the event loop | ✅ done | `test_snapshot_copy_offload.py` (×5) |
+| B1 snapshot copy off the event loop | ✅ done + **race hardened** | `test_snapshot_copy_offload.py` (×5) |
+| **B1-race shared-source fix** | ✅ **done** — a post-merge review caught that B1's `to_thread` offload removed the implicit event-loop serialization between the peer-context `copytree` (now on a worker thread) and `save_snapshot`'s in-place `rmtree`+rebuild of the **same** `<base>/<agent_id>` dir (the offload's "disjoint paths" safety claim only covered copy *destinations*, not the shared *source*). Fix: snapshots are now **immutable, versioned** — `save_snapshot` publishes `<base>/.versions/<agent_id>/v<N>` and atomically repoints the `<base>/<agent_id>` symlink; the peer-context reader `acquire`s (refcounts) the current version so a concurrent republish can't delete/mutate it mid-copy. See `SnapshotVersionStore`. | `test_snapshot_version_store.py` (×9, incl. concurrent-republish-during-read), `test_snapshot_versioned_save.py` (×3) |
 | C2 eager debug f-strings | ✅ done (loguru brace-style; NOT `isEnabledFor` — logger is loguru) | `test_answer_normalizer_debug_guard.py` (×3) |
 | E1 assertion-free context test | ✅ done (assertions verified vs real output) | `test_message_context_building.py` (×4) |
 | E2 assertion-free grok test | ✅ done (offline unit + live_api skips) | `test_grok_backend.py` |
diff --git a/docs/modules/architecture.md b/docs/modules/architecture.md
index 35f0a57c2..03533f315 100644
--- a/docs/modules/architecture.md
+++ b/docs/modules/architecture.md
@@ -105,6 +105,10 @@ Two simultaneous `uv run massgen` processes are isolated by:
 1. **Session registry locking**: `~/.massgen/sessions.json` uses `fcntl.flock` (POSIX) for atomic read-modify-write under exclusive lock (`massgen/session/_registry.py`).
 2. **Snapshot path scoping**: `snapshot_storage` from YAML is automatically scoped by the log session root name (microsecond timestamp) via `_scope_snapshot_storage()` in `cli.py`, producing paths like `.massgen/snapshots/log_20260301_XXX/agent_a/`.
 
+**Immutable versioned snapshots.** Each agent's public snapshot path `<base>/<agent_id>` is a **symlink** to an immutable version directory under `<base>/.versions/<agent_id>/v<N>`. `save_snapshot` publishes a fresh `v<N>` and atomically repoints the symlink; it never rewrites a published version in place. The peer-context copy (`SnapshotManager.copy_all_snapshots_to_temp_workspace`) `acquire`s (refcounts) the current version before its offloaded `copytree`, so a concurrent `save_snapshot` republish on another agent cannot delete or mutate the source mid-copy. This is the fix for the B1 read-during-write race introduced when the copy was offloaded to `asyncio.to_thread`. Coordinated by `SnapshotVersionStore` (`massgen/filesystem_manager/_snapshot_version_store.py`); the symlink is transparent to all other readers, so the on-disk layout consumers see is unchanged. On platforms without symlink support the store falls back to a direct copy (no race protection). The interrupted-turn partial save also publishes through the store (it must not `rmtree` the symlink).
+
+**Caveat — late writers.** `save_snapshot` and the interrupted-turn save are the only full publishers. A few *additive* writers still write **through** the symlink into the current version after it's published: execution-trace save (`execution_trace.md`) and changedoc applied-context sync. These run in low-concurrency post-publish / final-presentation phases, so they don't meaningfully race the offloaded peer copy, but they do mutate a "current" version in place — if either is ever moved into a hot, concurrent path it should publish a new version instead. The immutability guarantee that the race fix relies on is precise: *a version is never deleted or rebuilt while a reader holds it*; it does not currently forbid these additive in-place appends to the newest version.
+
 ### Public Accessors (replace private globals)
 
 | Old | New |
diff --git a/docs/source/index.rst b/docs/source/index.rst
index 5413b7dc8..a08cdc6a5 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -209,6 +209,10 @@ Key Features
 Recent Releases
 ---------------
 
+**v0.1.94 (June 5, 2026)** - Parallelism Hardening (Engineering Health)
+
+Strengthens the orchestrator's parallel execution: moves the snapshot copy off the event loop so agents keep streaming concurrently — backed by immutable versioned snapshots that keep the off-loop copy safe — and closes latent concurrency races (lost peer-answer revisions, lost background-subagent results, leaked trace tasks, cancel-without-await teardown). Also unifies the mid-stream injection paths and surfaces worktree-isolation degradation. No per-backend functionality changes.
+
 **v0.1.93 (June 3, 2026)** - CLI Package Decomposition & Pydantic Config Migration
 
 Splits the monolithic ``cli.py`` into a focused ``massgen/cli/`` package, migrates the configuration classes to pydantic dataclasses with ``Literal``-typed modes validated at construction, removes ~8.7k lines of dead legacy code, and hardens the test-signal and type-checking tooling (coverage gate, no-assert guard, ``uv.lock`` enforcement, and an incremental mypy ratchet). Internal-quality release with no runtime behavior changes.
diff --git a/massgen/__init__.py b/massgen/__init__.py
index 9ada731c5..d9289bf7c 100644
--- a/massgen/__init__.py
+++ b/massgen/__init__.py
@@ -86,7 +86,7 @@
 from .message_templates import MessageTemplates, get_templates
 from .orchestrator import Orchestrator, create_orchestrator
 
-__version__ = "0.1.93"
+__version__ = "0.1.94"
 __author__ = "MassGen Contributors"
 
 
diff --git a/massgen/filesystem_manager/_filesystem_manager.py b/massgen/filesystem_manager/_filesystem_manager.py
index c0637286a..485ed93fe 100644
--- a/massgen/filesystem_manager/_filesystem_manager.py
+++ b/massgen/filesystem_manager/_filesystem_manager.py
@@ -31,6 +31,7 @@
 from ._base import Permission
 from ._constants import FRAMEWORK_MCPS
 from ._path_permission_manager import PathPermissionManager
+from ._snapshot_version_store import SnapshotVersionStore
 
 
 def _remove_readonly(func, path, _exc_unused):
@@ -1958,30 +1959,39 @@ async def save_snapshot(self, timestamp: str | None = None, is_final: bool = Fal
                 elif not workspace_has_content and snapshot_storage_has_content:
                     logger.info(f"[FilesystemManager] Skipping snapshot_storage update - workspace is empty but snapshot_storage has content ({self.snapshot_storage})")
                 else:
-                    # Normal case: overwrite with current workspace
-                    if self.snapshot_storage.exists():
-                        _safe_rmtree(self.snapshot_storage)
-                    self.snapshot_storage.mkdir(parents=True, exist_ok=True)
-
-                    items_copied = 0
-                    for item in source_path.iterdir():
-                        if item.is_symlink():
-                            logger.debug(f"[FilesystemManager.save_snapshot] Skipping symlink: {item}")
-                            continue
-                        if item.is_file():
-                            shutil.copy2(item, self.snapshot_storage / item.name)
-                        elif item.is_dir():
-                            # Use symlinks=True to copy symlinks as symlinks, not follow them
-                            # Use ignore_dangling_symlinks=True to handle broken symlinks in subdirectories (e.g., from subagent workspaces)
-                            shutil.copytree(
-                                item,
-                                self.snapshot_storage / item.name,
-                                symlinks=True,
-                                ignore_dangling_symlinks=True,
-                            )
-                        items_copied += 1
-
-                    logger.info(f"[FilesystemManager] Saved snapshot with {items_copied} items to {self.snapshot_storage}")
+                    # Normal case: publish the current workspace as a new
+                    # IMMUTABLE snapshot version and atomically repoint
+                    # <base>/<agent_id> at it. A peer copying the previous version
+                    # keeps a consistent, never-deleted-underneath view -- this is
+                    # the B1 race fix (see SnapshotVersionStore). The old in-place
+                    # rmtree+rebuild raced the offloaded peer-context copytree.
+                    items_copied = {"n": 0}
+
+                    def _populate_snapshot_version(version_dir: Path) -> None:
+                        for item in source_path.iterdir():
+                            if item.is_symlink():
+                                logger.debug(f"[FilesystemManager.save_snapshot] Skipping symlink: {item}")
+                                continue
+                            if item.is_file():
+                                shutil.copy2(item, version_dir / item.name)
+                            elif item.is_dir():
+                                # Use symlinks=True to copy symlinks as symlinks, not follow them
+                                # Use ignore_dangling_symlinks=True to handle broken symlinks in subdirectories (e.g., from subagent workspaces)
+                                shutil.copytree(
+                                    item,
+                                    version_dir / item.name,
+                                    symlinks=True,
+                                    ignore_dangling_symlinks=True,
+                                )
+                            items_copied["n"] += 1
+
+                    # Key on the snapshot_storage basename (== agent_id in
+                    # production, where snapshot_storage is <base>/<agent_id>) so
+                    # the published symlink lands exactly at self.snapshot_storage.
+                    store = SnapshotVersionStore.for_base(self.snapshot_storage.parent)
+                    store.publish_version(self.snapshot_storage.name, _populate_snapshot_version)
+
+                    logger.info(f"[FilesystemManager] Saved snapshot with {items_copied['n']} items to {self.snapshot_storage}")
 
             # --- 2. Save to log directories ---
             log_session_dir = get_log_session_dir()
diff --git a/massgen/filesystem_manager/_snapshot_version_store.py b/massgen/filesystem_manager/_snapshot_version_store.py
new file mode 100644
index 000000000..73a090319
--- /dev/null
+++ b/massgen/filesystem_manager/_snapshot_version_store.py
@@ -0,0 +1,230 @@
+"""Immutable, versioned snapshot publishing for agent snapshot storage.
+
+Background (B1 race)
+--------------------
+Agent snapshots live at ``<base>/<agent_id>``. Historically ``save_snapshot``
+rewrote that directory *in place* (``rmtree`` + rebuild), and the peer-context
+copy (``copy_snapshots_to_temp_workspace``) read the very same directory. Once
+the copy was offloaded to a worker thread (``asyncio.to_thread``), the implicit
+single-threaded serialization between writer and reader was lost: agent A's copy
+thread could ``copytree`` ``<base>/B`` while agent B's ``save_snapshot`` deleted
+and rebuilt it on the event loop, yielding ``FileNotFoundError`` or a torn copy.
+
+Fix
+---
+Writes publish a brand-new **immutable** version directory under
+``<base>/.versions/<agent_id>/v<N>`` and then atomically repoint the public
+path ``<base>/<agent_id>`` (now a symlink) at it. A published version is never
+mutated or deleted while a reader may still be using it:
+
+* Readers ``acquire`` the *current* version under a lock, which resolves the
+  symlink to a concrete ``v<N>`` directory and increments a refcount. They copy
+  from that concrete directory (immune to any subsequent republish) and
+  ``release`` it when done.
+* ``publish_version`` repoints the symlink and runs GC *under the same lock*, so
+  a reader either sees the old version (and pins it, preventing GC) or the new
+  one — never a half-deleted directory. GC skips any version with a non-zero
+  refcount.
+
+The public ``<base>/<agent_id>`` path remains a normal-looking directory to every
+other reader (symlinks are transparent), so the on-disk layout consumers depend
+on is unchanged.
+
+Platform note: if symlinks are unavailable (e.g. unprivileged Windows), the
+store falls back to copying content directly into ``<base>/<agent_id>`` — losing
+the race protection but preserving functionality.
+"""
+
+from __future__ import annotations
+
+import os
+import shutil
+import threading
+import uuid
+from collections.abc import Callable
+from pathlib import Path
+
+from massgen.logger_config import logger
+
+_VERSIONS_DIRNAME = ".versions"
+
+
+def _safe_rmtree(path: Path) -> None:
+    try:
+        if path.is_symlink():
+            path.unlink(missing_ok=True)
+        elif path.exists():
+            shutil.rmtree(path, ignore_errors=True)
+    except Exception as e:  # pragma: no cover - defensive
+        logger.debug(f"[SnapshotVersionStore] rmtree failed for {path}: {e}")
+
+
+class SnapshotVersionStore:
+    """Per-base registry coordinating immutable snapshot versions.
+
+    One instance exists per snapshot-storage *base* path (shared across all
+    agents' ``FilesystemManager``s and the orchestrator's ``SnapshotManager``,
+    which both reach it via :meth:`for_base`).
+    """
+
+    _instances: dict[str, SnapshotVersionStore] = {}
+    _instances_lock = threading.Lock()
+
+    def __init__(self, base: Path) -> None:
+        self._base = Path(base)
+        self._versions_root = self._base / _VERSIONS_DIRNAME
+        self._lock = threading.RLock()
+        self._refcounts: dict[str, int] = {}
+        self._next_version: dict[str, int] = {}
+        # Versions reserved by an in-flight publish but not yet published. GC must
+        # never delete these: a concurrent publisher for the same name may be
+        # mid-``populate`` (which runs outside the lock), and that version has no
+        # refcount yet because no reader can acquire an unpublished version.
+        self._inflight: set[str] = set()
+
+    # -- instance lookup ------------------------------------------------------
+    @classmethod
+    def for_base(cls, base: os.PathLike | str) -> SnapshotVersionStore:
+        key = str(Path(base))
+        with cls._instances_lock:
+            inst = cls._instances.get(key)
+            if inst is None:
+                inst = cls(Path(base))
+                cls._instances[key] = inst
+            return inst
+
+    @classmethod
+    def reset_all_for_tests(cls) -> None:
+        with cls._instances_lock:
+            cls._instances.clear()
+
+    # -- paths ----------------------------------------------------------------
+    def _agent_versions_dir(self, agent_id: str) -> Path:
+        return self._versions_root / agent_id
+
+    def _link_path(self, agent_id: str) -> Path:
+        return self._base / agent_id
+
+    # -- publish --------------------------------------------------------------
+    def _reserve_next_version(self, agent_id: str, vroot: Path) -> int:
+        """Reserve the next version number for ``agent_id`` (lock held)."""
+        n = self._next_version.get(agent_id)
+        if n is None:
+            existing = 0
+            if vroot.exists():
+                for child in vroot.iterdir():
+                    name = child.name
+                    if child.is_dir() and name.startswith("v") and name[1:].isdigit():
+                        existing = max(existing, int(name[1:]))
+            n = existing + 1
+        self._next_version[agent_id] = n + 1
+        return n
+
+    def publish_version(self, agent_id: str, populate: Callable[[Path], None]) -> Path | None:
+        """Build a new immutable version, then atomically publish + GC.
+
+        ``populate(version_dir)`` must fully write the snapshot content into the
+        (empty) ``version_dir`` it is given. It runs *outside* the lock since it
+        is the slow, blocking work; the version directory is freshly reserved and
+        not yet referenced by anything, so building it is race-free.
+        """
+        vroot = self._agent_versions_dir(agent_id)
+        vroot.mkdir(parents=True, exist_ok=True)
+
+        with self._lock:
+            n = self._reserve_next_version(agent_id, vroot)
+            version_dir = vroot / f"v{n}"
+            version_key = str(Path(os.path.realpath(version_dir)))
+            # Mark in-flight so a concurrent same-name publish's GC won't delete
+            # this directory while we populate it outside the lock.
+            self._inflight.add(version_key)
+
+        try:
+            # Build content outside the lock (slow, but the dir is private/unreferenced).
+            if version_dir.exists():
+                _safe_rmtree(version_dir)
+            version_dir.mkdir(parents=True, exist_ok=True)
+            populate(version_dir)
+
+            # Publish + GC atomically w.r.t. acquire().
+            with self._lock:
+                self._repoint_symlink(agent_id, version_dir)
+                self._inflight.discard(version_key)
+                self._gc(agent_id, keep=version_dir)
+        except Exception:
+            with self._lock:
+                self._inflight.discard(version_key)
+            raise
+        return version_dir
+
+    def _repoint_symlink(self, agent_id: str, version_dir: Path) -> None:
+        link = self._link_path(agent_id)
+        target = os.path.relpath(version_dir, self._base)
+        tmp = self._base / f".{agent_id}.lnk.{uuid.uuid4().hex}"
+        try:
+            os.symlink(target, tmp)
+        except OSError as e:
+            # No symlink support: fall back to a direct (unprotected) copy.
+            logger.warning(
+                f"[SnapshotVersionStore] symlink unsupported ({e}); falling back to direct copy " f"for {link} (snapshot race protection disabled on this platform)",
+            )
+            _safe_rmtree(link)
+            shutil.copytree(version_dir, link, symlinks=True, ignore_dangling_symlinks=True)
+            return
+
+        # os.replace cannot atomically replace a non-empty real directory; if the
+        # public path is a legacy real dir (first publish), remove it first. No
+        # reader can be mid-copy of it because no version had been published yet.
+        if link.exists() and not link.is_symlink() and link.is_dir():
+            _safe_rmtree(link)
+        try:
+            os.replace(tmp, link)
+        except OSError as e:  # pragma: no cover - defensive
+            logger.error(f"[SnapshotVersionStore] failed to repoint symlink {link}: {e}")
+            tmp.unlink(missing_ok=True)
+
+    # -- acquire / release ----------------------------------------------------
+    def acquire(self, agent_id: str) -> Path | None:
+        """Pin and return the concrete current version dir for ``agent_id``.
+
+        Returns ``None`` if no snapshot exists yet. The returned directory will
+        not be GC'd until a matching :meth:`release`. Resolution + refcount are
+        atomic w.r.t. :meth:`publish_version`'s repoint+GC.
+        """
+        with self._lock:
+            link = self._link_path(agent_id)
+            if not link.exists():  # follows symlink; dangling -> False
+                return None
+            concrete = Path(os.path.realpath(link))
+            key = str(concrete)
+            self._refcounts[key] = self._refcounts.get(key, 0) + 1
+            return concrete
+
+    def release(self, version_dir: Path | None) -> None:
+        if version_dir is None:
+            return
+        with self._lock:
+            key = str(Path(version_dir))
+            count = self._refcounts.get(key, 0)
+            if count <= 1:
+                self._refcounts.pop(key, None)
+            else:
+                self._refcounts[key] = count - 1
+
+    # -- gc -------------------------------------------------------------------
+    def _gc(self, agent_id: str, keep: Path) -> None:
+        vroot = self._agent_versions_dir(agent_id)
+        if not vroot.exists():
+            return
+        keep_key = str(Path(os.path.realpath(keep)))
+        for child in vroot.iterdir():
+            if not child.is_dir():
+                continue
+            child_key = str(Path(os.path.realpath(child)))
+            if child_key == keep_key:
+                continue
+            if child_key in self._inflight:
+                continue
+            if self._refcounts.get(child_key, 0) > 0:
+                continue
+            _safe_rmtree(child)
diff --git a/massgen/orchestrator.py b/massgen/orchestrator.py
index 8785f30f9..1668f9b46 100644
--- a/massgen/orchestrator.py
+++ b/massgen/orchestrator.py
@@ -4592,8 +4592,9 @@ def _record_round_isolation_degraded(self, agent_id: str, error: BaseException)
             emitter = get_event_emitter()
             if emitter is not None and hasattr(emitter, "emit_status"):
                 emitter.emit_status(
+                    message=f"round isolation degraded: {error_text}",
+                    level="warning",
                     agent_id=agent_id,
-                    status=f"round isolation degraded: {error_text}",
                 )
         except Exception:
             logger.debug("[Orchestrator] Unable to emit round-isolation degraded status", exc_info=True)
diff --git a/massgen/orchestrator_collaborators/midstream_injection_hook_installer.py b/massgen/orchestrator_collaborators/midstream_injection_hook_installer.py
index 1a6d722f4..28f09c4dd 100644
--- a/massgen/orchestrator_collaborators/midstream_injection_hook_installer.py
+++ b/massgen/orchestrator_collaborators/midstream_injection_hook_installer.py
@@ -158,6 +158,43 @@ def compute_plan_progress_stats(self, workspace_path: str) -> dict[str, Any] | N
             logger.debug(f"[Orchestrator] Could not compute plan progress: {e}")
             return None
 
+    def _install_wait_interrupt_provider(self, agent: Any, agent_id: str) -> None:
+        """Register the background-wait interrupt provider on the backend.
+
+        Shared by every hook-setup path (GeneralHookManager, Codex MCP, native)
+        so the cancellation / runtime-fallback contract cannot drift across
+        backends. No-op when the backend does not support the hook.
+        """
+        orch = self._orchestrator
+        if not hasattr(agent.backend, "set_background_wait_interrupt_provider"):
+            return
+
+        async def _wait_interrupt_provider(
+            requested_agent_id: str,
+            *,
+            _agent_id: str = agent_id,
+        ) -> dict[str, Any] | None:
+            target_agent_id = requested_agent_id or _agent_id
+            if hasattr(orch, "cancellation_manager") and orch.cancellation_manager and orch.cancellation_manager.is_cancelled:
+                return {
+                    "interrupt_reason": "turn_cancelled",
+                    "injected_content": None,
+                }
+
+            runtime_sections = await orch._collect_no_hook_runtime_fallback_sections(
+                target_agent_id,
+            )
+            if not runtime_sections:
+                return None
+            return {
+                "interrupt_reason": "runtime_injection_available",
+                "injected_content": "\n".join(runtime_sections),
+            }
+
+        agent.backend.set_background_wait_interrupt_provider(
+            _wait_interrupt_provider,
+        )
+
     def setup_hook_manager_for_agent(
         self,
         agent_id: str,
@@ -283,33 +320,7 @@ def _make_call_tool(aid: str):
 
         # Set manager on backend
         agent.backend.set_general_hook_manager(manager)
-        if hasattr(agent.backend, "set_background_wait_interrupt_provider"):
-
-            async def _wait_interrupt_provider(
-                requested_agent_id: str,
-                *,
-                _agent_id: str = agent_id,
-            ) -> dict[str, Any] | None:
-                target_agent_id = requested_agent_id or _agent_id
-                if hasattr(orch, "cancellation_manager") and orch.cancellation_manager and orch.cancellation_manager.is_cancelled:
-                    return {
-                        "interrupt_reason": "turn_cancelled",
-                        "injected_content": None,
-                    }
-
-                runtime_sections = await orch._collect_no_hook_runtime_fallback_sections(
-                    target_agent_id,
-                )
-                if not runtime_sections:
-                    return None
-                return {
-                    "interrupt_reason": "runtime_injection_available",
-                    "injected_content": "\n".join(runtime_sections),
-                }
-
-            agent.backend.set_background_wait_interrupt_provider(
-                _wait_interrupt_provider,
-            )
+        self._install_wait_interrupt_provider(agent, agent_id)
         logger.debug(
             f"[Orchestrator] Set up hook manager for {agent_id} with mid-stream and reminder hooks",
         )
@@ -340,33 +351,7 @@ def setup_codex_mcp_hooks(
         }
 
         # Set up the background wait interrupt provider (reuse existing pattern)
-        if hasattr(agent.backend, "set_background_wait_interrupt_provider"):
-
-            async def _wait_interrupt_provider(
-                requested_agent_id: str,
-                *,
-                _agent_id: str = agent_id,
-            ) -> dict[str, Any] | None:
-                target_agent_id = requested_agent_id or _agent_id
-                if hasattr(orch, "cancellation_manager") and orch.cancellation_manager and orch.cancellation_manager.is_cancelled:
-                    return {
-                        "interrupt_reason": "turn_cancelled",
-                        "injected_content": None,
-                    }
-
-                runtime_sections = await orch._collect_no_hook_runtime_fallback_sections(
-                    target_agent_id,
-                )
-                if not runtime_sections:
-                    return None
-                return {
-                    "interrupt_reason": "runtime_injection_available",
-                    "injected_content": "\n".join(runtime_sections),
-                }
-
-            agent.backend.set_background_wait_interrupt_provider(
-                _wait_interrupt_provider,
-            )
+        self._install_wait_interrupt_provider(agent, agent_id)
 
         logger.info(
             "[Orchestrator] Set up MCP server-level hook delivery for %s",
@@ -557,33 +542,7 @@ def context_factory() -> dict[str, Any]:
 
         # Set native hooks config on backend
         agent.backend.set_native_hooks_config(native_config)
-        if hasattr(agent.backend, "set_background_wait_interrupt_provider"):
-
-            async def _wait_interrupt_provider(
-                requested_agent_id: str,
-                *,
-                _agent_id: str = agent_id,
-            ) -> dict[str, Any] | None:
-                target_agent_id = requested_agent_id or _agent_id
-                if hasattr(orch, "cancellation_manager") and orch.cancellation_manager and orch.cancellation_manager.is_cancelled:
-                    return {
-                        "interrupt_reason": "turn_cancelled",
-                        "injected_content": None,
-                    }
-
-                runtime_sections = await orch._collect_no_hook_runtime_fallback_sections(
-                    target_agent_id,
-                )
-                if not runtime_sections:
-                    return None
-                return {
-                    "interrupt_reason": "runtime_injection_available",
-                    "injected_content": "\n".join(runtime_sections),
-                }
-
-            agent.backend.set_background_wait_interrupt_provider(
-                _wait_interrupt_provider,
-            )
+        self._install_wait_interrupt_provider(agent, agent_id)
         logger.info(
             f"[Orchestrator] Set up native hooks for {agent_id}: " f"PreToolUse={len(native_config.get('PreToolUse', []))}, " f"PostToolUse={len(native_config.get('PostToolUse', []))} hooks",
         )
diff --git a/massgen/orchestrator_collaborators/snapshot_manager.py b/massgen/orchestrator_collaborators/snapshot_manager.py
index e1111c4cf..fd46860ab 100644
--- a/massgen/orchestrator_collaborators/snapshot_manager.py
+++ b/massgen/orchestrator_collaborators/snapshot_manager.py
@@ -36,6 +36,7 @@
 from pathlib import Path
 from typing import TYPE_CHECKING, Any
 
+from massgen.filesystem_manager._snapshot_version_store import SnapshotVersionStore
 from massgen.logger_config import get_log_session_dir, logger
 
 if TYPE_CHECKING:
@@ -157,12 +158,23 @@ async def copy_all_snapshots_to_temp_workspace(
         agent_mapping = orch.coordination_tracker.get_reverse_agent_mapping()
 
         all_snapshots = {}
+        # Logical (symlink) path per source, kept for stale-path rewriting even
+        # though the copy reads from the pinned concrete version.
+        logical_paths: dict[str, Path] = {}
+        # Pinned immutable versions to release after the copy completes.
+        version_store = None
+        pinned: list[Path] = []
         if orch._snapshot_storage:
             snapshot_base = Path(orch._snapshot_storage)
+            version_store = SnapshotVersionStore.for_base(snapshot_base)
             for source_agent_id in orch.agents.keys():
-                source_snapshot = snapshot_base / source_agent_id
-                if source_snapshot.exists() and source_snapshot.is_dir():
-                    all_snapshots[source_agent_id] = source_snapshot
+                # Pin the current immutable version so a concurrent peer
+                # save_snapshot (republish) cannot delete it mid-copy (B1 fix).
+                concrete = version_store.acquire(source_agent_id)
+                if concrete is not None and concrete.is_dir():
+                    all_snapshots[source_agent_id] = concrete
+                    logical_paths[source_agent_id] = snapshot_base / source_agent_id
+                    pinned.append(concrete)
 
         if orch._step_mode and orch._step_mode.enabled and orch._step_inputs:
             for va_id, va_state in orch._step_inputs.virtual_agents.items():
@@ -171,26 +183,36 @@ async def copy_all_snapshots_to_temp_workspace(
                     if va_ws.exists() and va_ws.is_dir():
                         all_snapshots[va_id] = va_ws
 
-        workspace_path = await agent.backend.filesystem_manager.copy_snapshots_to_temp_workspace(
-            all_snapshots,
-            agent_mapping,
-        )
+        try:
+            workspace_path = await agent.backend.filesystem_manager.copy_snapshots_to_temp_workspace(
+                all_snapshots,
+                agent_mapping,
+            )
 
-        if workspace_path:
-            from massgen.filesystem_manager import replace_stale_paths_in_workspace
-
-            for source_agent_id, snapshot_path in all_snapshots.items():
-                anon_id = agent_mapping.get(source_agent_id, source_agent_id)
-                dest_dir = workspace_path / anon_id
-                if not dest_dir.exists():
-                    continue
-                replacements: dict[str, str] = {str(snapshot_path): str(dest_dir)}
-                source_agent = orch.agents.get(source_agent_id)
-                if source_agent and source_agent.backend.filesystem_manager:
-                    fm = source_agent.backend.filesystem_manager
-                    if fm.cwd:
-                        replacements[str(fm.cwd)] = str(dest_dir)
-                replace_stale_paths_in_workspace(dest_dir, replacements)
+            if workspace_path:
+                from massgen.filesystem_manager import replace_stale_paths_in_workspace
+
+                for source_agent_id, snapshot_path in all_snapshots.items():
+                    anon_id = agent_mapping.get(source_agent_id, source_agent_id)
+                    dest_dir = workspace_path / anon_id
+                    if not dest_dir.exists():
+                        continue
+                    # Rewrite both the concrete version path AND the logical
+                    # symlink path, since either may be embedded in copied files.
+                    replacements: dict[str, str] = {str(snapshot_path): str(dest_dir)}
+                    logical = logical_paths.get(source_agent_id)
+                    if logical is not None and str(logical) != str(snapshot_path):
+                        replacements[str(logical)] = str(dest_dir)
+                    source_agent = orch.agents.get(source_agent_id)
+                    if source_agent and source_agent.backend.filesystem_manager:
+                        fm = source_agent.backend.filesystem_manager
+                        if fm.cwd:
+                            replacements[str(fm.cwd)] = str(dest_dir)
+                    replace_stale_paths_in_workspace(dest_dir, replacements)
+        finally:
+            if version_store is not None:
+                for concrete in pinned:
+                    version_store.release(concrete)
 
         return str(workspace_path) if workspace_path else None
 
@@ -529,13 +551,24 @@ def save_partial_workspace_snapshots_for_interrupted_turn(
                     f"[Orchestrator] Preserving existing snapshot for {agent_id} during interrupted turn: " f"{snapshot_storage}",
                 )
             elif workspace_has_content:
-                copied = orch._copy_workspace_contents(
-                    workspace_path,
-                    snapshot_storage,
-                    replace_destination=True,
-                )
+                # Publish a fresh IMMUTABLE version rather than rmtree+rebuild the
+                # public path: under the versioned-snapshot scheme that path is a
+                # symlink, and shutil.rmtree(symlink) raises -- which would have
+                # silently dropped the interrupted-turn snapshot. See
+                # SnapshotVersionStore (B1 race fix).
+                store = SnapshotVersionStore.for_base(Path(snapshot_storage).parent)
+                copied_count = {"n": 0}
+
+                def _populate_interrupted(version_dir: Path, _src=workspace_path) -> None:
+                    copied_count["n"] = orch._copy_workspace_contents(
+                        _src,
+                        version_dir,
+                        replace_destination=False,
+                    )
+
+                store.publish_version(Path(snapshot_storage).name, _populate_interrupted)
                 logger.info(
-                    f"[Orchestrator] Saved interrupted-turn workspace snapshot for {agent_id} to " f"{snapshot_storage} ({copied} items)",
+                    f"[Orchestrator] Saved interrupted-turn workspace snapshot for {agent_id} to " f"{snapshot_storage} ({copied_count['n']} items)",
                 )
 
         if log_session_dir:
diff --git a/massgen/skills/massgen-release-documenter/SKILL.md b/massgen/skills/massgen-release-documenter/SKILL.md
index 07eb0c1d1..6a74a7764 100644
--- a/massgen/skills/massgen-release-documenter/SKILL.md
+++ b/massgen/skills/massgen-release-documenter/SKILL.md
@@ -44,12 +44,14 @@ This file contains:
 **Always follow this order:**
 
 1. **CHANGELOG.md** ⭐ START HERE
-2. **Sphinx Documentation** (docs/source/)
-3. **Config Documentation** (massgen/configs/README.md)
-4. **Case Studies** (docs/source/examples/case_studies/)
-5. **README.md**
-6. **README_PYPI.md** (auto-synced via pre-commit)
-7. **Roadmap** (ROADMAP.md)
+2. **Version bump** (`massgen/__init__.py` `__version__`)
+3. **Sphinx Documentation** (docs/source/)
+4. **Config Documentation** (massgen/configs/README.md)
+5. **Case Studies** (docs/source/examples/case_studies/)
+6. **README.md**
+7. **README_PYPI.md** (auto-synced via pre-commit)
+8. **Roadmap** (ROADMAP.md)
+9. **Announcements** (docs/announcements/) — current-release.md, github-release-vX.md, archive
 
 This order is critical - never skip ahead!
 
@@ -136,26 +138,49 @@ uv run python scripts/sync_readme_pypi.py
 - Update `ROADMAP_v0.1.X+1.md` for next release
 - Do NOT edit `docs/source/development/roadmap.rst` (auto-generated)
 
+### Phase 8: Announcements (`docs/announcements/`)
+
+**⚠️ Easy to miss — not auto-generated.** Each release rotates three things in `docs/announcements/`:
+
+1. **Archive the outgoing announcement**: copy the current `current-release.md` to `archive/v0.1.X-1.md` (the version it currently describes).
+   ```bash
+   cp docs/announcements/current-release.md docs/announcements/archive/v0.1.X-1.md
+   ```
+2. **Rewrite `current-release.md`** for the new version: update the title, Release Summary, Install version, release-notes link, "Suggested image" version, and the full LinkedIn announcement body (Key Improvements bullets). This is the long-form social/LinkedIn copy.
+3. **Replace the GitHub-release highlights file**: delete `github-release-v0.1.X-1.md` and create `github-release-v0.1.X.md` (the short, emoji-sectioned GitHub Releases body dated `(YYYY-MM-DD)`).
+   ```bash
+   git rm docs/announcements/github-release-v0.1.X-1.md
+   # then write docs/announcements/github-release-v0.1.X.md
+   ```
+
+`feature-highlights.md` and `README.md` in that directory are general (not per-version) — leave them unless the highlights changed.
+
+Use the just-archived previous version's files as templates so the structure/sections stay consistent. Keep `[TO BE ADDED AFTER POSTING]` placeholders for the X/LinkedIn links.
+
+> **Don't forget the version bump** (`massgen/__init__.py` `__version__ = "0.1.X"`) — `pyproject.toml` reads the version dynamically from there.
+
 ## Quick Validation Checklist
 
 **Must Update (every release):**
 1. ✅ CHANGELOG.md
-2. ✅ docs/source/index.rst (Recent Releases)
-3. ✅ docs/source/user_guide/ (if user-facing feature)
-4. ✅ README.md (Recent Achievements)
-5. ✅ massgen/configs/ (example configs)
-6. ✅ Case study
+2. ✅ `massgen/__init__.py` (`__version__` bump)
+3. ✅ docs/source/index.rst (Recent Releases)
+4. ✅ README.md (Recent Achievements + Latest Features + TOC anchors)
+5. ✅ ROADMAP.md (Current Version, completed section, table)
+6. ✅ docs/announcements/ (archive old, rewrite current-release.md, swap github-release-vX.md)
+7. ⚠️ docs/source/user_guide/ (if user-facing feature)
+8. ⚠️ massgen/configs/ (example configs, if any)
+9. ⚠️ Case study (skip for internal-quality/no-user-facing-feature releases)
 
 **Should Update (if applicable):**
-7. ⚠️ massgen/config_builder.py (if config params added)
-8. ⚠️ massgen/backend/capabilities.py (if backend changes)
-9. ✅ README_PYPI.md (auto-synced)
-10. ⚠️ ROADMAP.md
+10. ⚠️ massgen/config_builder.py (if config params added)
+11. ⚠️ massgen/backend/capabilities.py (if backend changes)
+12. ✅ README_PYPI.md (auto-synced from README.md via pre-commit)
 
 **Build & Verify:**
-11. 🔨 `cd docs && make html && make linkcheck`
-12. 🔨 Test new config files
-13. 🔨 Verify all links work
+13. 🔨 `cd docs && make html && make linkcheck`
+14. 🔨 Test new config files
+15. 🔨 Verify all links work
 
 See `docs/dev_notes/release_checklist.md` section "Quick Reference Checklist" for complete list.
 
diff --git a/massgen/tests/test_concurrency_race_fixes.py b/massgen/tests/test_concurrency_race_fixes.py
index 33a149f81..65214550a 100644
--- a/massgen/tests/test_concurrency_race_fixes.py
+++ b/massgen/tests/test_concurrency_race_fixes.py
@@ -358,6 +358,34 @@ def test_d2_record_is_safe_for_unknown_agent(mock_orchestrator) -> None:
     orch._record_round_isolation_degraded("nonexistent", ValueError("boom"))
 
 
+def test_d2_emits_status_with_valid_signature(mock_orchestrator, monkeypatch) -> None:
+    """The degradation signal must reach the emitter via emit_status's real
+    contract (message=..., agent_id=...). A previous bug passed status=..., which
+    is not a parameter of EventEmitter.emit_status, so the call raised TypeError
+    that was silently swallowed -- the visible signal never fired.
+    """
+    import massgen.orchestrator as orch_mod
+
+    captured: dict[str, object] = {}
+
+    class _Emitter:
+        def emit_status(self, message, level="info", agent_id=None):
+            captured["message"] = message
+            captured["level"] = level
+            captured["agent_id"] = agent_id
+
+    orch = mock_orchestrator(num_agents=1)
+    monkeypatch.setattr(orch_mod, "get_event_emitter", lambda: _Emitter())
+
+    orch._record_round_isolation_degraded("agent_a", RuntimeError("git worktree add failed"))
+
+    # The emitter must have been called with a real message + agent_id and no
+    # bogus status kwarg (the _Emitter signature would TypeError on status=).
+    assert captured.get("agent_id") == "agent_a"
+    assert "round isolation degraded" in str(captured.get("message", ""))
+    assert "git worktree add failed" in str(captured.get("message", ""))
+
+
 # --------------------------------------------------------------------------- #
 # D3 — post-record changedoc enrichment must not kill a valid-answer agent
 # --------------------------------------------------------------------------- #
diff --git a/massgen/tests/test_snapshot_version_store.py b/massgen/tests/test_snapshot_version_store.py
new file mode 100644
index 000000000..1c0a241b7
--- /dev/null
+++ b/massgen/tests/test_snapshot_version_store.py
@@ -0,0 +1,213 @@
+"""Unit tests for SnapshotVersionStore (B1 immutable-versioned snapshots).
+
+The store guarantees that a reader copying agent B's snapshot reads from an
+immutable version directory that B's concurrent ``save_snapshot`` (republish)
+will never delete or mutate underneath it.
+"""
+
+from __future__ import annotations
+
+import os
+import threading
+from pathlib import Path
+
+import pytest
+
+from massgen.filesystem_manager._snapshot_version_store import SnapshotVersionStore
+
+
+@pytest.fixture(autouse=True)
+def _reset_store():
+    SnapshotVersionStore.reset_all_for_tests()
+    yield
+    SnapshotVersionStore.reset_all_for_tests()
+
+
+def _populate(content: dict[str, str]):
+    def _fn(dest: Path) -> None:
+        for name, text in content.items():
+            (dest / name).write_text(text)
+
+    return _fn
+
+
+def test_publish_creates_version_and_symlink(tmp_path):
+    store = SnapshotVersionStore.for_base(tmp_path)
+    v1 = store.publish_version("agentA", _populate({"answer.md": "v1"}))
+
+    link = tmp_path / "agentA"
+    assert link.is_symlink(), "public path must be a symlink to the version"
+    assert (link / "answer.md").read_text() == "v1"
+    assert v1.name == "v1"
+    assert v1.parent == tmp_path / ".versions" / "agentA"
+
+
+def test_republish_advances_version_and_repoints(tmp_path):
+    store = SnapshotVersionStore.for_base(tmp_path)
+    store.publish_version("agentA", _populate({"answer.md": "v1"}))
+    v2 = store.publish_version("agentA", _populate({"answer.md": "v2"}))
+
+    link = tmp_path / "agentA"
+    assert (link / "answer.md").read_text() == "v2"
+    assert Path(os.path.realpath(link)) == v2
+    assert v2.name == "v2"
+
+
+def test_for_base_returns_shared_instance(tmp_path):
+    a = SnapshotVersionStore.for_base(tmp_path)
+    b = SnapshotVersionStore.for_base(str(tmp_path))
+    assert a is b, "all participants on the same base must share one store"
+
+
+def test_acquire_returns_concrete_version(tmp_path):
+    store = SnapshotVersionStore.for_base(tmp_path)
+    v1 = store.publish_version("agentA", _populate({"answer.md": "v1"}))
+
+    acquired = store.acquire("agentA")
+    assert acquired == v1
+    assert not acquired.is_symlink(), "acquire must return the concrete version dir"
+    store.release(acquired)
+
+
+def test_acquire_missing_agent_returns_none(tmp_path):
+    store = SnapshotVersionStore.for_base(tmp_path)
+    assert store.acquire("nope") is None
+
+
+def test_gc_removes_unreferenced_old_versions(tmp_path):
+    store = SnapshotVersionStore.for_base(tmp_path)
+    v1 = store.publish_version("agentA", _populate({"answer.md": "v1"}))
+    store.publish_version("agentA", _populate({"answer.md": "v2"}))
+    # v1 had no readers -> GC'd when v2 published.
+    assert not v1.exists(), "unreferenced previous version should be garbage-collected"
+
+
+def test_acquired_version_survives_republish_and_gc(tmp_path):
+    """The core race guarantee: a pinned version is never deleted by GC.
+
+    Reader acquires v1, then TWO republishes happen. Without refcounting GC would
+    delete v1 (it's neither current nor previous), corrupting the in-flight read.
+    """
+    store = SnapshotVersionStore.for_base(tmp_path)
+    v1 = store.publish_version("agentA", _populate({"answer.md": "v1"}))
+
+    pinned = store.acquire("agentA")
+    assert pinned == v1
+
+    store.publish_version("agentA", _populate({"answer.md": "v2"}))
+    store.publish_version("agentA", _populate({"answer.md": "v3"}))
+
+    # The pinned version is still fully intact for the reader.
+    assert pinned.exists()
+    assert (pinned / "answer.md").read_text() == "v1"
+
+    # After release + another publish, it is finally collectable.
+    store.release(pinned)
+    store.publish_version("agentA", _populate({"answer.md": "v4"}))
+    assert not pinned.exists()
+
+
+def test_concurrent_publish_during_read_no_corruption(tmp_path):
+    """Reproduce the B1 interleave: republish (writer) while a copy (reader) runs.
+
+    A reader pins the current version and copies from it on a worker thread while
+    the main thread republishes repeatedly. The reader's source must stay intact
+    for the entire copy -- no FileNotFoundError, complete content.
+    """
+    import shutil
+
+    store = SnapshotVersionStore.for_base(tmp_path)
+    # Seed a version with enough files that a copy takes a measurable walk.
+    store.publish_version("agentA", _populate({f"f{i}.txt": f"content-{i}" for i in range(50)}))
+
+    errors: list[Exception] = []
+    copied_ok: list[bool] = []
+    start_copy = threading.Event()
+
+    def _reader():
+        pinned = store.acquire("agentA")
+        start_copy.set()
+        try:
+            for _ in range(20):
+                dest = tmp_path / f"dest_{threading.get_ident()}_{_}"
+                shutil.copytree(pinned, dest)
+                # Verify completeness of every copy.
+                assert len(list(dest.iterdir())) == 50
+                shutil.rmtree(dest)
+            copied_ok.append(True)
+        except Exception as e:  # noqa: BLE001
+            errors.append(e)
+        finally:
+            store.release(pinned)
+
+    t = threading.Thread(target=_reader)
+    t.start()
+    start_copy.wait(timeout=5)
+    # Hammer republishes (writer) concurrently with the reader's copies.
+    for i in range(2, 40):
+        store.publish_version("agentA", _populate({f"f{j}.txt": f"content-{j}-r{i}" for j in range(50)}))
+    t.join(timeout=10)
+
+    assert not errors, f"reader saw corruption/errors during concurrent republish: {errors}"
+    assert copied_ok == [True]
+
+
+def test_concurrent_publishers_same_name_no_inflight_gc(tmp_path):
+    """A concurrent publish for the SAME name must not GC the other's in-flight version.
+
+    Without the in-flight guard, publisher B's GC deletes publisher A's freshly
+    reserved-but-unpopulated version (refcount 0, not the keep target), so A's
+    populate writes into a deleted directory and crashes / produces a dangling
+    symlink.
+    """
+    store = SnapshotVersionStore.for_base(tmp_path)
+    # Seed v1 so subsequent publishes have an "old" version to tempt GC.
+    store.publish_version("agentA", _populate({"answer.md": "v1"}))
+
+    a_populating = threading.Event()
+    b_done = threading.Event()
+    errors: list[Exception] = []
+
+    def _slow_populate_A(dest: Path) -> None:
+        # Signal we are mid-populate, then wait until B has fully published+GC'd.
+        a_populating.set()
+        assert b_done.wait(timeout=5)
+        (dest / "answer.md").write_text("from-A")
+
+    def _publish_A():
+        try:
+            store.publish_version("agentA", _slow_populate_A)
+        except Exception as e:  # noqa: BLE001
+            errors.append(e)
+
+    t = threading.Thread(target=_publish_A)
+    t.start()
+    assert a_populating.wait(timeout=5)
+
+    # While A is mid-populate, B publishes (repoint + GC) for the same name.
+    store.publish_version("agentA", _populate({"answer.md": "from-B"}))
+    b_done.set()
+    t.join(timeout=10)
+
+    assert not errors, f"concurrent publisher's in-flight version was GC'd: {errors}"
+    # A's directory survived its populate and is intact.
+    link = tmp_path / "agentA"
+    assert link.is_symlink()
+    assert (link / "answer.md").read_text() in {"from-A", "from-B"}
+
+
+def test_symlink_fallback_when_unsupported(tmp_path, monkeypatch):
+    """Without symlink support, publishing falls back to a direct copy that still works."""
+    import massgen.filesystem_manager._snapshot_version_store as mod
+
+    def _no_symlink(*args, **kwargs):
+        raise OSError("symlinks not supported")
+
+    monkeypatch.setattr(mod.os, "symlink", _no_symlink)
+
+    store = SnapshotVersionStore.for_base(tmp_path)
+    store.publish_version("agentA", _populate({"answer.md": "v1"}))
+
+    link = tmp_path / "agentA"
+    assert not link.is_symlink()
+    assert (link / "answer.md").read_text() == "v1"
diff --git a/massgen/tests/test_snapshot_versioned_save.py b/massgen/tests/test_snapshot_versioned_save.py
new file mode 100644
index 000000000..a9189ab36
--- /dev/null
+++ b/massgen/tests/test_snapshot_versioned_save.py
@@ -0,0 +1,252 @@
+"""Integration tests: save_snapshot publishes immutable versioned snapshots (B1).
+
+These drive the real ``FilesystemManager.save_snapshot`` writer path (not just
+the SnapshotVersionStore unit) and prove:
+
+  1. The public ``<base>/<agent_id>`` path becomes a symlink to an immutable
+     version directory, with content intact.
+  2. Re-saving advances the version and GCs the unreferenced old one.
+  3. The original B1 race is gone: while a reader copies the current version,
+     a concurrent re-save (republish) does NOT delete/mutate the source the
+     reader holds -- no FileNotFoundError, complete copy.
+"""
+
+from __future__ import annotations
+
+import shutil
+import threading
+import types
+from pathlib import Path
+
+import pytest
+
+from massgen.filesystem_manager._filesystem_manager import FilesystemManager
+from massgen.filesystem_manager._snapshot_version_store import SnapshotVersionStore
+from massgen.orchestrator_collaborators.snapshot_manager import SnapshotManager
+from massgen.orchestrator_collaborators.workspace_lifecycle_manager import (
+    WorkspaceLifecycleManager,
+)
+
+
+@pytest.fixture(autouse=True)
+def _reset_store():
+    SnapshotVersionStore.reset_all_for_tests()
+    yield
+    SnapshotVersionStore.reset_all_for_tests()
+
+
+def _make_writer_fm(base: Path, agent_id: str, workspace: Path) -> FilesystemManager:
+    fm = FilesystemManager.__new__(FilesystemManager)
+    fm.agent_id = agent_id
+    fm.use_two_tier_workspace = False
+    fm.cwd = str(workspace)
+    fm.snapshot_storage = base / agent_id
+    fm.snapshot_storage.mkdir(parents=True, exist_ok=True)
+    return fm
+
+
+def _seed_workspace(ws: Path, marker: str, n_files: int = 1) -> None:
+    ws.mkdir(parents=True, exist_ok=True)
+    for i in range(n_files):
+        (ws / f"file{i}.txt").write_text(f"{marker}-{i}")
+    sub = ws / "src"
+    sub.mkdir(exist_ok=True)
+    (sub / "main.py").write_text(f"# {marker}\n")
+
+
+@pytest.mark.asyncio
+async def test_save_snapshot_publishes_symlinked_version(tmp_path) -> None:
+    base = tmp_path / "snapshot_storage"
+    ws = tmp_path / "ws_b"
+    _seed_workspace(ws, "round1")
+    fm = _make_writer_fm(base, "agentB", ws)
+
+    await fm.save_snapshot()
+
+    link = base / "agentB"
+    assert link.is_symlink(), "snapshot_storage/<agent_id> must be a symlink to an immutable version"
+    assert (link / "file0.txt").read_text() == "round1-0"
+    assert (link / "src" / "main.py").exists()
+    # The concrete version lives under .versions
+    target = link.resolve()
+    assert target.parent == base / ".versions" / "agentB"
+    assert target.name == "v1"
+
+
+@pytest.mark.asyncio
+async def test_resave_advances_version_and_gcs_old(tmp_path) -> None:
+    base = tmp_path / "snapshot_storage"
+    ws = tmp_path / "ws_b"
+    _seed_workspace(ws, "round1")
+    fm = _make_writer_fm(base, "agentB", ws)
+
+    await fm.save_snapshot()
+    v1 = (base / "agentB").resolve()
+
+    # New workspace content, save again.
+    shutil.rmtree(ws)
+    _seed_workspace(ws, "round2")
+    await fm.save_snapshot()
+
+    link = base / "agentB"
+    v2 = link.resolve()
+    assert (link / "file0.txt").read_text() == "round2-0"
+    assert v2.name == "v2"
+    assert not v1.exists(), "unreferenced previous version should be GC'd"
+
+
+@pytest.mark.asyncio
+async def test_resave_during_read_does_not_corrupt(tmp_path) -> None:
+    """The original B1 race: re-save while a peer copies the current version."""
+    base = tmp_path / "snapshot_storage"
+    ws = tmp_path / "ws_b"
+    _seed_workspace(ws, "round1", n_files=40)
+    fm = _make_writer_fm(base, "agentB", ws)
+    await fm.save_snapshot()
+
+    store = SnapshotVersionStore.for_base(base)
+
+    errors: list[Exception] = []
+    copied_ok: list[bool] = []
+    reader_started = threading.Event()
+
+    def _reader():
+        pinned = store.acquire("agentB")
+        reader_started.set()
+        try:
+            for i in range(15):
+                dest = tmp_path / f"copy_{i}"
+                shutil.copytree(pinned, dest)
+                # src/main.py + 40 files == 41 entries
+                assert (dest / "file0.txt").read_text() == "round1-0"
+                assert len(list(dest.iterdir())) == 41
+                shutil.rmtree(dest)
+            copied_ok.append(True)
+        except Exception as e:  # noqa: BLE001
+            errors.append(e)
+        finally:
+            store.release(pinned)
+
+    t = threading.Thread(target=_reader)
+    t.start()
+    reader_started.wait(timeout=5)
+
+    # Concurrently republish (writer) many times while the reader copies.
+    for r in range(2, 25):
+        shutil.rmtree(ws)
+        _seed_workspace(ws, f"round{r}", n_files=40)
+        await fm.save_snapshot()
+
+    t.join(timeout=15)
+
+    assert not errors, f"reader saw corruption during concurrent re-save (B1 regression): {errors}"
+    assert copied_ok == [True]
+
+
+def _make_orch_stub():
+    """Minimal orchestrator stub exposing the helpers the interrupted/copy paths use."""
+    orch = types.SimpleNamespace()
+    orch._has_meaningful_workspace_content = lambda p: bool(p) and Path(p).exists() and any(Path(p).iterdir())
+    orch._copy_workspace_contents = lambda src, dest, replace_destination=False: WorkspaceLifecycleManager.copy_workspace_contents(
+        Path(src),
+        Path(dest),
+        replace_destination=replace_destination,
+    )
+    return orch
+
+
+@pytest.mark.asyncio
+async def test_interrupted_save_over_published_symlink_does_not_crash(tmp_path) -> None:
+    """Regression: interrupted-turn save must not rmtree the published symlink.
+
+    When an empty/metadata-only version was already published (public path is a
+    symlink) and the live workspace has content, the interrupted-turn save used
+    to do shutil.rmtree(<symlink>) -> OSError -> the snapshot was silently lost.
+    It must instead publish a new version.
+    """
+    base = tmp_path / "snapshot_storage"
+    base.mkdir()
+    ws = tmp_path / "ws_b"
+
+    # Publish an EMPTY version first so the public path becomes a symlink with no content.
+    store = SnapshotVersionStore.for_base(base)
+    store.publish_version("agentB", lambda d: None)
+    link = base / "agentB"
+    assert link.is_symlink()
+    assert not any(link.iterdir())
+
+    # Now the live workspace has real content.
+    _seed_workspace(ws, "interrupted")
+
+    fm = FilesystemManager.__new__(FilesystemManager)
+    fm.agent_id = "agentB"
+    fm.snapshot_storage = link
+    fm.get_current_workspace = lambda: str(ws)
+    backend = types.SimpleNamespace(filesystem_manager=fm)
+
+    sm = SnapshotManager(_make_orch_stub())
+
+    # Must not raise (old code raised OSError on rmtree of the symlink).
+    sm.save_partial_workspace_snapshots_for_interrupted_turn(
+        agent_id="agentB",
+        backend=backend,
+        timestamp="t0",
+        log_session_dir=None,
+    )
+
+    # A new version was published with the workspace content; symlink still valid.
+    assert link.is_symlink()
+    assert (link / "file0.txt").read_text() == "interrupted-0"
+
+
+@pytest.mark.asyncio
+async def test_orchestrator_reader_pins_version(tmp_path, monkeypatch) -> None:
+    """End-to-end wiring: copy_all_snapshots_to_temp_workspace must acquire() a
+    pinned concrete version (not read the raw symlink). If the acquire() call were
+    removed from snapshot_manager, this test fails.
+    """
+    base = tmp_path / "snapshot_storage"
+    base.mkdir()
+
+    # Publish a snapshot for the source agent B.
+    ws_b = tmp_path / "ws_b"
+    _seed_workspace(ws_b, "peerwork")
+    writer = _make_writer_fm(base, "agentB", ws_b)
+    await writer.save_snapshot()
+
+    store = SnapshotVersionStore.for_base(base)
+
+    # Spy on acquire/release through the real store.
+    acquired_names: list[str] = []
+    real_acquire = store.acquire
+    real_release = store.release
+    released: list = []
+    monkeypatch.setattr(store, "acquire", lambda name: (acquired_names.append(name), real_acquire(name))[1])
+    monkeypatch.setattr(store, "release", lambda v: (released.append(v), real_release(v))[1])
+
+    # Viewing agent A with a real temp workspace + the real copy method.
+    viewer_fm = FilesystemManager.__new__(FilesystemManager)
+    viewer_fm.agent_temporary_workspace = tmp_path / "temp_ws_a"
+    monkeypatch.setattr(viewer_fm, "_normalize_media_call_ledger_paths", lambda **kw: None)
+    viewer_fm.cwd = str(tmp_path / "ws_a")
+
+    viewer_agent = types.SimpleNamespace(backend=types.SimpleNamespace(filesystem_manager=viewer_fm))
+    source_agent = types.SimpleNamespace(backend=types.SimpleNamespace(filesystem_manager=writer))
+
+    orch = types.SimpleNamespace()
+    orch._snapshot_storage = str(base)
+    orch.agents = {"agentA": viewer_agent, "agentB": source_agent}
+    orch.coordination_tracker = types.SimpleNamespace(get_reverse_agent_mapping=lambda: {})
+    orch._step_mode = None
+    orch._step_inputs = None
+
+    sm = SnapshotManager(orch)
+    result = await sm.copy_all_snapshots_to_temp_workspace("agentA")
+
+    # acquire() was actually used for the source agent (the load-bearing wiring).
+    assert "agentB" in acquired_names, "reader did not acquire() a pinned version"
+    # The copy produced agentB's peer content.
+    assert result is not None
+    assert (Path(result) / "agentB" / "file0.txt").read_text() == "peerwork-0"
+    # Pins were released (no refcount leak that would disable GC).
+    assert store._refcounts == {}, f"refcount leak after copy: {store._refcounts}"
diff --git a/massgen/tests/test_wait_interrupt_provider.py b/massgen/tests/test_wait_interrupt_provider.py
new file mode 100644
index 000000000..5d68d8e85
--- /dev/null
+++ b/massgen/tests/test_wait_interrupt_provider.py
@@ -0,0 +1,103 @@
+"""Tests for the deduplicated background-wait interrupt provider.
+
+The three hook-setup paths (GeneralHookManager, Codex MCP, native) used to each
+inline an identical ``_wait_interrupt_provider`` closure. They now share
+``MidStreamInjectionHookInstaller._install_wait_interrupt_provider``. These tests
+pin the consolidated contract so the single implementation can't silently drift:
+
+  * it registers a provider on backends that support it (and no-ops otherwise),
+  * cancelled turn  -> ``turn_cancelled`` with no content,
+  * runtime sections present -> ``runtime_injection_available`` with joined content,
+  * no sections -> ``None``.
+"""
+
+from __future__ import annotations
+
+import types
+from unittest.mock import AsyncMock
+
+import pytest
+
+from massgen.orchestrator_collaborators.midstream_injection_hook_installer import (
+    MidStreamInjectionHookInstaller,
+)
+
+
+def _installer(orch) -> MidStreamInjectionHookInstaller:
+    inst = MidStreamInjectionHookInstaller.__new__(MidStreamInjectionHookInstaller)
+    inst._orchestrator = orch
+    return inst
+
+
+def _agent_with_capture():
+    captured: dict = {}
+
+    def _set(provider):
+        captured["provider"] = provider
+
+    backend = types.SimpleNamespace(set_background_wait_interrupt_provider=_set)
+    return types.SimpleNamespace(backend=backend), captured
+
+
+def test_provider_registered_when_backend_supports_it():
+    orch = types.SimpleNamespace()
+    agent, captured = _agent_with_capture()
+    _installer(orch)._install_wait_interrupt_provider(agent, "agentA")
+    assert callable(captured.get("provider"))
+
+
+def test_no_op_when_backend_unsupported():
+    orch = types.SimpleNamespace()
+    # backend without set_background_wait_interrupt_provider must not raise.
+    agent = types.SimpleNamespace(backend=types.SimpleNamespace())
+    _installer(orch)._install_wait_interrupt_provider(agent, "agentA")  # no exception
+
+
+@pytest.mark.asyncio
+async def test_provider_returns_turn_cancelled_when_cancelled():
+    orch = types.SimpleNamespace(
+        cancellation_manager=types.SimpleNamespace(is_cancelled=True),
+        _collect_no_hook_runtime_fallback_sections=AsyncMock(return_value=["unused"]),
+    )
+    agent, captured = _agent_with_capture()
+    _installer(orch)._install_wait_interrupt_provider(agent, "agentA")
+    result = await captured["provider"]("agentA")
+    assert result == {"interrupt_reason": "turn_cancelled", "injected_content": None}
+
+
+@pytest.mark.asyncio
+async def test_provider_returns_runtime_injection_when_sections_present():
+    orch = types.SimpleNamespace(
+        cancellation_manager=types.SimpleNamespace(is_cancelled=False),
+        _collect_no_hook_runtime_fallback_sections=AsyncMock(return_value=["sec1", "sec2"]),
+    )
+    agent, captured = _agent_with_capture()
+    _installer(orch)._install_wait_interrupt_provider(agent, "agentA")
+    result = await captured["provider"]("agentA")
+    assert result["interrupt_reason"] == "runtime_injection_available"
+    assert result["injected_content"] == "sec1\nsec2"
+
+
+@pytest.mark.asyncio
+async def test_provider_returns_none_when_no_sections():
+    orch = types.SimpleNamespace(
+        cancellation_manager=types.SimpleNamespace(is_cancelled=False),
+        _collect_no_hook_runtime_fallback_sections=AsyncMock(return_value=[]),
+    )
+    agent, captured = _agent_with_capture()
+    _installer(orch)._install_wait_interrupt_provider(agent, "agentA")
+    assert await captured["provider"]("agentA") is None
+
+
+@pytest.mark.asyncio
+async def test_provider_falls_back_to_installed_agent_id():
+    """When the runtime asks with an empty requested id, the install-time id is used."""
+    collect = AsyncMock(return_value=[])
+    orch = types.SimpleNamespace(
+        cancellation_manager=types.SimpleNamespace(is_cancelled=False),
+        _collect_no_hook_runtime_fallback_sections=collect,
+    )
+    agent, captured = _agent_with_capture()
+    _installer(orch)._install_wait_interrupt_provider(agent, "agentA")
+    await captured["provider"]("")  # empty requested id
+    collect.assert_awaited_once_with("agentA")

From 25a21461319861af861c9fed872395a701d3d870 Mon Sep 17 00:00:00 2001
From: ncrispino <ncrispino@wustl.edu>
Date: Fri, 5 Jun 2026 09:30:15 -0700
Subject: [PATCH 4/4] chore+docs: apply missed v0.1.94 branch bootstrap +
 document it in release-documenter skill

The fresh-release-branch bootstrap (the `feat: v0.1.X` commit that starts each
dev branch) was partially skipped for dev/v0.1.94: the forward-looking roadmap
file was never rolled forward. Apply it now and document the step so it isn't
missed again.

- Rename ROADMAP_v0.1.94.md -> ROADMAP_v0.1.95.md and repoint its content at the
  next release (image/video edit #959 now planned for v0.1.95, deferred range
  bumped to v0.1.86-v0.1.94, v0.1.93/v0.1.94 added to Related Tracks).
  (The __version__ bump, the other half of the bootstrap, landed in the prior
  release-docs commit.)
- Document the previously-undocumented "Phase 0: Fresh Release Branch Bootstrap"
  (version bump + ROADMAP_v0.1.X.md -> ROADMAP_v0.1.X+1.md rename) in the
  release-documenter SKILL.md, including order list and validation checklist
  entries, derived from how dev/v0.1.93 started (commit 2deb48bf).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---
 ROADMAP_v0.1.94.md => ROADMAP_v0.1.95.md         |  8 +++++---
 .../skills/massgen-release-documenter/SKILL.md   | 16 ++++++++++++++++
 2 files changed, 21 insertions(+), 3 deletions(-)
 rename ROADMAP_v0.1.94.md => ROADMAP_v0.1.95.md (77%)

diff --git a/ROADMAP_v0.1.94.md b/ROADMAP_v0.1.95.md
similarity index 77%
rename from ROADMAP_v0.1.94.md
rename to ROADMAP_v0.1.95.md
index f2e7ba55b..c94319151 100644
--- a/ROADMAP_v0.1.94.md
+++ b/ROADMAP_v0.1.95.md
@@ -1,14 +1,14 @@
-# MassGen v0.1.94 Roadmap
+# MassGen v0.1.95 Roadmap
 
 **Target Release:** TBD
 
 ## Overview
 
-Version 0.1.94 picks up the image/video edit work deferred from v0.1.86-v0.1.93 and continues multimodal provider-parity work.
+Version 0.1.95 picks up the image/video edit work deferred from v0.1.86-v0.1.94 and continues multimodal provider-parity work.
 
 ---
 
-## Feature: Image/Video Edit Capabilities (Deferred from v0.1.86-v0.1.93)
+## Feature: Image/Video Edit Capabilities (Deferred from v0.1.86-v0.1.94)
 
 **Issue:** [#959](https://github.com/massgen/MassGen/issues/959)
 **Owner:** @ncrispino
@@ -30,6 +30,8 @@ Version 0.1.94 picks up the image/video edit work deferred from v0.1.86-v0.1.93
 
 ## Related Tracks
 
+- **v0.1.94**: Parallelism Hardening (engineering health) — snapshot copy moved off the event loop with immutable versioned snapshots, lock-free concurrency-race fixes, unified mid-stream injection, and worktree-isolation degradation surfaced
+- **v0.1.93**: CLI package decomposition and pydantic config migration — focused `massgen/cli/` package, construction-time config validation with `Literal`-typed modes, single-source exclusion lists, dead-code removal, and test-signal/type-checking hardening
 - **v0.1.92**: Orchestrator collaborator refactor and Parallel Search MCP — 49 collaborator extractions, Textual display helper split, characterization coverage, and a Parallel hosted search example
 - **v0.1.91**: Config reliability and hook safety — centralized config parsing, strict unknown-key validation, checklist runtime control wiring, and nested native-hook permission precedence
 - **v0.1.90**: Discriminative criteria refinements and checklist calibration — score-spread pruning, per-criterion feedback, position-bias counterbalancing, unified checklist gate, and shared score utilities
diff --git a/massgen/skills/massgen-release-documenter/SKILL.md b/massgen/skills/massgen-release-documenter/SKILL.md
index 6a74a7764..58a9e012c 100644
--- a/massgen/skills/massgen-release-documenter/SKILL.md
+++ b/massgen/skills/massgen-release-documenter/SKILL.md
@@ -43,6 +43,7 @@ This file contains:
 
 **Always follow this order:**
 
+0. **Fresh-branch bootstrap** (once, at branch creation) — version bump + rename `ROADMAP_v0.1.X.md` → `ROADMAP_v0.1.X+1.md` (see Phase 0)
 1. **CHANGELOG.md** ⭐ START HERE
 2. **Version bump** (`massgen/__init__.py` `__version__`)
 3. **Sphinx Documentation** (docs/source/)
@@ -57,6 +58,20 @@ This order is critical - never skip ahead!
 
 ## Quick Reference Workflow
 
+### Phase 0: Fresh Release Branch Bootstrap (do this when the branch is created)
+
+**⚠️ Easy to miss — this happens once, at the *start* of a new `dev/v0.1.X` branch, not at doc-writing time.** When `dev/v0.1.X` is branched (right after the previous release merges in), a small bootstrap commit (`feat: v0.1.X`) sets the branch up:
+
+1. **Bump the version**: `massgen/__init__.py` `__version__ = "0.1.X"` (`pyproject.toml` reads it dynamically).
+2. **Roll the forward-looking roadmap file**: rename `ROADMAP_v0.1.X.md` → `ROADMAP_v0.1.X+1.md` and rewrite its content to plan the *next* release. This file always names the version *after* the one currently in development (the in-development version is tracked in the main `ROADMAP.md` sections). Update its title, "Overview", the deferred-feature "Deferred from …" range, and add the just-shipped version(s) to its "Related Tracks" list.
+
+```bash
+git mv ROADMAP_v0.1.X.md ROADMAP_v0.1.X+1.md
+# then edit __version__ and the renamed roadmap file
+```
+
+> If you arrive mid-branch and find `ROADMAP_v0.1.X.md` (matching the in-dev version) still present, or `__version__` still on the previous release, the bootstrap was skipped — do it now before the release docs.
+
 ### Phase 1: CHANGELOG.md (Required First Step)
 
 Document all changes under these categories:
@@ -162,6 +177,7 @@ Use the just-archived previous version's files as templates so the structure/sec
 ## Quick Validation Checklist
 
 **Must Update (every release):**
+0. ✅ Fresh-branch bootstrap done? (`__version__` bumped + `ROADMAP_v0.1.X.md` → `ROADMAP_v0.1.X+1.md` renamed — see Phase 0)
 1. ✅ CHANGELOG.md
 2. ✅ `massgen/__init__.py` (`__version__` bump)
 3. ✅ docs/source/index.rst (Recent Releases)