massgen · ncrispino · Jun 12, 2026 · Jun 12, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -11,11 +11,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Theme: Application-Layer Permission Engine
 
-A layered, **fully opt-in** permission system for agent tool calls — the application-layer companion to v0.1.96's OS sandbox. When a `permissions:` block is present, every tool call flows through a hardline catastrophic-command floor, a declarative `allow/ask/deny` rule layer, and a blast-radius risk classifier, resolving to allow / **ask** / deny. An `ask` routes through a pluggable approval provider: an interactive TUI modal (allow once / session / always · reject), an automation policy (`risk-based` / `deny-all` / `allow-all`), or a file request/response handshake for headless/remote approval. Approvals are recorded in an append-only audit ledger; per-agent **role presets** (e.g. `read-only`) scope each agent and also empty the SRT writable set as an OS backstop. A channel-based **guardrail system prompt** tells the model to follow blocks and surface-and-ask rather than circumvent — while keeping `ask` a sanctioned path. **Presence-gated**: a config with no `permissions:` block is 100% unchanged. All items landed under TDD (tests first, confirmed red, then green), with live verification across automation runs. **Honest scope**: the prompt + regex classifier are best-effort *alignment*, not enforcement — the OS sandbox (v0.1.96) remains the load-bearing control (see `docs/dev_notes/permissions_p2_followups.md`).
+A layered, **fully opt-in** permission system for agent tool calls — the application-layer companion to v0.1.96's OS sandbox. When a `permissions:` block is present, every tool call flows through a hardline catastrophic-command floor, a declarative `allow/ask/deny` rule layer, and a blast-radius risk classifier, resolving to allow / **ask** / deny. An `ask` routes through a pluggable approval provider: an automation policy (`risk-based` / `deny-all` / `allow-all`) or a file request/response handshake for headless/remote approval. Approvals are recorded in an append-only audit ledger; per-agent **role presets** (e.g. `read-only`) scope each agent and also empty the SRT writable set as an OS backstop. A channel-based **guardrail system prompt** tells the model to follow blocks and surface-and-ask rather than circumvent — while keeping `ask` a sanctioned path. **Presence-gated**: a config with no `permissions:` block is 100% unchanged. All items landed under TDD (tests first, confirmed red, then green), with live verification across automation runs. **Honest scope**: the prompt + regex classifier are best-effort *alignment*, not enforcement — the OS sandbox (v0.1.96) remains the load-bearing control (see `docs/dev_notes/permissions_p2_followups.md`).
 
 ### Added
 - **Permission engine (opt-in `permissions:` block)**: composite `PreToolUse` pipeline in `massgen/permissions/` — a non-overridable hardline blocklist (`hardline.py`, catastrophic patterns like `rm -rf /`, fork bombs, raw-disk `dd`), a declarative `action(target)` rule layer (`rules.py`: `command`/`read_file`/`write_file`/`read_url`/`mcp`/`*`, **deny-wins** across scopes), and a blast-radius `RiskClassifier` (`risk_classifier.py`: tiers by what the call *does* — egress/force-push/publish/privilege → high, reads/in-workspace edits → low). An explicit rule suppresses the risk-ask, so rules + risk live in one hook.
-- **Approval round-trip**: the `base_with_custom_tool_and_mcp` chokepoint resolves an `ask` via a pluggable `ApprovalProvider` — `CallbackApprovalProvider` → interactive **TUI modal** (`ToolApprovalModal`: allow once/session/always · reject), `PolicyApprovalProvider` → automation default (`risk-based` ships default; high denied with reason, low/medium allowed), and `FileApprovalProvider` → `req_*.json`/`resp_*.json` handshake for headless/remote (fail-closed on timeout).
+- **Approval round-trip**: the `base_with_custom_tool_and_mcp` chokepoint resolves an `ask` via a pluggable `ApprovalProvider` — `PolicyApprovalProvider` → automation policy (`risk-based` is the default: high-risk calls denied with a reason, low/medium allowed) and `FileApprovalProvider` → `req_*.json`/`resp_*.json` handshake for headless/remote approval (Slack bot, `/approve <id>`, …). Both are live-verified and fail-closed on timeout.
 - **Per-agent role scoping**: `permissions.role` presets (`read-only`/`researcher` deny writes+shell; `read-write`/`implementer` fall through to rules+risk), merged with user rules deny-wins. A `read-only` role also empties the agent's SRT writable set (OS-layer backstop to the engine's write denials).
 - **Audit ledger + runaway guard (`ledger.py`)**: `ApprovalLedger` writes one append-only JSONL line per approval decision (who/what/why/outcome, crash-safe). `ApprovalBudget` caps consecutive auto-approvals per agent (opt-in `max_consecutive_auto`; fail-closed past the cap, reset by any human decision).
 - **`always`-grant persistence**: an operator's "Always" approval persists as a deduped `allow(...)` rule in `settings.local.json` and loads back as a merged scope next run (opt-out `persist_approvals: false`).
@@ -28,11 +28,11 @@ A layered, **fully opt-in** permission system for agent tool calls — the appli
 - **Backend parity guard**: native backends (`claude_code`, `codex`) don't run the framework `PreToolUse` chokepoint, so a `permissions:` block there is reported **INACTIVE** at startup (loud warning) and inert hooks are skipped — preventing a false promise of enforcement.
 
 ### Tests
-- New deterministic suites: `test_permissions_core.py`, `test_permission_rules.py`, `test_permission_hooks.py`, `test_permission_coordinator.py`, `test_approval_provider.py` / `test_file_approval_provider.py`, `test_approval_ledger.py`, `test_tool_approval_modal.py`, `test_permissions_optional.py` (opt-in/presence gate + parity guard), `test_permission_persistence.py` (write↔load roundtrip + dedup), `test_permission_guardrail_prompt.py` (gating + content incl. ask-is-sanctioned), `test_permission_denied_tool_visibility.py` (start→error-complete events + command preview), plus SRT read-only backstop in `test_srt_manager.py` / `test_srt_filesystem_integration.py`.
+- New deterministic suites: `test_permissions_core.py`, `test_permission_rules.py`, `test_permission_hooks.py`, `test_permission_coordinator.py`, `test_approval_provider.py` / `test_file_approval_provider.py`, `test_approval_ledger.py`, `test_permissions_optional.py` (opt-in/presence gate + parity guard), `test_permission_persistence.py` (write↔load roundtrip + dedup), `test_permission_guardrail_prompt.py` (gating + content incl. ask-is-sanctioned), `test_permission_denied_tool_visibility.py` (start→error-complete events + command preview), plus SRT read-only backstop in `test_srt_manager.py` / `test_srt_filesystem_integration.py`.
 - Live-verified (automation, `gemini-3-flash-preview`): all three chokepoint branches end-to-end (allow / deny-rule / ask→policy-deny + ledger), guardrail policy present in the real system message, denied calls emitting real `tool_start`/`tool_complete(error)` events with the command. Documented honest limitation: the model evaded the regex egress classifier via `\c\u\r\l` / `python urllib`, confirming the OS sandbox is the load-bearing control.
 
 ### Documentations, Configurations and Resources
-- **New Configs**: `massgen/configs/tools/permissions/permission_engine.yaml` (risk-tiered approval + rule algebra), `per_agent_roles.yaml` (role scoping), `permission_modal_interactive.yaml` (interactive approval-modal demo + automation deny path).
+- **New Configs**: `massgen/configs/tools/permissions/permission_engine.yaml` (risk-tiered approval + rule algebra), `per_agent_roles.yaml` (role scoping).
 - **Design Notes**: `docs/dev_notes/permission_systems_research.md` (three-layer model), `docs/dev_notes/permissions_p2_followups.md` (limitations, manual-test gaps, OS-enforcement follow-up).
 
 ## [0.1.96] - 2026-06-10
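
The permission-engine entry in the CHANGELOG orders its layers as follows: hardline floor first, then deny-wins rules (where an explicit rule suppresses the risk-ask), then the risk classifier. That ordering can be sketched in Python. This is a minimal illustration under assumptions. `Rule`, `resolve`, `classify_risk`, and `HARDLINE_PATTERNS` are hypothetical names. The regexes are toy stand-ins, not the contents of `hardline.py` or `risk_classifier.py`. Ranking `ask` above `allow` when both match is an assumption beyond the stated deny-wins rule.

```python
import fnmatch
import re
from dataclasses import dataclass

# Hypothetical toy stand-ins for hardline patterns (not the real hardline.py list).
HARDLINE_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/(\s|$)"),             # rm -rf /
    re.compile(r":\(\)\s*\{\s*:\|:&\s*\};:"),        # classic shell fork bomb
    re.compile(r"\bdd\b.*\bof=/dev/(sd|nvme|disk)"), # raw-disk dd
]


@dataclass(frozen=True)
class Rule:
    effect: str  # "allow" | "ask" | "deny"
    action: str  # e.g. "command", "write_file", or "*" for any action
    target: str  # glob matched against the call's target

    def matches(self, action: str, target: str) -> bool:
        return self.action in ("*", action) and fnmatch.fnmatch(target, self.target)


def classify_risk(action: str, target: str) -> str:
    """Toy blast-radius tiering: egress / force-push / publish / privilege -> high."""
    risky = r"\b(curl|wget|sudo|npm publish)\b|push\s+--force"
    if action == "command" and re.search(risky, target):
        return "high"
    return "low"


def resolve(action: str, target: str, rules: list[Rule]) -> str:
    # 1. Non-overridable floor: checked before any rule, so no allow can bypass it.
    if action == "command" and any(p.search(target) for p in HARDLINE_PATTERNS):
        return "deny"
    # 2. Declarative rules: deny wins over every other matching rule (assumed: ask > allow).
    effects = {r.effect for r in rules if r.matches(action, target)}
    for effect in ("deny", "ask", "allow"):
        if effect in effects:
            return effect  # an explicit rule suppresses the risk-ask
    # 3. No rule matched: fall back to the risk classifier.
    return "ask" if classify_risk(action, target) == "high" else "allow"
```

Keeping the hardline check ahead of the rule loop is what makes it non-overridable: for those commands no `allow` rule is ever consulted.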

diff --git a/README.md b/README.md
@@ -161,7 +161,7 @@ This project started with the "threads of thought" and "iterative refinement" id
 
 **What's New in v0.1.97** (Application-Layer Permission Engine):
 - **🛡️ Layered Permission Engine** - Opt-in `permissions:` block routes every tool call through a non-overridable **hardline** floor (`rm -rf /`, fork bombs), declarative **`allow/ask/deny` rules** over a small `action(target)` algebra (deny-wins), and a **blast-radius risk classifier** — auto-allowing reads/in-workspace edits and asking only for the dangerous tail (egress, force-push, publish, privilege). The app-layer companion to v0.1.96's OS sandbox.
-- **✋ Approval That Fits the Run** - An `ask` pops an interactive **modal** (allow once / session / always · reject) when a human is present, or resolves via an automation **policy** (`risk-based` / `deny-all` / `allow-all`) or a **file** request/response handshake for headless/remote approval. Fail-closed by design.
+- **✋ Approval That Fits the Run** - An `ask` resolves via an automation **policy** (`risk-based` / `deny-all` / `allow-all`) or a **file** request/response handshake for headless/remote approval (Slack bot, `/approve <id>`, …) — fail-closed by design.
 - **🧑‍🤝‍🧑 Roles, Audit & Guards** - Per-agent `role` presets (e.g. `read-only`, which also empties the agent's OS-sandbox writable set), an append-only JSONL **audit ledger** of every decision, a runaway-loop **budget**, `always`-grant persistence, and a channel-based **guardrail prompt** that nudges the model to surface blocks rather than circumvent them while keeping `ask` sanctioned. *(Honest scope: the prompt is best-effort alignment; the OS sandbox is the enforcement.)*
 
 **Install v0.1.97:**
@@ -1247,7 +1247,7 @@ MassGen is currently in its foundational stage, with a focus on parallel, asynch
 
 #### Application-Layer Permission Engine
 - **Permission engine (opt-in `permissions:` block)**: a composite `PreToolUse` pipeline in `massgen/permissions/` — a non-overridable **hardline** blocklist (`hardline.py`: `rm -rf /`, fork bombs, raw-disk `dd`), a declarative **`action(target)` rule layer** (`rules.py`: `command`/`read_file`/`write_file`/`read_url`/`mcp`/`*`, deny-wins across scopes), and a **blast-radius `RiskClassifier`** that tiers by what the call does (egress/force-push/publish/privilege → high; reads/in-workspace edits → low). An explicit rule suppresses the risk-ask, so rules + risk live in one hook
-- **Approval round-trip**: the `base_with_custom_tool_and_mcp` chokepoint resolves an `ask` via a pluggable `ApprovalProvider` — interactive **modal** (`ToolApprovalModal`: allow once/session/always · reject), automation **policy** (`risk-based` default / `deny-all` / `allow-all`), or **file** request/response handshake for headless/remote — fail-closed on timeout
+- **Approval round-trip**: the `base_with_custom_tool_and_mcp` chokepoint resolves an `ask` via a pluggable `ApprovalProvider` — automation **policy** (`risk-based` default / `deny-all` / `allow-all`) and **file** request/response handshake for headless/remote (both live-verified, fail-closed on timeout)
 - **Roles, audit & guards**: per-agent `role` presets (`read-only`/`researcher` deny writes+shell, also empties the agent's SRT writable set), an append-only JSONL **`ApprovalLedger`**, a runaway-loop **`ApprovalBudget`**, and `always`-grant persistence to `settings.local.json`
 - **Guardrail system prompt** (`PermissionGuardrailSection`, injected only when the engine is active): follow the guardrails, don't circumvent a denial, surface-and-ask — while keeping `ask` a sanctioned path. Authority is established by channel (only the system prompt is authoritative). Denied tool calls now render as **first-class failed tool events** (with the command) in the TUI/WebUI timeline
 - **Presence-gated & honest**: a config with no `permissions:` block is 100% unchanged; native backends (claude_code/codex) report **INACTIVE** rather than silently inert. All under TDD; live-verified that the prompt is best-effort *alignment* (a model evaded the regex egress classifier via `\c\u\r\l` / `python urllib`), so the OS sandbox remains the load-bearing enforcement
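
The README's approval round-trip entry lists three automation policies. These map naturally onto a pure function from risk tier to decision. This is a sketch only. `PolicyDecision` and `decide` are hypothetical and do not reflect the real `PolicyApprovalProvider` interface, which this diff does not show. Treating an unrecognized tier or policy name as a denial is an assumption, chosen to match the fail-closed stance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyDecision:
    approved: bool
    reason: str = ""  # surfaced to the model when a call is denied


def decide(policy: str, risk: str) -> PolicyDecision:
    """Hypothetical automation policy that resolves an `ask` when no human is present."""
    if policy == "allow-all":
        return PolicyDecision(True)
    if policy == "deny-all":
        return PolicyDecision(False, "deny-all policy rejects every ask")
    if policy == "risk-based":
        if risk in ("low", "medium"):
            return PolicyDecision(True)
        # High (or unrecognized) risk is denied with a reason the model can report.
        return PolicyDecision(False, f"{risk}-risk call denied by risk-based policy")
    # Unknown policy name: fail closed rather than silently allowing.
    return PolicyDecision(False, f"unknown approval policy {policy!r}")
```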

diff --git a/README_PYPI.md b/README_PYPI.md
@@ -160,7 +160,7 @@ This project started with the "threads of thought" and "iterative refinement" id
 
 **What's New in v0.1.97** (Application-Layer Permission Engine):
 - **🛡️ Layered Permission Engine** - Opt-in `permissions:` block routes every tool call through a non-overridable **hardline** floor (`rm -rf /`, fork bombs), declarative **`allow/ask/deny` rules** over a small `action(target)` algebra (deny-wins), and a **blast-radius risk classifier** — auto-allowing reads/in-workspace edits and asking only for the dangerous tail (egress, force-push, publish, privilege). The app-layer companion to v0.1.96's OS sandbox.
-- **✋ Approval That Fits the Run** - An `ask` pops an interactive **modal** (allow once / session / always · reject) when a human is present, or resolves via an automation **policy** (`risk-based` / `deny-all` / `allow-all`) or a **file** request/response handshake for headless/remote approval. Fail-closed by design.
+- **✋ Approval That Fits the Run** - An `ask` resolves via an automation **policy** (`risk-based` / `deny-all` / `allow-all`) or a **file** request/response handshake for headless/remote approval (Slack bot, `/approve <id>`, …) — fail-closed by design.
 - **🧑‍🤝‍🧑 Roles, Audit & Guards** - Per-agent `role` presets (e.g. `read-only`, which also empties the agent's OS-sandbox writable set), an append-only JSONL **audit ledger** of every decision, a runaway-loop **budget**, `always`-grant persistence, and a channel-based **guardrail prompt** that nudges the model to surface blocks rather than circumvent them while keeping `ask` sanctioned. *(Honest scope: the prompt is best-effort alignment; the OS sandbox is the enforcement.)*
 
 **Install v0.1.97:**
@@ -1246,7 +1246,7 @@ MassGen is currently in its foundational stage, with a focus on parallel, asynch
 
 #### Application-Layer Permission Engine
 - **Permission engine (opt-in `permissions:` block)**: a composite `PreToolUse` pipeline in `massgen/permissions/` — a non-overridable **hardline** blocklist (`hardline.py`: `rm -rf /`, fork bombs, raw-disk `dd`), a declarative **`action(target)` rule layer** (`rules.py`: `command`/`read_file`/`write_file`/`read_url`/`mcp`/`*`, deny-wins across scopes), and a **blast-radius `RiskClassifier`** that tiers by what the call does (egress/force-push/publish/privilege → high; reads/in-workspace edits → low). An explicit rule suppresses the risk-ask, so rules + risk live in one hook
-- **Approval round-trip**: the `base_with_custom_tool_and_mcp` chokepoint resolves an `ask` via a pluggable `ApprovalProvider` — interactive **modal** (`ToolApprovalModal`: allow once/session/always · reject), automation **policy** (`risk-based` default / `deny-all` / `allow-all`), or **file** request/response handshake for headless/remote — fail-closed on timeout
+- **Approval round-trip**: the `base_with_custom_tool_and_mcp` chokepoint resolves an `ask` via a pluggable `ApprovalProvider` — automation **policy** (`risk-based` default / `deny-all` / `allow-all`) and **file** request/response handshake for headless/remote (both live-verified, fail-closed on timeout)
 - **Roles, audit & guards**: per-agent `role` presets (`read-only`/`researcher` deny writes+shell, also empties the agent's SRT writable set), an append-only JSONL **`ApprovalLedger`**, a runaway-loop **`ApprovalBudget`**, and `always`-grant persistence to `settings.local.json`
 - **Guardrail system prompt** (`PermissionGuardrailSection`, injected only when the engine is active): follow the guardrails, don't circumvent a denial, surface-and-ask — while keeping `ask` a sanctioned path. Authority is established by channel (only the system prompt is authoritative). Denied tool calls now render as **first-class failed tool events** (with the command) in the TUI/WebUI timeline
 - **Presence-gated & honest**: a config with no `permissions:` block is 100% unchanged; native backends (claude_code/codex) report **INACTIVE** rather than silently inert. All under TDD; live-verified that the prompt is best-effort *alignment* (a model evaded the regex egress classifier via `\c\u\r\l` / `python urllib`), so the OS sandbox remains the load-bearing enforcement
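
The CHANGELOG describes the runaway-loop `ApprovalBudget` as opt-in via `max_consecutive_auto`, fail-closed past the cap, and reset by any human decision. That behavior reduces to a per-agent streak counter. The class below, `ConsecutiveAutoBudget`, is a hypothetical sketch of that behavior, not the shipped `ledger.py` code.

```python
from collections import defaultdict
from typing import Optional


class ConsecutiveAutoBudget:
    """Hypothetical runaway-loop guard (illustrative, not the real ApprovalBudget)."""

    def __init__(self, max_consecutive_auto: Optional[int] = None):
        # None means the guard is disabled, mirroring the opt-in setting.
        self.max_consecutive_auto = max_consecutive_auto
        self._streak: dict[str, int] = defaultdict(int)

    def allow_auto(self, agent_id: str) -> bool:
        """Return True (and count it) if one more auto-approval fits the budget."""
        if self.max_consecutive_auto is None:
            return True
        if self._streak[agent_id] >= self.max_consecutive_auto:
            return False  # fail closed: stop auto-approving this agent
        self._streak[agent_id] += 1
        return True

    def record_human_decision(self, agent_id: str) -> None:
        """Any human decision, approve or reject, resets the agent's streak."""
        self._streak[agent_id] = 0
```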

diff --git a/RELEASE_NOTES_v0.1.97.md b/RELEASE_NOTES_v0.1.97.md
@@ -8,7 +8,6 @@
 - **Risk classifier**: tiers a call by blast radius, not name — auto-allows reads and in-workspace edits, asks only for the dangerous tail (network egress, force-push, publish/spend, privilege escalation).
 
 ### ✋ Approval that fits the run
-- **Interactive modal** (`ToolApprovalModal`): allow once / allow session / always · reject, when a human is present.
 - **Automation policy**: `risk-based` (default — high denied with a reason, low/medium allowed), `deny-all`, or `allow-all`.
 - **File handshake** (`FileApprovalProvider`): `req_*.json` / `resp_*.json` for headless/remote approval (Slack bot, `/approve <id>`, …). Fail-closed on timeout throughout.
 
@@ -30,14 +29,6 @@
 
 ### 📖 Getting Started
 - [**Quick Start Guide**](https://github.com/massgen/MassGen?tab=readme-ov-file#1--installation): upgrade and try the permission engine.
-- **Try the approval modal (interactive):**
-
-```bash
-# A high-risk command pops the approval modal (allow once/session/always · reject)
-uv run massgen --config massgen/configs/tools/permissions/permission_modal_interactive.yaml \
-  "Run the shell command: curl -s https://example.com"
-```
-
 - **Try risk-tiered automation (headless deny):**
 
 ```bash
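
The release notes describe a `req_*.json` / `resp_*.json` handshake. On the agent side, it could work like this: write a request, poll for the matching response, and deny once the deadline passes. Several details are assumptions: the `req_<id>.json` naming, the `approved` field, and `request_approval` itself are all hypothetical. The real `FileApprovalProvider` payload schema is not shown in this diff.

```python
import json
import time
import uuid
from pathlib import Path


def request_approval(approval_dir: Path, tool: str, args: dict,
                     timeout_s: float = 60.0, poll_s: float = 0.5) -> bool:
    """Hypothetical: write req_<id>.json, wait for resp_<id>.json, deny on timeout."""
    approval_dir.mkdir(parents=True, exist_ok=True)
    req_id = uuid.uuid4().hex
    req = approval_dir / f"req_{req_id}.json"
    tmp = req.with_suffix(".tmp")
    tmp.write_text(json.dumps({"id": req_id, "tool": tool, "args": args}))
    tmp.replace(req)  # atomic rename: a watcher never sees a half-written request
    resp = approval_dir / f"resp_{req_id}.json"
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if resp.exists():
            try:
                data = json.loads(resp.read_text())
                return isinstance(data, dict) and data.get("approved") is True
            except (json.JSONDecodeError, OSError):
                pass  # response still being written: retry until the deadline
        time.sleep(poll_s)
    return False  # nobody answered in time: fail closed
```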

diff --git a/ROADMAP.md b/ROADMAP.md
@@ -54,7 +54,7 @@ Want to contribute or collaborate on a specific track? Reach out to the track ow
 
 ### Features
 - **Permission engine (opt-in `permissions:` block)**: a composite `PreToolUse` pipeline in `massgen/permissions/` — a non-overridable **hardline** blocklist (`hardline.py`: catastrophic patterns like `rm -rf /`, fork bombs, raw-disk `dd`), a declarative **`action(target)` rule layer** (`rules.py`: `command`/`read_file`/`write_file`/`read_url`/`mcp`/`*`, deny-wins across scopes), and a **blast-radius `RiskClassifier`** that tiers by what the call does (egress/force-push/publish/privilege → high; reads/in-workspace edits → low). An explicit rule suppresses the risk-ask, so rules + risk live in one hook
-- **Approval round-trip**: the `base_with_custom_tool_and_mcp` chokepoint resolves an `ask` via a pluggable `ApprovalProvider` — interactive **modal** (`ToolApprovalModal`: allow once/session/always · reject), automation **policy** (`risk-based` default / `deny-all` / `allow-all`), or **file** request/response handshake for headless/remote (fail-closed on timeout)
+- **Approval round-trip**: the `base_with_custom_tool_and_mcp` chokepoint resolves an `ask` via a pluggable `ApprovalProvider` — automation **policy** (`risk-based` default / `deny-all` / `allow-all`) or **file** request/response handshake for headless/remote (fail-closed on timeout)
 - **Roles, audit & guards**: per-agent `role` presets (`read-only`/`researcher` deny writes+shell, also empty the agent's SRT writable set), an append-only JSONL **`ApprovalLedger`**, a runaway-loop **`ApprovalBudget`** (opt-in `max_consecutive_auto`), and `always`-grant persistence to `settings.local.json`
 - **Channel-based guardrail system prompt** (`PermissionGuardrailSection`, injected only when the engine is active): follow the guardrails, don't circumvent a denial, surface-and-ask — while keeping `ask` a sanctioned path. Denied tool calls now render as **first-class failed tool events** (with the command) in the TUI/WebUI timeline
 - **Backend parity guard**: native backends (`claude_code`, `codex`) lack the framework chokepoint, so a `permissions:` block there is reported **INACTIVE** instead of silently inert
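
The ROADMAP's append-only JSONL `ApprovalLedger` amounts to appending one JSON object per line for each decision. `JsonlLedger` below is a hypothetical sketch. Its field names are illustrative, since the CHANGELOG only says who/what/why/outcome. Calling `fsync` on every write is one assumed way to get the crash safety the CHANGELOG mentions.

```python
import json
import os
import time
from pathlib import Path


class JsonlLedger:
    """Hypothetical append-only audit log: one JSON object per line per decision."""

    def __init__(self, path: Path):
        self.path = path
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def record(self, agent: str, action: str, target: str,
               outcome: str, reason: str = "") -> None:
        entry = {"ts": time.time(), "agent": agent, "action": action,
                 "target": target, "outcome": outcome, "reason": reason}
        line = json.dumps(entry, ensure_ascii=False) + "\n"
        # Append mode never rewrites earlier lines; fsync makes the line durable
        # before record() returns, so a crash can only tear the final line.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(line)
            f.flush()
            os.fsync(f.fileno())

    def entries(self) -> list[dict]:
        """Read back every complete line, skipping a torn tail left by a crash."""
        out: list[dict] = []
        if not self.path.exists():
            return out
        with open(self.path, encoding="utf-8") as f:
            for raw in f:
                try:
                    out.append(json.loads(raw))
                except json.JSONDecodeError:
                    continue
        return out
```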