Skip to content

feat: OS-level SRT agent sandboxing + permission-hook hardening#1125

Merged
Henry-811 merged 6 commits into
dev/v0.1.96from
feat/better-sandboxing
Jun 10, 2026
Merged

feat: OS-level SRT agent sandboxing + permission-hook hardening#1125
Henry-811 merged 6 commits into
dev/v0.1.96from
feat/better-sandboxing

Conversation

@ncrispino

@ncrispino ncrispino commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Summary

Adds OS-level execution sandboxing for agents via Anthropic's sandbox-runtime (srt: bubblewrap on Linux, Seatbelt on macOS), and hardens the existing application-layer permission hook against file-tool sandbox escapes. Default-off, one-knob opt-in (command_line_execution_mode: srt); current behavior is unchanged unless a config turns it on.

Defense in depth, by design: the OS layer (SrtManager) and the app layer (PathPermissionManager) are derived from the same path policy and both stay active. SRT closes the shell escape hatch (e.g. echo x > /etc/passwd); the hardened hook closes file-tool escapes (write_file/move/copy to/from outside the workspace).

This PR also doubles as the v0.1.96 release (release-prep docs included).

What's included

1. SRT sandbox mode (command_line_execution_mode: srt)

  • SrtManager derives per-agent SRT settings from PathPermissionManager.managed_paths: allowWrite for writable paths, denyWrite for read-only/protected, network deny-all by default (allowlist is an opt-in capability grant), and a configurable read-confinement policy (command_line_srt_read_mode, default confined).
  • Command-line MCP + filesystem-tools MCP servers are OS-wrapped (srt --settings cfg sh -c '<cmd>'); npx/npm launchers + the no-roots wrapper auto-skip and keep their app-layer protection.
  • Native-sandbox backends degrade srtlocal (codex --full-auto, claude_code) to avoid nested-sandbox hangs.
  • Subagents inherit the parent's command_line_srt_* settings (parity with Docker).

2. Permission-hook hardening (PathPermissionManager)

  • New _validate_no_path_arg_escapes: a key-agnostic scan over the full tool-args tree (nested dicts + lists) that denies any value resolving outside managed areas — closing the prior fail-open behavior (unrecognized key, list-valued path, move/copy source pointing outside) with no false positives.

3. Fix found by live smoke-testing (this PR)

  • The OS-wrapped workspace_tools MCP server failed to start under the default confined read mode: in a normal dev/editable install the venv, the massgen package source, and git's user config all live under $HOME (which confined denies), so srt denied reading the server's own code (then GitPython's git version reading ~/.gitconfig). The server never connected and the agent silently fell back to shell-only (fail-closed — security held — but the filesystem MCP file-op tools were unavailable whenever srt was on).
  • Fix: build_settings() re-allows the framework's runtime read roots for the fs_tools profile only (sys.prefix/sys.base_prefix, the massgen package dir, ~/.gitconfig + ~/.config/git). Framework code/config, not user data — secrets and other projects stay denied; the agent's own execution profile is untouched.

Tests

File Covers
test_srt_manager.py settings derivation, profiles, secret read-deny baseline, protected-path read+write deny, wrapping, availability guards, framework-read-roots re-allow under confined (+ execution profile stays tight)
test_srt_filesystem_integration.py command-line + fs-tools config wiring, sh -c wrap, npx / no-roots auto-skip, MCP-security validation
test_srt_backend_degrade.py srtlocal degrade for native-sandbox backends; API backends keep srt
test_path_permission_hook_adversarial.py 15 escape vectors + false-positive guards
test_subagent_manager.py::TestSrtSettingsInheritance subagent inherits parent srt settings

Live verification (macOS 15.7, srt 1.0.0)

Ran the committed srt_sandbox.yaml demo end-to-end:

  • Workspace write succeeds; out-of-workspace read (~/srt_secret_test.txt) → Operation not permitted; curl egress → denied (CONNECT tunnel failed, 403).
  • After the fix: agent creates files via the filesystem MCP tools (mcp__filesystem__write_file, not shell) while the $HOME secret read stays OS-denied — workspace_tools connects with 0 failures.

Config used to test

massgen/configs/tools/filesystem/sandbox/srt_sandbox.yaml

Known follow-ups (not in this PR)

  • write_file/edit_file via the npx filesystem server is app-layer-only; full OS coverage needs a globally-installed (non-npx) server.
  • Network-egress MITM / per-agent credential scoping.
  • claude_code native-sandbox lever via ClaudeAgentOptions.

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes: v0.1.96

  • New Features

    • OS-level sandboxing for agent command execution with configurable network and read-access policies.
    • Enhanced path validation to prevent escape attempts in file operations.
  • Documentation

    • Updated release notes, installation guides, and sandbox configuration examples.

ncrispino and others added 5 commits June 9, 2026 21:29
…ecution_mode: srt)

Adds Anthropic sandbox-runtime (srt) as a 3rd command_line_execution_mode
alongside local/docker. Default-off; opt-in via config. OS-level filesystem +
network isolation derived from the same PathPermissionManager policy as the app
layer (defense in depth).

- SrtManager derives per-agent settings (allowWrite/denyWrite/denyRead, deny-all
  network, built-in secret-store read-deny baseline) and wraps commands via
  'srt --settings cfg sh -c <cmd>' (sh -c form required so srt does not consume
  the server's -- separator).
- Command-line MCP + filesystem-tools MCP servers are OS-wrapped; npx/npm and the
  no-roots wrapper auto-skip (registry/cache writes the sandbox blocks) and keep
  their app-layer protection.
- Native-sandbox backends (codex --full-auto, claude_code) degrade srt->local via
  has_native_execution_sandbox() to avoid nested-sandbox hangs; stored config
  normalized so raw reads see local.
- Subagents inherit parent command_line_srt_* settings (parity with Docker).
- New command_line_srt_* params added to the single-source exclusion list; 'srt'
  added to the MCP executable allowlist. Example config + tests included.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…andbox escapes

Adds a key-agnostic escape scan (_validate_no_path_arg_escapes) that walks the
full tool-args tree (nested dicts + lists) and denies any value resolving outside
all managed areas. Closes fail-open gaps surfaced by an adversarial audit:

- path under an unrecognized arg key (e.g. output_path/dst) bypassed the boundary
- list-valued and nested-dict path args were never checked
- move/copy 'source' pointing outside (delete-external / exfiltrate-external)

No false positives: non-path strings resolve harmlessly inside the workspace, and
content-bearing keys are skipped. Symlinks/.. were already handled via .resolve().
15-vector adversarial test suite added.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…o workspace+context)

SRT reads are allow-all by default, so the prior secret-denylist still left the
whole filesystem readable. Add command_line_srt_read_mode:
  - confined (default): denyRead=$HOME, allowRead=workspace+context+temp; system
    paths outside $HOME stay readable so commands still run. Denies personal data,
    secrets, and other projects.
  - strict: denyRead='/', allowRead=managed + system runtime baseline + extras.
  - open: allow-all reads minus a built-in secret denylist + extras.
Plus command_line_srt_allow_read to widen the allow-list per config. Both params
wired through base.py (+validation), the single-source exclusion list, and
FilesystemManager. Note: allowRead wins over denyRead in SRT, so protected
sub-paths inside an allowed context are read-deniable only in 'open' mode (their
write-immunity still holds everywhere). Live-verified: confined denies $HOME
reads while python/system reads and workspace I/O keep working.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…arts under confined SRT

Live-smoke-testing the srt_sandbox.yaml demo surfaced that the OS-wrapped
workspace_tools MCP server failed to start under the default confined read mode:
SRT denied reading the server's own code because, in a normal dev/editable
install, the venv (fastmcp + deps + interpreter), the massgen package source, and
git's user config all live under $HOME — exactly the region confined denies. First
the server's own _workspace_tools_server.py script was unreadable; after re-allowing
the code roots, GitPython's import-time `git version` then failed reading
~/.gitconfig. Either way the server never connected and the agent silently fell back
to shell-only (security held — fail-closed — but the filesystem MCP file-op tools
were unavailable whenever srt was on).

Fix: build_settings() now re-allows the framework's runtime read roots for the
fs_tools profile only (sys.prefix/sys.base_prefix, the massgen package dir, and
~/.gitconfig + ~/.config/git). This is framework code/config, not user data, so
secrets (.ssh/.aws/...) and other projects stay denied; the agent's own execution
profile is untouched.

The confined read mode landed in a later commit than the fs-tools wrapping, and the
unit tests only checked argv/wiring shape — not a live server handshake — so this
slipped through. TDD: added test_fs_tools_profile_confined_allows_reading_framework_runtime
(red first) asserting the fs_tools allowRead covers the framework roots + git config
while $HOME stays in denyRead, plus test_execution_profile_does_not_widen_for_framework_runtime
to keep the agent sandbox tight. Verified live: the agent now creates files via the
filesystem MCP tools while a shell read of a $HOME secret is still OS-denied.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Release-prep doc sweep for v0.1.96 (the SRT sandboxing release):
- CHANGELOG: full v0.1.96 entry (theme, SRT mode, read confinement, hardened
  permission hook, native-backend degrade, tests, framework-read-roots fix).
- README: Latest Features + Recent Achievements rewritten for v0.1.96; v0.1.95
  moved to Previous Achievements; TOC anchors + bottom roadmap shifted.
- ROADMAP: current version → v0.1.96, new completed section; planned multimodal
  image/video edit work shifts to v0.1.97 (ROADMAP_v0.1.97.md content).
- docs/source/index.rst + massgen/configs/README.md: v0.1.96 release entries.
- Announcement: new concise current-release.md (no link in the posted body,
  pending social links). v0.1.95 announcement was archived in the prior commit.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 10, 2026 16:19
@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Review Change Stack

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 9847168e-8c8c-4a7d-8b55-46c27f87d63d

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

This PR introduces OS-level execution sandboxing via Anthropic's sandbox-runtime (SRT) as an opt-in command_line_execution_mode: srt for MCP command-line agents, alongside a defense-in-depth hardening of path permission validation. Native-sandbox backends degrade SRT to local mode. SRT managers wrap commands and build per-profile sandbox settings derived from managed path permissions.

Changes

OS-Level Agent Sandboxing (SRT)

Layer / File(s) Summary
Documentation and release notes
CHANGELOG.md, PR_DRAFT_sandboxing.md, README.md, README_PYPI.md, ROADMAP.md, ROADMAP_v0.1.97.md, docs/announcements/*, docs/source/index.rst, massgen/configs/README.md
Release notes document v0.1.96 OS-level sandboxing, configurable read-confinement modes, hardened permission hooks, native-sandbox degradation behavior, comprehensive test coverage, and macOS verification results. README and roadmap updated to reflect v0.1.96 as latest and v0.1.97 roadmap.
Backend SRT mode initialization and native sandbox declarations
massgen/backend/base.py, massgen/backend/_excluded_params.py, massgen/backend/codex.py, massgen/backend/claude_code.py
Backend base class validates command_line_execution_mode: srt and introduces has_native_execution_sandbox() method; when native sandbox support is present, SRT mode is degraded to local with warning. Codex and Claude Code backends override to declare native sandbox support. SRT configuration parameters are added to exclusion list so they are not forwarded to provider APIs.
SRT core engine (SrtManager)
massgen/filesystem_manager/_srt_manager.py
New module implements SRT sandboxing: detects SRT CLI availability, wraps shell commands and argv for SRT sandbox execution, builds per-profile JSON settings with filesystem allow/deny lists and configurable read-confinement modes (confined, strict, open), and computes framework runtime read allowances for MCP server initialization within the sandbox.
MCP server and code execution SRT integration
massgen/filesystem_manager/_filesystem_manager.py, massgen/filesystem_manager/_code_execution_server.py
FilesystemManager wires SRT configuration, instantiates SrtManager, wraps filesystem and workspace-tools MCP servers with SRT command wrapping while defensively skipping network-dependent launchers (npx/npm), and injects --srt-settings path into command execution server args. CodeExecutionServer accepts SRT mode, validates platform/binary availability at startup, and wraps commands via SRT sandbox before subprocess execution with original command passed via sh -c.
Path permission hardening with escape validation
massgen/filesystem_manager/_path_permission_manager.py
Adds _validate_no_path_arg_escapes() helper that walks the full tool-argument tree (including nested dicts/lists), skips content-bearing keys, resolves candidate strings against workspace, and denies any argument value outside managed areas—defense-in-depth against fail-open escapes via unrecognized keys, list-valued paths, and move/copy operations outside sandbox.
Security allowlisting and subagent SRT inheritance
massgen/mcp_tools/security.py, massgen/subagent/manager.py
SRT binary added to default allowed MCP executables with documentation of opt-in trusted-wrapper usage. SubagentManager propagates SRT-related backend configuration fields to generated subagent YAML, matching Docker inheritance behavior.
Backend degradation and path hardening test suites
massgen/tests/test_srt_backend_degrade.py, massgen/tests/test_path_permission_hook_adversarial.py
Backend degradation tests verify command_line_execution_mode: srt is preserved for API backends but degraded to local for native-sandbox backends. Adversarial path tests exercise baseline escapes (absolute paths, .. traversal, symlinks), fail-open gaps (unrecognized keys, list paths, move/copy sources), false-positive protection, and nested structure escapes.
SRT component unit and integration tests
massgen/tests/test_srt_manager.py, massgen/tests/test_srt_filesystem_integration.py, massgen/tests/test_subagent_manager.py
SrtManager unit tests validate filesystem settings for execution and fs_tools profiles, read-confinement modes, network configuration, command wrapping, and binary availability. Filesystem integration tests verify MCP CLI arg injection, settings generation, server wrapping with sh -c form preservation, and launcher-skip rules. Subagent tests confirm SRT settings inheritance.
Example SRT configuration
massgen/configs/tools/filesystem/sandbox/srt_sandbox.yaml
YAML configuration demonstrates opt-in SRT sandboxing, documents isolation properties (workspace write allowed, external reads/network denied by default), and shows optional read-confinement, network allowlist, and deny-read controls.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • a5507203
🚥 Pre-merge checks | ✅ 6 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 31.25% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Documentation Updated ⚠️ Warning Missing documentation: (1) user_guide/ RST updates, (2) yaml_schema.rst with srt params, (3) design doc in dev_notes/, (4) 42 test functions lack Google-style docstrings. Add user guide RST, update yaml_schema.rst for srt mode/params, create design doc, add docstrings to all 42 test functions per guidelines.
✅ Passed checks (6 passed)
Check name Status Explanation
Title check ✅ Passed The PR title 'feat: OS-level SRT agent sandboxing + permission-hook hardening' clearly describes the main changes: addition of OS-level sandboxing and security hardening.
Description check ✅ Passed The PR description is comprehensive and well-structured. It includes a clear summary, detailed explanation of included features, test coverage, live verification results, configuration details, and known follow-ups.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Capabilities Registry Check ✅ Passed PR adds OS-level SRT sandboxing to existing backends. No new backends or models introduced. capabilities.py and token_manager.py unchanged; check not applicable.
Config Parameter Sync ✅ Passed Both sync points import from the same centralized source (_excluded_params.py), ensuring all 5 new SRT parameters are automatically synchronized across both files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/better-sandboxing

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ncrispino ncrispino changed the title feat(sandbox): OS-level SRT agent sandboxing + permission-hook hardening (v0.1.96) feat: OS-level SRT agent sandboxing + permission-hook hardening Jun 10, 2026
@ncrispino ncrispino changed the base branch from main to dev/v0.1.96 June 10, 2026 16:20

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces an opt-in OS-level sandboxing mode for agent command execution and MCP filesystem servers using Anthropic’s sandbox-runtime (srt), and hardens the application-layer PathPermissionManager hook to prevent path-argument escape vectors. It also updates release docs/materials for v0.1.96.

Changes:

  • Add command_line_execution_mode: srt with SrtManager to derive SRT filesystem/network settings from PathPermissionManager, wrap command execution and (where possible) filesystem MCP servers, and degrade to local for native-sandbox backends.
  • Harden the permission hook with a key-agnostic recursive scan of tool args to deny any path resolving outside managed areas.
  • Release prep/documentation updates for v0.1.96 (roadmaps, READMEs, docs announcements, changelog) plus new test suites covering SRT and adversarial permission-hook vectors.

Reviewed changes

Copilot reviewed 26 out of 26 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
ROADMAP.md Bumps current version/date and records v0.1.96 completion; defers v0.1.97 items.
ROADMAP_v0.1.97.md Updates roadmap version text and references v0.1.96 track.
README.md Updates “Latest Features” and roadmap pointers to v0.1.96/v0.1.97; bumps install version.
README_PYPI.md Mirrors README release/version updates for PyPI rendering.
PR_DRAFT_sandboxing.md Adds a PR draft description of SRT sandboxing + permission hook hardening.
massgen/tests/test_subagent_manager.py Adds test ensuring subagents inherit selected SRT settings.
massgen/tests/test_srt_manager.py Adds unit tests for SrtManager settings derivation and wrapping helpers.
massgen/tests/test_srt_filesystem_integration.py Adds integration-ish tests for FilesystemManager ↔ SRT config wiring and MCP security allowlist.
massgen/tests/test_srt_backend_degrade.py Adds tests for srtlocal degradation on native-sandbox backends.
massgen/tests/test_path_permission_hook_adversarial.py Adds adversarial tests for tool-arg path escape vectors and false-positive guards.
massgen/subagent/manager.py Implements inheritance of (some) SRT backend settings from parent to subagents.
massgen/mcp_tools/security.py Adds srt to the default allowed MCP executable allowlist.
massgen/filesystem_manager/_srt_manager.py Introduces SrtManager + helpers for building settings and wrapping commands/argv with SRT.
massgen/filesystem_manager/_path_permission_manager.py Adds _validate_no_path_arg_escapes and calls it from existing validation paths.
massgen/filesystem_manager/_filesystem_manager.py Wires SrtManager into FilesystemManager and adds SRT wrapping for stdio MCP server configs.
massgen/filesystem_manager/_code_execution_server.py Adds srt execution mode and --srt-settings handling to the command execution MCP server.
massgen/configs/tools/filesystem/sandbox/srt_sandbox.yaml Adds an example config for enabling and demoing SRT sandbox mode.
massgen/configs/README.md Updates release history section to v0.1.96 and adds SRT usage snippet.
massgen/backend/codex.py Declares Codex has a native execution sandbox (for srt degradation).
massgen/backend/claude_code.py Declares Claude Code has a native execution sandbox (for srt degradation).
massgen/backend/base.py Adds srt execution mode validation, degradation logic, and plumbs SRT parameters into FilesystemManager.
massgen/backend/_excluded_params.py Excludes SRT config params from backend param forwarding where appropriate.
docs/source/index.rst Adds v0.1.96 release summary to Sphinx index.
docs/announcements/current-release.md Updates current release announcement content to v0.1.96.
docs/announcements/archive/v0.1.95.md Archives the v0.1.95 release announcement.
CHANGELOG.md Adds v0.1.96 changelog entry and release details.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +234 to +246
home = str(Path.home())
managed_readable = [str(mp.path) for mp in managed]
allow_read = managed_readable + [str(p) for p in self.fs_tools_extra_writable] + list(self.allow_read)

# The fs-tools profile wraps a FRAMEWORK MCP server (fastmcp run <massgen
# script>). Under confined/strict the server's own code + interpreter + deps
# live in a denied region ($HOME), so re-allow the framework runtime roots or
# the wrapped server can't read its own script and fails to start. This is
# framework code, not user data; the agent's own command sandbox (execution
# profile) stays tight and is unaffected. ("open" re-allows nothing because
# it is allow-all-minus-denylist; the framework roots are readable anyway.)
if profile == FS_TOOLS_PROFILE:
allow_read = allow_read + _framework_read_roots()
Comment on lines +286 to +289
target_dir = self.settings_dir or Path.cwd()
target_dir.mkdir(parents=True, exist_ok=True)
suffix = f"{agent_id}-" if agent_id else ""
path = target_dir / f"srt-settings-{suffix}{profile}.json"
Comment on lines +1927 to +1931
srt_settings = [
"command_line_srt_network_allowed_domains",
"command_line_srt_deny_read",
"command_line_srt_allow_unix_sockets",
]
Comment on lines +1481 to +1494
for key, cand in walk(tool_args, None):
if not cand:
continue
if "\x00" in cand:
return (False, f"Access denied: argument '{key}' contains a null byte.")
try:
resolved = Path(self._resolve_path_against_workspace(cand)).resolve()
except (OSError, ValueError):
continue
if not self._is_path_within_any_managed_area(resolved):
return (
False,
f"Access denied: argument '{key}'='{cand}' resolves to '{resolved}', outside allowed directories.",
)
Comment thread CHANGELOG.md
- **Permission-hook hardening (`PathPermissionManager`)**: new `_validate_no_path_arg_escapes` — a key-agnostic scan that walks the full tool-args tree (nested dicts + lists) and denies any value resolving outside all managed areas. Closes the prior **fail-open** behavior (path under an unrecognized key, list-valued path, or `move`/`copy` `source` pointing outside the workspace) without false positives (non-path strings resolve harmlessly inside the workspace; content keys are skipped). Symlinks/`..` were already handled by `.resolve()`.

### Tests
- New deterministic suites: `test_srt_manager.py` (settings derivation, profiles, secret read-deny baseline, protected-path read+write deny, wrapping, availability guards), `test_srt_filesystem_integration.py` (command-line + fs-tools config wiring, `sh -c` wrap, npx / no-roots auto-skip, MCP-security validation), `test_srt_backend_degrade.py` (`srt`→`local` degrade for native-sandbox backends; API backends keep `srt`), `test_path_permission_hook_adversarial.py` (15 escape vectors — absolute/`..`/symlink/unrecognized-key/list/nested-dict/move-source/copy-source/read-exfil — plus false-positive guards), and `test_subagent_manager.py::TestSrtSettingsInheritance` (subagent inherits parent SRT settings).

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
ROADMAP.md (1)

1212-1212: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Resolve conflicting “Last Updated” dates in the same roadmap.

Line 1212 still says April 17, 2026, which conflicts with Line 7 (June 10, 2026). Keep one authoritative date to avoid release metadata drift.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ROADMAP.md` at line 1212, Update the duplicate "Last Updated" metadata so
there's a single authoritative date: search for the "Last Updated:" string and
change the instance that reads "April 17, 2026" to match the canonical "June 10,
2026" (or vice versa depending on the intended authoritative date), ensuring
only one "Last Updated:" entry remains to prevent metadata drift.
massgen/tests/test_srt_filesystem_integration.py (1)

18-159: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Bring new fixtures/tests in line with Google-style docstring requirements.

The new/changed functions in this file are missing Google-style docstrings.

As per coding guidelines, **/*.py: “For new or changed functions, include Google-style docstrings”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/tests/test_srt_filesystem_integration.py` around lines 18 - 159, The
tests and fixtures lack Google-style docstrings; add a one-paragraph
Google-style docstring to each new/changed function (fixtures srt_fs_manager,
local_fs_manager and all test_* functions in this file such as
test_command_line_config_has_srt_args,
test_command_line_config_writes_valid_settings_file,
test_local_mode_has_no_srt_args, test_fs_tools_server_is_srt_wrapped_via_sh_c,
test_fs_tools_server_not_wrapped_in_local_mode,
test_srt_is_an_allowed_mcp_executable,
test_srt_wrapped_fs_tools_config_passes_mcp_security,
test_wrap_skips_npx_launcher, test_wrap_applies_to_non_network_launcher,
test_wrap_skips_no_roots_wrapper, test_wrap_skips_absolute_path_npx) that
briefly states the function purpose, key behavior/assertions, and for fixtures
include Args (tmp_path) and Returns (FilesystemManager) where applicable; keep
them concise and follow Google style (one-line summary, optional blank line, and
short Args/Returns sections).

Source: Coding guidelines

🧹 Nitpick comments (9)
massgen/configs/tools/filesystem/sandbox/srt_sandbox.yaml (1)

27-27: 💤 Low value

Consider using a more cost-effective model per coding guidelines.

The coding guidelines recommend preferring cost-effective models like gpt-5-nano, gpt-5-mini, or gemini-2.5-flash for example configurations. Using "gpt-5" may default to a more expensive variant.

💡 Suggested change
-      model: "gpt-5"
+      model: "gpt-5-nano"

As per coding guidelines: "Prefer cost-effective models (gpt-5-nano, gpt-5-mini, gemini-2.5-flash)" from massgen/configs/**/*.yaml section.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/configs/tools/filesystem/sandbox/srt_sandbox.yaml` at line 27, The
config sets model: "gpt-5", which is not cost‑effective per project guidelines;
update the model value in the sandbox config (look for the YAML key model in
srt_sandbox.yaml) to a recommended cheaper option such as "gpt-5-nano",
"gpt-5-mini", or "gemini-2.5-flash" so the example uses a cost-effective model
variant.

Source: Coding guidelines

massgen/filesystem_manager/_path_permission_manager.py (2)

1452-1495: ⚡ Quick win

Add Google-style docstring sections.

The docstring should include Args and Returns sections per coding guidelines. As per coding guidelines, new or changed functions in **/*.py should include Google-style docstrings.

📝 Suggested docstring format
 def _validate_no_path_arg_escapes(self, tool_args: dict[str, Any]) -> tuple[bool, str | None]:
     """Defense in depth: deny if ANY argument value resolves outside allowed dirs.
 
     Closes the fail-open gap in the primary extractor: a path under an
     unrecognized key, inside a list, or in a move/copy ``source`` (which the
     extractor deliberately skips) would otherwise bypass the boundary check.
 
     Safe against false positives: non-path strings and relative values resolve
     harmlessly *inside* the workspace, so only genuine absolute-outside /
     ``..``-escape / symlink-escape values are denied. Content-bearing keys are
     skipped so file content that merely mentions a path isn't flagged.
 
     Walks the FULL argument tree (nested dicts and lists), so a path buried under
     e.g. ``{"opts": {"path": "/etc/passwd"}}`` or ``{"items": [{"path": ...}]}``
     is caught, not just top-level keys.
+
+    Args:
+        tool_args: Tool arguments dictionary to validate.
+
+    Returns:
+        Tuple of (allowed: bool, reason: Optional[str]). Returns (False, reason)
+        if any argument resolves outside managed areas, (True, None) otherwise.
     """
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/filesystem_manager/_path_permission_manager.py` around lines 1452 -
1495, Add Google-style docstring sections to the _validate_no_path_arg_escapes
method: update its docstring to include an "Args" section describing the
tool_args parameter (type: dict[str, Any], brief purpose) and a "Returns"
section describing the returned tuple (tuple[bool, str | None], meaning of
True/False and the optional error message). Keep the existing descriptive text
but append these sections in standard Google format so linters and readers
understand the parameter and return contract.

Source: Coding guidelines


1437-1450: ⚡ Quick win

Add Google-style docstring sections.

The docstring should include Args and Returns sections per coding guidelines. As per coding guidelines, new or changed functions in **/*.py should include Google-style docstrings.

📝 Suggested docstring format
 def _is_path_within_any_managed_area(self, path: Path) -> bool:
-    """Like ``_is_path_within_allowed_directories`` but ALSO counts
-    ``file_context_parent`` dirs as inside.
-
-    Used by the escape scan so it only flags paths that are TRULY outside every
-    managed area; finer-grained denials (e.g. a sibling file inside a
-    file-context directory) are left to the downstream permission logic, which
-    gives a more specific reason.
-    """
+    """Like ``_is_path_within_allowed_directories`` but ALSO counts
+    ``file_context_parent`` dirs as inside.
+
+    Used by the escape scan so it only flags paths that are TRULY outside every
+    managed area; finer-grained denials (e.g. a sibling file inside a
+    file-context directory) are left to the downstream permission logic, which
+    gives a more specific reason.
+
+    Args:
+        path: Path to check against managed areas.
+
+    Returns:
+        True if path is within any managed area (including file_context_parent), False otherwise.
+    """
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/filesystem_manager/_path_permission_manager.py` around lines 1437 -
1450, The _is_path_within_any_managed_area method is missing Google-style
docstring sections; update its docstring to include Args and Returns sections
describing the path parameter (type Path) and the boolean return value, keep the
current explanatory text, and mention that path is resolved (e.g., "path: Path —
path to check, will be resolved") and that the function returns True if any
managed_path in self.managed_paths contains or equals the resolved path
(reference managed_paths, managed_path.contains, managed_path.path, and
path.resolve).

Source: Coding guidelines

massgen/tests/test_srt_backend_degrade.py (1)

57-64: ⚖️ Poor tradeoff

Consider making has_native_execution_sandbox a classmethod.

The test uses object.__new__(CodexBackend) to avoid heavy backend construction, which is documented but fragile. If has_native_execution_sandbox ever accesses instance state, this will break. Consider making it a classmethod since it returns a constant.

Alternative approach

If made a classmethod in the backend base:

`@classmethod`
def has_native_execution_sandbox(cls) -> bool:
    """Whether this backend sandboxes its own command execution natively."""
    return False

Then the test could be simplified to:

assert CodexBackend.has_native_execution_sandbox() is True
assert ClaudeCodeBackend.has_native_execution_sandbox() is True

This would be more maintainable and explicit about the constant nature of the return value.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/tests/test_srt_backend_degrade.py` around lines 57 - 64, The test is
using object.__new__ to call has_native_execution_sandbox, which indicates the
method is effectively static/constant and should be a classmethod; change the
method definition of has_native_execution_sandbox in the backend base class (and
any overrides in CodexBackend and ClaudeCodeBackend) to be decorated with
`@classmethod` and accept cls instead of self, returning the same boolean
constant, then update the test to call
CodexBackend.has_native_execution_sandbox() and
ClaudeCodeBackend.has_native_execution_sandbox() directly.
massgen/backend/claude_code.py (1)

354-356: ⚡ Quick win

Match the Python docstring convention on this new override.

Add a Google-style docstring with a Returns: section here as well, so the backend override documents the same contract as the base hook. As per coding guidelines, **/*.py: For new or changed functions, include Google-style docstrings.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/backend/claude_code.py` around lines 354 - 356, The override
has_native_execution_sandbox currently uses a short triple-quoted string;
replace it with a Google-style docstring that documents the behavior and
includes a Returns: section matching the base hook contract — update the
has_native_execution_sandbox(self) -> bool method docstring to a full
Google-style docstring describing that Claude Code confines execution via an
OS-level sandbox and add a Returns: bool description stating it returns True
when native sandboxing is provided.

Source: Coding guidelines

massgen/backend/base.py (1)

699-710: ⚡ Quick win

Use a Google-style docstring for this new extension point.

Please document the return contract with a Google-style Returns: section so subclasses have a consistent definition of when to return True. As per coding guidelines, **/*.py: For new or changed functions, include Google-style docstrings.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/backend/base.py` around lines 699 - 710, Update the docstring for the
has_native_execution_sandbox method to a Google-style docstring: keep the
existing summary, then add a "Returns:" section that explicitly states the
method returns a bool and clarifies the contract (True when the backend performs
its own process-level command execution sandboxing such that MassGen's
SRT/OS-level sandbox must be disabled/degraded to local; False otherwise).
Ensure the docstring remains on the has_native_execution_sandbox method and that
subclasses can rely on this definition to decide whether to return True.

Source: Coding guidelines

massgen/filesystem_manager/_srt_manager.py (1)

111-130: ⚡ Quick win

Use Google-style docstrings for the new SRT API surface.

The new helper functions and SrtManager methods are documented, but not in the repo’s required Google style. Please add Args: / Returns: sections where applicable so these new contracts are consistent and discoverable. As per coding guidelines, "For new or changed functions, include Google-style docstrings".

Also applies to: 168-307

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/filesystem_manager/_srt_manager.py` around lines 111 - 130, Update
the docstrings for the new SRT API functions and SrtManager methods to
Google-style: for each top-level function (wrap_command_with_srt,
wrap_argv_with_srt, srt_available) and every public method in SrtManager (lines
~168-307), replace/augment the existing docstrings with Google-style sections
including a one-line summary, an Args: section describing parameters (name,
type, brief description), and a Returns: section describing the return type and
meaning; ensure types follow the repo convention (e.g., str | Path, list[str],
bool) and include any side effects or notes where applicable so the functions’
contracts are consistent with the project guidelines.

Source: Coding guidelines

massgen/filesystem_manager/_filesystem_manager.py (1)

174-178: ⚡ Quick win

Document the new SRT constructor parameters.

FilesystemManager.__init__() now exposes command_line_execution_mode="srt" plus five command_line_srt_* arguments, but the Args: block still documents only local/docker execution and omits the new knobs entirely. Please update this constructor docstring so the public API stays accurate and Google-style. As per coding guidelines, "For new or changed functions, include Google-style docstrings".

Also applies to: 196-247

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/filesystem_manager/_filesystem_manager.py` around lines 174 - 178,
Update the Google-style docstring for FilesystemManager.__init__ to document the
new SRT-related parameters: add an Args: section (or extend the existing one)
that lists command_line_execution_mode (str, e.g. "local"|"docker"|"srt"), and
each new argument command_line_srt_network_allowed_domains (list[str]|None),
command_line_srt_deny_read (list[str]|None), command_line_srt_allow_unix_sockets
(list[str]|None), command_line_srt_read_mode (str, default "confined"), and
command_line_srt_allow_read (list[str]|None) with their types, defaults, and
brief purpose/behavior; also mention how these are used when
command_line_execution_mode="srt". Apply the same docstring updates to the other
changed constructor overload/variant in the same file (the second __init__ block
covering the other lines) so the public API docs are consistent.

Source: Coding guidelines

massgen/filesystem_manager/_code_execution_server.py (1)

217-218: ⚡ Quick win

Use a Google-style docstring for create_server.

As per coding guidelines, “**/*.py: For new or changed functions, include Google-style docstrings”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/filesystem_manager/_code_execution_server.py` around lines 217 - 218,
The current docstring for create_server is a one-line triple-quoted summary;
replace it with a Google-style docstring for the async factory create_server()
that includes a short summary, an "Returns:" section documenting the returned
fastmcp.FastMCP instance, and a "Raises:" section for any exceptions the factory
may propagate (e.g., connection or setup errors). Update the docstring for the
create_server function to follow Google style (summary, blank line, detailed
sections like Returns and Raises) while keeping the description concise and
accurate.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CHANGELOG.md`:
- Around line 10-33: The 0.1.96 changelog block is missing the required release
metadata: add a "### Technical Details" subsection under the "## [0.1.96] -
2026-06-10" header that concisely summarizes implementation/architecture notes
(SrtManager integration, PathPermissionManager _validate_no_path_arg_escapes,
command_line_srt_* config semantics, native-sandbox degrade behavior, test-suite
names like test_srt_manager.py/test_path_permission_hook_adversarial.py) and
include an explicit issue/PR reference line (e.g. "Fixes #<issue-number>, PR
#<pr-number>") using the actual issue/PR IDs for traceability; ensure the
symbols SrtManager and PathPermissionManager are mentioned so reviewers can map
the changelog to the diff.

In `@docs/announcements/archive/v0.1.95.md`:
- Around line 4-6: Update the header comment in
docs/announcements/archive/v0.1.95.md that currently reads "This is the current
release announcement." to clearly indicate archived content (e.g., "This is an
archived release announcement. Copy this + feature-highlights.md to LinkedIn/X.
After posting, update the social links below."). Locate and modify the exact
header comment string to avoid "current release" confusion while preserving the
rest of the comment text.

In `@massgen/backend/codex.py`:
- Around line 2569-2571: The method has_native_execution_sandbox currently
returns True unconditionally even though _build_exec_command disables Codex
sandboxing for approval_mode values like "full-access" and
"dangerous-no-sandbox"; update has_native_execution_sandbox() to check the
active approval_mode (the same config/attribute used by _build_exec_command) and
return True only when the approval_mode permits sandboxing, and False for
"full-access" and "dangerous-no-sandbox" (and any other modes that disable
sandboxing) so the advertised native sandbox status matches actual runtime
behavior.

In `@massgen/filesystem_manager/_code_execution_server.py`:
- Line 303: The stored srt settings path (mcp.srt_settings_path) should be
resolved to an absolute path before assignment to avoid breakage when
execute_command changes cwd; update where args.srt_settings is assigned (lines
setting mcp.srt_settings_path and the other occurrences around the blocks for
args.srt_settings) to call a path-resolution routine (e.g.,
Path(args.srt_settings).expanduser().resolve() or os.path.abspath) and store
that resolved string/value, ensuring execute_command and any later runtime uses
always receive an absolute path.

In `@massgen/filesystem_manager/_filesystem_manager.py`:
- Around line 1625-1629: The SRT policy is being serialized too early in
_wrap_stdio_config_with_srt (called when self.command_line_execution_mode ==
"srt" and self.srt_manager), before setup_orchestration_paths() and other steps
add agent-specific managed paths, causing a stale or generic
srt-settings-<profile>.json; move or delay the call that invokes
write_settings_file() (the code path inside _wrap_stdio_config_with_srt or
whatever writes the SRT settings) so it runs after setup_orchestration_paths()
completes, or alternatively add logic in update_backend_mcp_config() to
regenerate/write the SRT settings file when agent_id/managed paths are
finalized; update all similar call sites (the occurrences around the mentioned
blocks) to defer writing or trigger regeneration to ensure the final policy
includes orchestration-added paths.

In `@massgen/filesystem_manager/_srt_manager.py`:
- Around line 236-246: build_settings() is currently always folding
self.fs_tools_extra_writable into allow_read which leaks fs-tools-only read
paths into the execution profile; change the logic so fs_tools_extra_writable is
only merged when profile == FS_TOOLS_PROFILE. Locate build_settings (and the
call site _wrap_stdio_config_with_srt / FilesystemManager) where allow_read is
constructed and: remove the unconditional "+ [str(p) for p in
self.fs_tools_extra_writable]" from the base allow_read, and instead append
those paths only inside the branch that checks if profile == FS_TOOLS_PROFILE
(the same place you already append _framework_read_roots()), so execution
profiles do not inherit fs-tools-only read allowances.

In `@massgen/subagent/manager.py`:
- Around line 1927-1936: The SRT read confinement mode key is missing from the
inherited SRT settings, so update the srt_settings list used in the loop to
include "command_line_srt_read_mode" so backend_config will copy that key from
source_backend or fallback_backend when present; adjust the existing loop that
iterates over srt_settings (and references source_backend, fallback_backend,
backend_config) to propagate this new key alongside the other SRT settings.

In `@massgen/tests/test_srt_manager.py`:
- Around line 27-295: The new/modified functions lack Google-style docstrings
required by repo policy; add concise Google-style docstrings for the
pm_with_paths fixture (describe purpose, yield/return structure and keys: "pm",
"workspace", "temp_ws", "ctx_write", "ctx_read", "protected") and for the helper
_is_read_allowed (describe parameters allow_read: list, target: str, return:
bool and what True means), and add similar docstrings to any other new
helper/test functions you introduced so each function-level symbol
(pm_with_paths, _is_read_allowed, and any other newly added functions) has a
Google-style docstring explaining purpose, Args, Returns (or Yields for
fixtures) and brief behavior summary.

In `@massgen/tests/test_subagent_manager.py`:
- Around line 3901-3928: Add a Google-style docstring to the new test function
test_srt_settings_inherited_by_subagent describing purpose, arguments (if any),
and expected behavior; place the docstring directly under the def line in
massgen/tests/test_subagent_manager.py, briefly stating that the test verifies
SRT-related command-line settings from the parent backend are inherited into the
generated child backend config produced by
SubagentManager._generate_subagent_yaml_config when using SubagentConfig.

---

Outside diff comments:
In `@massgen/tests/test_srt_filesystem_integration.py`:
- Around line 18-159: The tests and fixtures lack Google-style docstrings; add a
one-paragraph Google-style docstring to each new/changed function (fixtures
srt_fs_manager, local_fs_manager and all test_* functions in this file such as
test_command_line_config_has_srt_args,
test_command_line_config_writes_valid_settings_file,
test_local_mode_has_no_srt_args, test_fs_tools_server_is_srt_wrapped_via_sh_c,
test_fs_tools_server_not_wrapped_in_local_mode,
test_srt_is_an_allowed_mcp_executable,
test_srt_wrapped_fs_tools_config_passes_mcp_security,
test_wrap_skips_npx_launcher, test_wrap_applies_to_non_network_launcher,
test_wrap_skips_no_roots_wrapper, test_wrap_skips_absolute_path_npx) that
briefly states the function purpose, key behavior/assertions, and for fixtures
include Args (tmp_path) and Returns (FilesystemManager) where applicable; keep
them concise and follow Google style (one-line summary, optional blank line, and
short Args/Returns sections).

In `@ROADMAP.md`:
- Line 1212: Update the duplicate "Last Updated" metadata so there's a single
authoritative date: search for the "Last Updated:" string and change the
instance that reads "April 17, 2026" to match the canonical "June 10, 2026" (or
vice versa depending on the intended authoritative date), ensuring only one
"Last Updated:" entry remains to prevent metadata drift.

---

Nitpick comments:
In `@massgen/backend/base.py`:
- Around line 699-710: Update the docstring for the has_native_execution_sandbox
method to a Google-style docstring: keep the existing summary, then add a
"Returns:" section that explicitly states the method returns a bool and
clarifies the contract (True when the backend performs its own process-level
command execution sandboxing such that MassGen's SRT/OS-level sandbox must be
disabled/degraded to local; False otherwise). Ensure the docstring remains on
the has_native_execution_sandbox method and that subclasses can rely on this
definition to decide whether to return True.

In `@massgen/backend/claude_code.py`:
- Around line 354-356: The override has_native_execution_sandbox currently uses
a short triple-quoted string; replace it with a Google-style docstring that
documents the behavior and includes a Returns: section matching the base hook
contract — update the has_native_execution_sandbox(self) -> bool method
docstring to a full Google-style docstring describing that Claude Code confines
execution via an OS-level sandbox and add a Returns: bool description stating it
returns True when native sandboxing is provided.

In `@massgen/configs/tools/filesystem/sandbox/srt_sandbox.yaml`:
- Line 27: The config sets model: "gpt-5", which is not cost‑effective per
project guidelines; update the model value in the sandbox config (look for the
YAML key model in srt_sandbox.yaml) to a recommended cheaper option such as
"gpt-5-nano", "gpt-5-mini", or "gemini-2.5-flash" so the example uses a
cost-effective model variant.

In `@massgen/filesystem_manager/_code_execution_server.py`:
- Around line 217-218: The current docstring for create_server is a one-line
triple-quoted summary; replace it with a Google-style docstring for the async
factory create_server() that includes a short summary, an "Returns:" section
documenting the returned fastmcp.FastMCP instance, and a "Raises:" section for
any exceptions the factory may propagate (e.g., connection or setup errors).
Update the docstring for the create_server function to follow Google style
(summary, blank line, detailed sections like Returns and Raises) while keeping
the description concise and accurate.

In `@massgen/filesystem_manager/_filesystem_manager.py`:
- Around line 174-178: Update the Google-style docstring for
FilesystemManager.__init__ to document the new SRT-related parameters: add an
Args: section (or extend the existing one) that lists
command_line_execution_mode (str, e.g. "local"|"docker"|"srt"), and each new
argument command_line_srt_network_allowed_domains (list[str]|None),
command_line_srt_deny_read (list[str]|None), command_line_srt_allow_unix_sockets
(list[str]|None), command_line_srt_read_mode (str, default "confined"), and
command_line_srt_allow_read (list[str]|None) with their types, defaults, and
brief purpose/behavior; also mention how these are used when
command_line_execution_mode="srt". Apply the same docstring updates to the other
changed constructor overload/variant in the same file (the second __init__ block
covering the other lines) so the public API docs are consistent.

In `@massgen/filesystem_manager/_path_permission_manager.py`:
- Around line 1452-1495: Add Google-style docstring sections to the
_validate_no_path_arg_escapes method: update its docstring to include an "Args"
section describing the tool_args parameter (type: dict[str, Any], brief purpose)
and a "Returns" section describing the returned tuple (tuple[bool, str | None],
meaning of True/False and the optional error message). Keep the existing
descriptive text but append these sections in standard Google format so linters
and readers understand the parameter and return contract.
- Around line 1437-1450: The _is_path_within_any_managed_area method is missing
Google-style docstring sections; update its docstring to include Args and
Returns sections describing the path parameter (type Path) and the boolean
return value, keep the current explanatory text, and mention that path is
resolved (e.g., "path: Path — path to check, will be resolved") and that the
function returns True if any managed_path in self.managed_paths contains or
equals the resolved path (reference managed_paths, managed_path.contains,
managed_path.path, and path.resolve).

In `@massgen/filesystem_manager/_srt_manager.py`:
- Around line 111-130: Update the docstrings for the new SRT API functions and
SrtManager methods to Google-style: for each top-level function
(wrap_command_with_srt, wrap_argv_with_srt, srt_available) and every public
method in SrtManager (lines ~168-307), replace/augment the existing docstrings
with Google-style sections including a one-line summary, an Args: section
describing parameters (name, type, brief description), and a Returns: section
describing the return type and meaning; ensure types follow the repo convention
(e.g., str | Path, list[str], bool) and include any side effects or notes where
applicable so the functions’ contracts are consistent with the project
guidelines.

In `@massgen/tests/test_srt_backend_degrade.py`:
- Around line 57-64: The test is using object.__new__ to call
has_native_execution_sandbox, which indicates the method is effectively
static/constant and should be a classmethod; change the method definition of
has_native_execution_sandbox in the backend base class (and any overrides in
CodexBackend and ClaudeCodeBackend) to be decorated with `@classmethod` and accept
cls instead of self, returning the same boolean constant, then update the test
to call CodexBackend.has_native_execution_sandbox() and
ClaudeCodeBackend.has_native_execution_sandbox() directly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e6e2979f-f512-4783-b3ee-f746b915b8b6

📥 Commits

Reviewing files that changed from the base of the PR and between 3bd3647 and 4d5c2c1.

📒 Files selected for processing (26)
  • CHANGELOG.md
  • PR_DRAFT_sandboxing.md
  • README.md
  • README_PYPI.md
  • ROADMAP.md
  • ROADMAP_v0.1.97.md
  • docs/announcements/archive/v0.1.95.md
  • docs/announcements/current-release.md
  • docs/source/index.rst
  • massgen/backend/_excluded_params.py
  • massgen/backend/base.py
  • massgen/backend/claude_code.py
  • massgen/backend/codex.py
  • massgen/configs/README.md
  • massgen/configs/tools/filesystem/sandbox/srt_sandbox.yaml
  • massgen/filesystem_manager/_code_execution_server.py
  • massgen/filesystem_manager/_filesystem_manager.py
  • massgen/filesystem_manager/_path_permission_manager.py
  • massgen/filesystem_manager/_srt_manager.py
  • massgen/mcp_tools/security.py
  • massgen/subagent/manager.py
  • massgen/tests/test_path_permission_hook_adversarial.py
  • massgen/tests/test_srt_backend_degrade.py
  • massgen/tests/test_srt_filesystem_integration.py
  • massgen/tests/test_srt_manager.py
  • massgen/tests/test_subagent_manager.py

Comment thread CHANGELOG.md
Comment on lines +10 to +33
## [0.1.96] - 2026-06-10

### Theme: OS-Level Agent Sandboxing

Add a real OS-level execution sandbox for agents via Anthropic's [sandbox-runtime](https://github.com/anthropic-experimental/sandbox-runtime) (`srt`: bubblewrap on Linux, Seatbelt on macOS), and harden the existing application-layer permission hook against file-tool escapes. **Defense in depth by design**: the OS layer (`SrtManager`) and the app layer (`PathPermissionManager`) are derived from the *same* path policy and both stay active — SRT closes the shell escape hatch (e.g. `echo x > /etc/passwd`, which never goes through a file tool), the hardened hook closes file-tool escapes (`write_file`/`move`/`copy` to/from outside the workspace). Default-off, one-knob opt-in (`command_line_execution_mode: srt`); current behavior is unchanged unless a config turns it on. All items landed under TDD (tests written first, confirmed red, then green), plus live verification across multiple backends.

### Added
- **SRT sandbox mode (`command_line_execution_mode: srt`)**: a third command-execution mode alongside `local`/`docker`. `SrtManager` (`massgen/filesystem_manager/_srt_manager.py`) derives per-agent SRT settings from `PathPermissionManager.managed_paths` — `allowWrite` for writable paths, `denyWrite` for read-only/protected paths, **network deny-all by default** (allowlist is opt-in, documented as a capability grant), and a built-in secret-store read-deny baseline. Commands are wrapped as `srt --settings cfg sh -c '<cmd>'` (the `sh -c` form is required so `srt` does not consume the server's `--` separator). Both the command-line MCP and the filesystem-tools MCP servers are OS-wrapped (defense in depth); npx/npm launchers and the no-roots wrapper auto-skip wrapping (they need registry + `~/.npm` writes the sandbox blocks) and keep their app-layer protection. Example config: `massgen/configs/tools/filesystem/sandbox/srt_sandbox.yaml`.
- **Configurable SRT read confinement (`command_line_srt_read_mode`, default `confined`)**: SRT reads are allow-all by default, so `confined` denies all of `$HOME` (personal data, secrets, other projects) and re-allows only the workspace + context + temp paths while system paths stay readable so commands run; `strict` denies `/` and allows only managed paths + a system runtime baseline + extras; `open` allows-all reads minus a built-in secret denylist + extras. `command_line_srt_allow_read` widens the allow-list per config. New backend params `command_line_srt_network_allowed_domains` / `_deny_read` / `_allow_unix_sockets` / `_allow_read` / `_read_mode` added to the single-source exclusion list; `srt` added to the MCP executable allowlist. When the `fs_tools` profile OS-wraps a framework MCP server (`fastmcp run <massgen script>`), the framework's own read roots — the interpreter/site-packages (`sys.prefix`/`sys.base_prefix`), the `massgen` package source, and git's user config (`~/.gitconfig`, `~/.config/git`; git is core to the workspace snapshot model) — are re-allowed so the wrapped server can read its own code/runtime under `confined`/`strict` while user secrets stay denied. The agent's own `execution` profile is unaffected.
- **Subagent SRT inheritance**: subagents inherit the parent's `command_line_srt_*` settings (parity with Docker).

### Changed
- **Native-sandbox backends degrade `srt`→`local`**: `has_native_execution_sandbox()` (True for `codex` `--full-auto` and `claude_code`) prevents nested Seatbelt/Landlock hangs; the stored config is normalized so downstream raw reads see `local`.

### Fixed
- **Permission-hook hardening (`PathPermissionManager`)**: new `_validate_no_path_arg_escapes` — a key-agnostic scan that walks the full tool-args tree (nested dicts + lists) and denies any value resolving outside all managed areas. Closes the prior **fail-open** behavior (path under an unrecognized key, list-valued path, or `move`/`copy` `source` pointing outside the workspace) without false positives (non-path strings resolve harmlessly inside the workspace; content keys are skipped). Symlinks/`..` were already handled by `.resolve()`.

### Tests
- New deterministic suites: `test_srt_manager.py` (settings derivation, profiles, secret read-deny baseline, protected-path read+write deny, wrapping, availability guards), `test_srt_filesystem_integration.py` (command-line + fs-tools config wiring, `sh -c` wrap, npx / no-roots auto-skip, MCP-security validation), `test_srt_backend_degrade.py` (`srt`→`local` degrade for native-sandbox backends; API backends keep `srt`), `test_path_permission_hook_adversarial.py` (15 escape vectors — absolute/`..`/symlink/unrecognized-key/list/nested-dict/move-source/copy-source/read-exfil — plus false-positive guards), and `test_subagent_manager.py::TestSrtSettingsInheritance` (subagent inherits parent SRT settings).
- Live-verified (macOS 15.7, srt 1.0.0): standalone srt (allowed-write, out-of-scope write blocked, deny-all network blocked, secret read blocked); 3 API backends (OpenRouter/`chatcompletion`, OpenAI Responses, Gemini) with workspace write OK and out-of-workspace write/file-tool escape blocked; codex + srt and claude_code + srt degrade to local and complete via their native sandbox.

### Documentations, Configurations and Resources
- **New Config**: `massgen/configs/tools/filesystem/sandbox/srt_sandbox.yaml` — fully-commented SRT opt-in example with all read/network/socket knobs documented.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add required release metadata for the 0.1.96 changelog entry.

This release block is missing a ### Technical Details section and an explicit issue/PR reference for traceability.

As per coding guidelines, release changelog entries should include categories Added, Changed, Fixed, Documentations/Configurations/Resources, Technical Details and reference issue/PR numbers.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CHANGELOG.md` around lines 10 - 33, The 0.1.96 changelog block is missing the
required release metadata: add a "### Technical Details" subsection under the
"## [0.1.96] - 2026-06-10" header that concisely summarizes
implementation/architecture notes (SrtManager integration, PathPermissionManager
_validate_no_path_arg_escapes, command_line_srt_* config semantics,
native-sandbox degrade behavior, test-suite names like
test_srt_manager.py/test_path_permission_hook_adversarial.py) and include an
explicit issue/PR reference line (e.g. "Fixes #<issue-number>, PR #<pr-number>")
using the actual issue/PR IDs for traceability; ensure the symbols SrtManager
and PathPermissionManager are mentioned so reviewers can map the changelog to
the diff.

Source: Coding guidelines

Comment on lines +4 to +6
This is the current release announcement. Copy this + feature-highlights.md to LinkedIn/X.
After posting, update the social links below.
-->

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix archive metadata wording to avoid “current release” confusion.

This file is under docs/announcements/archive/, but the header comment still says “This is the current release announcement.” Update wording to indicate it is archived content.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/announcements/archive/v0.1.95.md` around lines 4 - 6, Update the header
comment in docs/announcements/archive/v0.1.95.md that currently reads "This is
the current release announcement." to clearly indicate archived content (e.g.,
"This is an archived release announcement. Copy this + feature-highlights.md to
LinkedIn/X. After posting, update the social links below."). Locate and modify
the exact header comment string to avoid "current release" confusion while
preserving the rest of the comment text.

Comment thread massgen/backend/base.py
Comment on lines +175 to +185
if execution_mode == "srt" and self.has_native_execution_sandbox():
logger.warning(
f"[{self.get_provider_name()}] command_line_execution_mode 'srt' ignored — this backend "
"has a native execution sandbox; using its own isolation (SRT would nest sandboxes). "
"Falling back to 'local' for MassGen's command-line MCP.",
)
execution_mode = "local"
# Normalize the stored config too, so downstream RAW reads of
# command_line_execution_mode (e.g. claude_code disallowed-tools /
# system-prompt logic) see the effective 'local', not 'srt'.
kwargs["command_line_execution_mode"] = "local"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Keep the requested execution mode separate from the degraded effective mode.

Lines 175-185 rewrite kwargs["command_line_execution_mode"] to "local", so downstream backends no longer know the user explicitly asked for srt. In ClaudeCodeBackend, "local" keeps native Bash enabled, which means this fallback can silently bypass the MCP command-policy path instead of only swapping the OS sandbox implementation. Store the degraded mode separately and leave the requested mode intact.

Comment thread massgen/backend/codex.py
Comment on lines +2569 to +2571
def has_native_execution_sandbox(self) -> bool:
"""Codex confines its own execution via `--full-auto` (Landlock/Seatbelt)."""
return True

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Only advertise a native sandbox when Codex is actually sandboxed.

_build_exec_command() explicitly disables Codex sandboxing for at least approval_mode="full-access" and approval_mode="dangerous-no-sandbox", but this method now returns True unconditionally. With the new backend-level srt -> local downgrade, that can silently drop OS isolation in exactly the modes where Codex is running unsandboxed. Gate this on the active approval mode instead of hardcoding True.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/backend/codex.py` around lines 2569 - 2571, The method
has_native_execution_sandbox currently returns True unconditionally even though
_build_exec_command disables Codex sandboxing for approval_mode values like
"full-access" and "dangerous-no-sandbox"; update has_native_execution_sandbox()
to check the active approval_mode (the same config/attribute used by
_build_exec_command) and return True only when the approval_mode permits
sandboxing, and False for "full-access" and "dangerous-no-sandbox" (and any
other modes that disable sandboxing) so the advertised native sandbox status
matches actual runtime behavior.

mcp.allowed_commands = args.allowed_commands # Whitelist patterns
mcp.blocked_commands = args.blocked_commands # Blacklist patterns
mcp.execution_mode = args.execution_mode
mcp.srt_settings_path = args.srt_settings

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Resolve --srt-settings to an absolute path before storing it.

A relative settings path can validate at startup, then fail at runtime when execute_command changes cwd before invoking SRT. Normalize once and store the resolved path.

Suggested fix
-    mcp.srt_settings_path = args.srt_settings
+    mcp.srt_settings_path = str(Path(args.srt_settings).resolve()) if args.srt_settings else None
...
-        if not Path(args.srt_settings).exists():
-            raise RuntimeError(f"SRT settings file not found: {args.srt_settings}")
+        if not Path(mcp.srt_settings_path).exists():
+            raise RuntimeError(f"SRT settings file not found: {mcp.srt_settings_path}")
...
-        print(f"[SRT] Sandbox-runtime enabled (settings: {args.srt_settings})")
+        print(f"[SRT] Sandbox-runtime enabled (settings: {mcp.srt_settings_path})")

Also applies to: 356-357, 628-628

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/filesystem_manager/_code_execution_server.py` at line 303, The stored
srt settings path (mcp.srt_settings_path) should be resolved to an absolute path
before assignment to avoid breakage when execute_command changes cwd; update
where args.srt_settings is assigned (lines setting mcp.srt_settings_path and the
other occurrences around the blocks for args.srt_settings) to call a
path-resolution routine (e.g., Path(args.srt_settings).expanduser().resolve() or
os.path.abspath) and store that resolved string/value, ensuring execute_command
and any later runtime uses always receive an absolute path.

Comment on lines +1625 to +1629
# Defense in depth: OS-sandbox the filesystem (write_file/edit_file) server
# too. Self-skips npx/npm launchers (they need registry + ~/.npm writes the
# sandbox blocks) — those keep the npm server's own --allowed-paths + the hook.
if self.command_line_execution_mode == "srt" and self.srt_manager:
config = self._wrap_stdio_config_with_srt(config)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Generate SRT settings after agent-specific paths are finalized.

These call sites serialize the SRT policy during initial MCP config construction, but setup_orchestration_paths() and later setup steps add managed paths afterward, and update_backend_mcp_config() never rewrites the settings file. That leaves the server running with a stale policy; when agent_id is still unset, it also falls back to the generic srt-settings-<profile>.json path. Defer write_settings_file() until after orchestration setup or regenerate it in the later update path.

Also applies to: 1694-1700, 1800-1804

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/filesystem_manager/_filesystem_manager.py` around lines 1625 - 1629,
The SRT policy is being serialized too early in _wrap_stdio_config_with_srt
(called when self.command_line_execution_mode == "srt" and self.srt_manager),
before setup_orchestration_paths() and other steps add agent-specific managed
paths, causing a stale or generic srt-settings-<profile>.json; move or delay the
call that invokes write_settings_file() (the code path inside
_wrap_stdio_config_with_srt or whatever writes the SRT settings) so it runs
after setup_orchestration_paths() completes, or alternatively add logic in
update_backend_mcp_config() to regenerate/write the SRT settings file when
agent_id/managed paths are finalized; update all similar call sites (the
occurrences around the mentioned blocks) to defer writing or trigger
regeneration to ensure the final policy includes orchestration-added paths.

Comment on lines +236 to +246
allow_read = managed_readable + [str(p) for p in self.fs_tools_extra_writable] + list(self.allow_read)

# The fs-tools profile wraps a FRAMEWORK MCP server (fastmcp run <massgen
# script>). Under confined/strict the server's own code + interpreter + deps
# live in a denied region ($HOME), so re-allow the framework runtime roots or
# the wrapped server can't read its own script and fails to start. This is
# framework code, not user data; the agent's own command sandbox (execution
# profile) stays tight and is unaffected. ("open" re-allows nothing because
# it is allow-all-minus-denylist; the framework roots are readable anyway.)
if profile == FS_TOOLS_PROFILE:
allow_read = allow_read + _framework_read_roots()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep fs_tools_extra_writable out of the execution profile.

build_settings() always folds self.fs_tools_extra_writable into allowRead, even when profile=="execution". In FilesystemManager._wrap_stdio_config_with_srt(), that list is populated with fs-tools-only paths such as snapshot storage and the temp workspace parent, so the command-execution sandbox inherits read access that was supposed to be reserved for the wrapped filesystem server. Restrict these extra read allowances to FS_TOOLS_PROFILE only.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/filesystem_manager/_srt_manager.py` around lines 236 - 246,
build_settings() is currently always folding self.fs_tools_extra_writable into
allow_read which leaks fs-tools-only read paths into the execution profile;
change the logic so fs_tools_extra_writable is only merged when profile ==
FS_TOOLS_PROFILE. Locate build_settings (and the call site
_wrap_stdio_config_with_srt / FilesystemManager) where allow_read is constructed
and: remove the unconditional "+ [str(p) for p in self.fs_tools_extra_writable]"
from the base allow_read, and instead append those paths only inside the branch
that checks if profile == FS_TOOLS_PROFILE (the same place you already append
_framework_read_roots()), so execution profiles do not inherit fs-tools-only
read allowances.

Comment on lines +1927 to +1936
srt_settings = [
"command_line_srt_network_allowed_domains",
"command_line_srt_deny_read",
"command_line_srt_allow_unix_sockets",
]
for setting in srt_settings:
if setting in source_backend:
backend_config[setting] = source_backend[setting]
elif setting in fallback_backend:
backend_config[setting] = fallback_backend[setting]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate command_line_srt_read_mode with the other inherited SRT settings.

The child currently inherits SRT domains/read-deny/socket settings but not read confinement mode. That can silently diverge sandbox policy from the parent when child backend config omits this key.

Suggested fix
                 srt_settings = [
+                    "command_line_srt_read_mode",
                     "command_line_srt_network_allowed_domains",
                     "command_line_srt_deny_read",
                     "command_line_srt_allow_unix_sockets",
                 ]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/subagent/manager.py` around lines 1927 - 1936, The SRT read
confinement mode key is missing from the inherited SRT settings, so update the
srt_settings list used in the loop to include "command_line_srt_read_mode" so
backend_config will copy that key from source_backend or fallback_backend when
present; adjust the existing loop that iterates over srt_settings (and
references source_backend, fallback_backend, backend_config) to propagate this
new key alongside the other SRT settings.

Comment on lines +27 to +295
@pytest.fixture
def pm_with_paths(tmp_path):
"""A PathPermissionManager populated like a real agent setup."""
workspace = tmp_path / "workspace"
temp_ws = tmp_path / "temp_ws"
ctx_write = tmp_path / "ctx_write"
ctx_read = tmp_path / "ctx_read"
protected = ctx_write / "secrets"
for d in (workspace, temp_ws, ctx_write, ctx_read, protected):
d.mkdir(parents=True, exist_ok=True)

pm = PathPermissionManager(context_write_access_enabled=True)
pm.add_path(workspace, Permission.WRITE, "workspace")
pm.add_path(temp_ws, Permission.READ, "temp_workspace")
pm.add_context_paths(
[
{"path": str(ctx_write), "permission": "write", "protected_paths": ["secrets"]},
{"path": str(ctx_read), "permission": "read"},
],
)
return {
"pm": pm,
"workspace": workspace.resolve(),
"temp_ws": temp_ws.resolve(),
"ctx_write": ctx_write.resolve(),
"ctx_read": ctx_read.resolve(),
"protected": protected.resolve(),
}


# --------------------------------------------------------------------------- #
# build_settings — execution profile
# --------------------------------------------------------------------------- #
def test_execution_profile_workspace_and_write_context_are_writable(pm_with_paths):
mgr = SrtManager(pm_with_paths["pm"])
settings = mgr.build_settings(profile="execution")
allow_write = settings["filesystem"]["allowWrite"]
assert str(pm_with_paths["workspace"]) in allow_write
assert str(pm_with_paths["ctx_write"]) in allow_write


def test_execution_profile_temp_and_read_context_are_not_writable(pm_with_paths):
mgr = SrtManager(pm_with_paths["pm"])
settings = mgr.build_settings(profile="execution")
allow_write = settings["filesystem"]["allowWrite"]
deny_write = settings["filesystem"]["denyWrite"]
# Temp workspace is read-only for the agent during coordination.
assert str(pm_with_paths["temp_ws"]) not in allow_write
assert str(pm_with_paths["temp_ws"]) in deny_write
assert str(pm_with_paths["ctx_read"]) in deny_write


def test_protected_paths_are_also_deny_write(pm_with_paths):
# Protected paths are immune from modification even inside a writable context.
mgr = SrtManager(pm_with_paths["pm"])
settings = mgr.build_settings(profile="execution")
assert str(pm_with_paths["protected"]) in settings["filesystem"]["denyWrite"]


# --------------------------------------------------------------------------- #
# Read-confinement modes (SRT reads are allow-all by default; allowRead WINS)
# --------------------------------------------------------------------------- #
def test_default_read_mode_is_confined(pm_with_paths):
assert SrtManager(pm_with_paths["pm"]).read_mode == "confined"


def test_confined_mode_denies_home_allows_managed(pm_with_paths):
from pathlib import Path

mgr = SrtManager(pm_with_paths["pm"]) # default confined
fs = mgr.build_settings(profile="execution")["filesystem"]
# $HOME denied (covers ~/.ssh, ~/.aws, other projects, personal data)…
assert str(Path.home()) in fs["denyRead"]
assert "/etc/shadow" in fs["denyRead"]
# …but the agent's managed paths are re-allowed (allowRead wins over denyRead).
assert str(pm_with_paths["workspace"]) in fs["allowRead"]
assert str(pm_with_paths["ctx_read"]) in fs["allowRead"]


def test_strict_mode_denies_root_allows_managed_and_system(pm_with_paths):
mgr = SrtManager(pm_with_paths["pm"], read_mode="strict")
fs = mgr.build_settings(profile="execution")["filesystem"]
assert fs["denyRead"] == ["/"]
assert str(pm_with_paths["workspace"]) in fs["allowRead"]
assert "/usr" in fs["allowRead"] # system baseline so commands can run


def test_open_mode_uses_secret_denylist(pm_with_paths):
from pathlib import Path

mgr = SrtManager(pm_with_paths["pm"], read_mode="open", extra_deny_read=["/some/extra/secret"])
fs = mgr.build_settings(profile="execution")["filesystem"]
home = Path.home()
for rel in (".ssh", ".aws", ".gnupg"):
assert str(home / rel) in fs["denyRead"]
# protected + extras are read-denied in open mode (nothing re-allows them).
assert str(pm_with_paths["protected"]) in fs["denyRead"]
assert "/some/extra/secret" in fs["denyRead"]
assert fs["allowRead"] == []


def test_allow_read_extras_propagate(pm_with_paths):
mgr = SrtManager(pm_with_paths["pm"], allow_read=["/opt/shared-cache"])
fs = mgr.build_settings(profile="execution")["filesystem"]
assert "/opt/shared-cache" in fs["allowRead"]


def test_invalid_read_mode_falls_back_to_confined(pm_with_paths):
assert SrtManager(pm_with_paths["pm"], read_mode="bogus").read_mode == "confined"


# --------------------------------------------------------------------------- #
# build_settings — fs_tools profile (defense in depth, must allow snapshots)
# --------------------------------------------------------------------------- #
def test_fs_tools_profile_widens_writes_for_temp_and_snapshot(pm_with_paths, tmp_path):
snapshot = tmp_path / "snapshot_storage"
snapshot.mkdir()
mgr = SrtManager(pm_with_paths["pm"], fs_tools_extra_writable=[snapshot])
settings = mgr.build_settings(profile="fs_tools")
allow_write = settings["filesystem"]["allowWrite"]
# The fs-tools SERVER must be able to write workspace + temp + snapshot_storage,
# even though the AGENT sees temp as read-only.
assert str(pm_with_paths["workspace"]) in allow_write
assert str(pm_with_paths["temp_ws"]) in allow_write
assert str(snapshot.resolve()) in allow_write


def _is_read_allowed(allow_read, target: str) -> bool:
"""True if `target` is covered by some allowRead root (itself or an ancestor)."""
from pathlib import Path as _P

t = _P(target).resolve()
for root in allow_read:
r = _P(root).resolve()
if t == r or r in t.parents:
return True
return False


def test_fs_tools_profile_confined_allows_reading_framework_runtime(pm_with_paths):
"""REGRESSION: when SRT wraps a framework MCP server (fastmcp run <massgen script>),
confined mode denies all of $HOME — but the venv (fastmcp + deps + interpreter) and
the massgen package source both live under $HOME. Without re-allowing the framework's
own read roots, `srt` denies reading the server's own code and the server can't start
("Operation not permitted: _workspace_tools_server.py"). The fs_tools profile must
re-allow the framework runtime roots so the wrapped server can read its own code while
$HOME otherwise stays denied.
"""
import sys
from pathlib import Path

import massgen

mgr = SrtManager(pm_with_paths["pm"]) # default confined
fs = mgr.build_settings(profile="fs_tools")["filesystem"]

# $HOME is still denied (we didn't just open everything back up).
assert str(Path.home()) in fs["denyRead"]

# The framework's own code + interpreter + deps must be readable (allowRead wins).
massgen_dir = Path(massgen.__file__).resolve().parent
assert _is_read_allowed(fs["allowRead"], str(massgen_dir)), "massgen package dir must be readable by the wrapped fs-tools server"
assert _is_read_allowed(fs["allowRead"], sys.prefix), "Python prefix (venv: fastmcp + deps) must be readable"
assert _is_read_allowed(fs["allowRead"], sys.base_prefix), "base Python prefix must be readable"

# git is core to the workspace model (GitPython reads ~/.gitconfig at import), so
# its user config must be readable too — else the server crashes on import under
# confined ("unable to access '~/.gitconfig': Operation not permitted").
assert _is_read_allowed(fs["allowRead"], str(Path.home() / ".gitconfig")), "git user config must be readable by the wrapped fs-tools server"


def test_execution_profile_does_not_widen_for_framework_runtime(pm_with_paths):
"""The framework-runtime re-allow is fs_tools-only: the agent's own command sandbox
(execution profile) stays tight and must NOT gain the massgen package dir just because
fs_tools needs it."""
from pathlib import Path

import massgen

mgr = SrtManager(pm_with_paths["pm"]) # default confined
fs = mgr.build_settings(profile="execution")["filesystem"]
massgen_dir = str(Path(massgen.__file__).resolve().parent)
assert massgen_dir not in fs["allowRead"]


# --------------------------------------------------------------------------- #
# network — deny-all by default, opt-in allowlist
# --------------------------------------------------------------------------- #
def test_network_default_deny_all(pm_with_paths):
mgr = SrtManager(pm_with_paths["pm"])
settings = mgr.build_settings(profile="execution")
assert settings["network"]["allowedDomains"] == []


def test_network_allowlist_is_opt_in(pm_with_paths):
mgr = SrtManager(pm_with_paths["pm"], network_allowed_domains=["api.anthropic.com"])
settings = mgr.build_settings(profile="execution")
assert settings["network"]["allowedDomains"] == ["api.anthropic.com"]


def test_allow_unix_sockets_passthrough(pm_with_paths):
mgr = SrtManager(pm_with_paths["pm"], allow_unix_sockets=["/var/run/docker.sock"])
settings = mgr.build_settings(profile="execution")
assert settings["network"]["allowUnixSockets"] == ["/var/run/docker.sock"]


# --------------------------------------------------------------------------- #
# write_settings_file
# --------------------------------------------------------------------------- #
def test_write_settings_file_produces_valid_json(pm_with_paths, tmp_path):
mgr = SrtManager(pm_with_paths["pm"], settings_dir=tmp_path / "srt")
path = mgr.write_settings_file(profile="execution", agent_id="agent_a")
assert path.exists()
data = json.loads(path.read_text())
assert "filesystem" in data and "network" in data
assert str(pm_with_paths["workspace"]) in data["filesystem"]["allowWrite"]


# --------------------------------------------------------------------------- #
# wrapping — single source of truth shared with the MCP server
# --------------------------------------------------------------------------- #
def test_wrap_command_string_form():
assert wrap_command_with_srt("echo hi", "/tmp/cfg.json") == "srt --settings /tmp/cfg.json sh -c 'echo hi'"


def test_wrap_command_quotes_shell_metacharacters():
wrapped = wrap_command_with_srt("echo hi | grep h", "/tmp/cfg.json")
# The original command must be passed as a single quoted argument to `sh -c`,
# so the pipe runs INSIDE the sandbox, not in the outer (unsandboxed) shell.
assert wrapped == "srt --settings /tmp/cfg.json sh -c 'echo hi | grep h'"


def test_wrap_argv_list_form():
assert wrap_argv_with_srt(["codex", "exec", "--json"], "/tmp/cfg.json") == [
"srt",
"--settings",
"/tmp/cfg.json",
"codex",
"exec",
"--json",
]


def test_custom_srt_binary_path():
assert wrap_argv_with_srt(["x"], "/c.json", srt_path="/opt/srt")[0] == "/opt/srt"


# --------------------------------------------------------------------------- #
# availability / platform guards
# --------------------------------------------------------------------------- #
def test_srt_available_false_when_missing(monkeypatch):
monkeypatch.setattr("massgen.filesystem_manager._srt_manager.shutil.which", lambda _: None)
assert srt_available() is False


def test_verify_available_raises_actionable_error_when_missing(monkeypatch, pm_with_paths):
monkeypatch.setattr("massgen.filesystem_manager._srt_manager.platform.system", lambda: "Darwin")
monkeypatch.setattr("massgen.filesystem_manager._srt_manager.shutil.which", lambda _: None)
mgr = SrtManager(pm_with_paths["pm"])
with pytest.raises(RuntimeError, match="sandbox-runtime"):
mgr.verify_available()


def test_verify_available_raises_on_windows(monkeypatch, pm_with_paths):
monkeypatch.setattr("massgen.filesystem_manager._srt_manager.platform.system", lambda: "Windows")
monkeypatch.setattr("massgen.filesystem_manager._srt_manager.shutil.which", lambda _: "C:/srt.exe")
mgr = SrtManager(pm_with_paths["pm"])
with pytest.raises(RuntimeError, match="(?i)windows"):
mgr.verify_available()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Add Google-style docstrings to newly added functions in this module.

Multiple new/changed functions here (fixture/helper/tests) are missing Google-style docstrings required by repo standards.

As per coding guidelines, **/*.py: “For new or changed functions, include Google-style docstrings”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/tests/test_srt_manager.py` around lines 27 - 295, The new/modified
functions lack Google-style docstrings required by repo policy; add concise
Google-style docstrings for the pm_with_paths fixture (describe purpose,
yield/return structure and keys: "pm", "workspace", "temp_ws", "ctx_write",
"ctx_read", "protected") and for the helper _is_read_allowed (describe
parameters allow_read: list, target: str, return: bool and what True means), and
add similar docstrings to any other new helper/test functions you introduced so
each function-level symbol (pm_with_paths, _is_read_allowed, and any other newly
added functions) has a Google-style docstring explaining purpose, Args, Returns
(or Yields for fixtures) and brief behavior summary.

Source: Coding guidelines

Comment on lines +3901 to +3928
def test_srt_settings_inherited_by_subagent(self, tmp_path):
from pathlib import Path

from massgen.subagent.manager import SubagentManager
from massgen.subagent.models import SubagentConfig

parent_backend = {
"type": "openai",
"model": "gpt-5",
"cwd": "ws",
"enable_mcp_command_line": True,
"command_line_execution_mode": "srt",
"command_line_srt_network_allowed_domains": ["pypi.org"],
"command_line_srt_deny_read": ["/x/.secret"],
}
parent_cfg = {"id": "parent", "backend": parent_backend}
mgr = SubagentManager(
parent_workspace=str(tmp_path),
parent_agent_id="parent",
orchestrator_id="orch",
parent_agent_configs=[parent_cfg],
)
cfg = SubagentConfig.create(task="do x", parent_agent_id="parent")
out = mgr._generate_subagent_yaml_config(cfg, Path(tmp_path) / "sub")
child_backend = out["agents"][0]["backend"]
assert child_backend["command_line_execution_mode"] == "srt"
assert child_backend["command_line_srt_network_allowed_domains"] == ["pypi.org"]
assert child_backend["command_line_srt_deny_read"] == ["/x/.secret"]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add a Google-style docstring to the new test method.

test_srt_settings_inherited_by_subagent is newly added and should include a Google-style docstring per repo rules.

As per coding guidelines, **/*.py: “For new or changed functions, include Google-style docstrings”.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@massgen/tests/test_subagent_manager.py` around lines 3901 - 3928, Add a
Google-style docstring to the new test function
test_srt_settings_inherited_by_subagent describing purpose, arguments (if any),
and expected behavior; place the docstring directly under the def line in
massgen/tests/test_subagent_manager.py, briefly stating that the test verifies
SRT-related command-line settings from the parent backend are inherited into the
generated child backend config produced by
SubagentManager._generate_subagent_yaml_config when using SubagentConfig.

Source: Coding guidelines

@Henry-811 Henry-811 merged commit 1508ebb into dev/v0.1.96 Jun 10, 2026
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants