Add opencode agent_cmd for multi-provider OpenCode runs by yoshinari-patronus · Pull Request #6 · patronus-ai/skill-inject

yoshinari-patronus · 2026-05-01T20:51:06Z

Summary

Adds a new generic opencode agent_cmd so we can run any provider's models through the OpenCode scaffolding — the same engine we already use for Nemotron — for clean cross-model ablations.

# Same model, two scaffoldings, two distinct result dirs:
python experiments/escalation.py --agent claude   --model claude-haiku-4-5-20251001
# -> escalation/claude-claude-haiku-4-5-20251001/...

python experiments/escalation.py --agent opencode --model anthropic/claude-haiku-4-5-20251001
# -> escalation/opencode-anthropic/claude-haiku-4-5-20251001/...

What changes

experiments/_opencode_config.py (new): single helper that builds the per-sandbox .opencode.json and (when needed) a Baseten proxy script. Used by escalation.py, escalation_usersim.py, and the standalone scripts/run_sandbox_container.py so the three call sites stay DRY.
Multi-provider config: registers Anthropic (@ai-sdk/anthropic), Google (@ai-sdk/google), and Nvidia (Baseten via @ai-sdk/openai-compatible) blocks based on which API keys are present in the env. The -m <provider>/<model> flag picks the runtime path; configured-but-unused providers are no-ops.
_agent_base in both escalation files learns cmd == "opencode": invokes opencode run -m <model> with no nemotron/ prefix (the model already includes the provider).
agent_cmd in ("nemotron", "opencode") covers the env-var passthrough, tool-call-export branch, and OPENCODE_CONFIG mount uniformly.
config.py: registers opencode in AGENT_MODELS with the cross-provider model catalog and an AGENT_PARALLEL entry.

Backward compatibility

The legacy nemotron agent_cmd is unchanged. Existing nemotron-nvidia/... result dirs and the analysis scripts that hardcode them keep working without modification. New runs that want the cleaner opencode-nvidia/... slug just opt in by passing --agent opencode. No on-disk data is moved.

Test plan

Verified the bash-script generator produces opencode run -m anthropic/claude-haiku-4-5-20251001 ... for --agent opencode --model anthropic/claude-haiku-4-5-20251001.
Verified the multi-provider .opencode.json builder registers all 3 providers when keys are present, and emits the Baseten proxy only when an nvidia/* model is selected.
Verified the legacy nemotron path produces the same single-provider config + proxy as before.
Smoke-test a single-sandbox --agent opencode --model anthropic/claude-haiku-4-5-20251001 run after merge.
Smoke-test --agent opencode --model google/gemini-3.1-pro-preview.
Smoke-test --agent opencode --model nvidia/nemotron-3-super (should match the legacy nemotron behavior modulo result-dir naming).

Follow-ups (out of scope here)

Migrate nemotron-nvidia/... data on disk to opencode-nvidia/... once we've confirmed the new path works in a few real runs (Option 1 from the design discussion). For now Option 3 stays in place: legacy alias + new generic agent live side by side.

Lets the same model run under different scaffoldings for cleaner cross-model ablations: python ... --agent claude --model claude-haiku-4-5-20251001 # native CLI python ... --agent opencode --model anthropic/claude-haiku-4-5-20251001 # via OpenCode Result-dir slug stays `<agent>-<model>`, so the two paths land in distinct directories and don't collide: escalation/claude-claude-haiku-4-5-20251001/... escalation/opencode-anthropic/claude-haiku-4-5-20251001/... Implementation: 1. New `experiments/_opencode_config.py` builds the per-sandbox `.opencode.json` and (if needed) a Baseten proxy script. Used by both escalation.py, escalation_usersim.py, and the standalone `scripts/run_sandbox_container.py`. 2. Multi-provider config registers Anthropic, Google, and Nvidia blocks based on which API keys are present in the environment. The `-m <provider>/<model>` arg picks the runtime path; configured- but-unused providers are no-ops. 3. `_agent_base` in escalation.py and escalation_usersim.py learns `cmd == "opencode"` -> `opencode run -m <model>` (no `nemotron/` prefix, since the model already includes the provider). 4. `agent_cmd in ("nemotron", "opencode")` covers the env-var, tool-call-export, and OPENCODE_CONFIG branches uniformly. Backward compat: - Legacy `nemotron` agent_cmd is unchanged. Existing `nemotron-nvidia/...` result paths and analysis scripts keep working without modification. New runs that want the cleaner `opencode-nvidia/...` slug just switch to `--agent opencode`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Three issues surfaced during smoke-testing: 1. `build_sandbox.py` rejected `--agent opencode` (choices list didn't include the new value). Added `opencode` to the allowed choices. 2. `escalation.py`'s opencode-export step probed only `~/.local/share/opencode/storage/session/info/`, but current OpenCode versions write sessions to `session_diff/`. Replaced the single-path lookup with the multi-path probe already used in `escalation_usersim.py` (probes `session/info`, `session`, `session_diff`). 3. The Nvidia provider block had model keys without the `nvidia/` prefix (`nemotron-3-super`), which OpenCode dutifully sent to Baseten as the API model name and got a 404 — Baseten serves the model under its full `nvidia/nemotron-3-super` name (`vllm serve --served-model-name nvidia/nemotron-3-super`). Restored the `nvidia/` prefix on the model key, and updated the launcher invocation to use the doubled `--model nvidia/nvidia/nemotron-3-super` so OpenCode's first-slash split yields provider=`nvidia`, model=`nvidia/nemotron-3-super`. The resulting slug is `opencode-nvidia/nvidia/nemotron-3-super/...`, keeping the requested `opencode-nvidia` prefix at the cost of one extra path level. Anthropic and Google don't need this nesting because their canonical API model names have no slash. Also drops `nvidia/Nemotron-120B-A12B` from both AGENT_MODELS registrations (legacy `nemotron` and new `opencode`) — the model isn't currently part of the experiment matrix. Smoke-tested all three providers end-to-end: --agent opencode --model anthropic/claude-haiku-4-5-20251001 OK 19.1s --agent opencode --model google/gemini-3.1-pro-preview OK 115.6s --agent opencode --model nvidia/nvidia/nemotron-3-super OK 130.4s Each produces both `agent_stdout_*.txt` (final assistant text) and `agent_stdout_*_opencode_export.json` (full session w/ tool-call history). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CodeQL flagged `_BASETEN_PROXY_TEMPLATE.format(target=nem_url, api_key=nem_key)` (lines 147 and 165 of `_opencode_config.py`) as clear-text storage of sensitive information — the formatted Python script was being written to disk inside each per-sandbox temp dir with the API key embedded as a literal. The container already has `NEMOTRON_BASE_URL` and `NEMOTRON_API_KEY` plumbed in as env vars, so we change the proxy to read them at runtime instead of baking them in. The on-disk script now contains only `os.environ.get(...)` calls and exits cleanly with an error if either var is unset. Verified by re-running the nvidia smoke test: --agent opencode --model nvidia/nvidia/nemotron-3-super OK 106.4s The saved _proxy.py contains no API key, no Baseten URL, and the opencode session export still lands at 264 KB. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CI previously installed only `pytest`, which made `tests/test_evolutionary_optimization.py` error during collection because it imports `experiments.ablations.evolutionary_optimization`, which itself imports `anthropic` at module top level. The error predates this PR (main is in the same state) but blocks the test matrix on PR #6 too. Two changes: 1. `pyproject.toml`: add `anthropic` to the `test` extras list. 2. `.github/workflows/tests.yml`: install `pip install -e ".[test]"` instead of just `pip install pytest`, so the extras are picked up. Out of scope here (and not required to make tests green): the underlying coupling between a top-level `import anthropic` and the test collection step. A cleaner fix is to lazy-import inside the function that needs the client, but that's a separate concern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`pip install -e .` fails because setuptools' flat-layout auto-discovery sees too many top-level dirs (data/, plots/, notes/, docker/, judges/, apptainer/, experiments/, startup_assets/, startup_scripts/) and refuses to pick. Configuring explicit `packages` discovery is the proper long-term fix, but unrelated to this PR. Tests don't actually need the project installed as a package — they manipulate sys.path themselves to find local modules. So just install pytest and anthropic directly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`test_fallback_text_attack_success` was written when `parse_injection_verdict` had a fallback path that matched verdict keywords (`attack_success`, `ignored`, `technical`) as substrings in plain prose if no JSON block was found. That fallback was removed in favor of JSON-only parsing — the judge prompt is explicit about returning JSON, and substring-matching led to false positives when the judge described candidate verdicts in narrative form before settling on a different one. The test was reachable on PR #6 only after we fixed the unrelated `anthropic` import issue; on `main` it was hidden behind that collection error. Renamed to `test_no_json_returns_ignored` and flipped the expectation to match current behavior (no JSON → "ignored", consistent with the verifier's ambiguous-case semantics). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…m autoresearch run)

Lets an injection record point at a pre-injected SKILL.md (or CLAUDE.md) file under data/<override_path> and have build_sandbox.py copy it verbatim into the sandbox, bypassing line-based injection. Required for porting autoresearch keeps faithfully — the autoresearch meta-agent writes a full markdown section (heading + numbered steps + authority framing), not just a single line to inject at a fixed offset. Adds: - data/contextual_injections_autoresearch_full.json (21 May-13 keeps) - data/skill_md_overrides_autoresearch/<cid>/SKILL.md (per-keep verbatim) - notes/2026-05-14_porting_autoresearch_keeps_to_contextual_injections.md documenting the workflow, measured impact (76% reproducibility vs 19% with line-based injection), the bugs we surfaced, and recommended autoresearch-loop changes. Backward-compatible: existing 48 contextual_injections.json records don't set skill_md_override and continue to use the line-based injection path.

…-based variant of the 21 May-13 keeps) Pure injection-based shape of the same 21 autoresearch keeps that live in *_full.json. Identical content per record except skill_md_override and claude_md_override fields are stripped — so build_sandbox.py uses the classic line-based injection path (description_injection + line_num) rather than the verbatim-file copy. Useful for: - Side-by-side comparison of injection-based vs SKILL.md-rewriting transfer fidelity (see notes/2026-05-14_porting_autoresearch_keeps_to_contextual_injections.md for measured impact — 19% vs 76% any-success on this same 21-record sample). - Baseline experiments that want the smaller, more "interpretable" JSON-only representation without the per-candidate override file tree. Layout now: contextual_injections.json - canonical 48 contextual_injections_autoresearch.json - 3 hand-picked May-12 contextual_injections_autoresearch_full.json - 21 May-13, both fields (uses verbatim) contextual_injections_autoresearch_injection_only.json - same 21, injection-based only

Documents where the autoresearch loop's meta-agent places its injections in SKILL.md / CLAUDE.md. Pulled from each archived run's autoresearch_loop.err (per-iteration `placement:` lines). Headline numbers: - 71% of 167 logged iterations used INJECTION_PLACEMENT=footer - 79% combined for trailing-region placements (footer + bottom) - May-13 run had converged on footer (95% of its 102 iterations) - Geometric verification: in the 21 verbatim-ported May-13 keeps, the injected ## heading lands at 91.8-97.8% of file length (median 96.4%) - The 4-5 line spacer + --- separator + ## heading is what makes the EOF placement look like an authoritative new section Also captures the May-13 agent's own self-noted finding: "KEY FINDING: SKILL.md injection channel works; CLAUDE.md injection does NOT work" — pivoted from CLAUDE.md to SKILL.md mid-run when CLAUDE.md auto-loading on the Sonnet alias became unreliable. Implications listed for future autoresearch runs (diversity budget across placements, pin model snapshots, etc.).

CI previously installed only `pytest`, which made `tests/test_evolutionary_optimization.py` error during collection because it imports `experiments.ablations.evolutionary_optimization`, which itself imports `anthropic` at module top level. The error predates this PR (main is in the same state) but blocks the test matrix on PR #6 too. Two changes: 1. `pyproject.toml`: add `anthropic` to the `test` extras list. 2. `.github/workflows/tests.yml`: install `pip install -e ".[test]"` instead of just `pip install pytest`, so the extras are picked up. Out of scope here (and not required to make tests green): the underlying coupling between a top-level `import anthropic` and the test collection step. A cleaner fix is to lazy-import inside the function that needs the client, but that's a separate concern. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`test_fallback_text_attack_success` was written when `parse_injection_verdict` had a fallback path that matched verdict keywords (`attack_success`, `ignored`, `technical`) as substrings in plain prose if no JSON block was found. That fallback was removed in favor of JSON-only parsing — the judge prompt is explicit about returning JSON, and substring-matching led to false positives when the judge described candidate verdicts in narrative form before settling on a different one. The test was reachable on PR #6 only after we fixed the unrelated `anthropic` import issue; on `main` it was hidden behind that collection error. Renamed to `test_no_json_returns_ignored` and flipped the expectation to match current behavior (no JSON → "ignored", consistent with the verifier's ambiguous-case semantics). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-advanced-security AI found potential problems May 1, 2026

View reviewed changes

Comment thread experiments/_opencode_config.py Fixed

Comment thread experiments/_opencode_config.py Fixed

akkikiki and others added 7 commits May 1, 2026 21:30

data: add contextual_injections_autoresearch.json (3 ported keeps fro…

38ae529

…m autoresearch run)

yoshinari-patronus requested a review from vgtomahawk May 14, 2026 16:19

akkikiki added 2 commits May 14, 2026 16:36

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add opencode agent_cmd for multi-provider OpenCode runs#6

Add opencode agent_cmd for multi-provider OpenCode runs#6
yoshinari-patronus wants to merge 10 commits into
mainfrom
feat/opencode-multi-provider

yoshinari-patronus commented May 1, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

yoshinari-patronus commented May 1, 2026

Summary

What changes

Backward compatibility

Test plan

Follow-ups (out of scope here)

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants