Skip to content

Add opencode agent_cmd for multi-provider OpenCode runs#6

Open
yoshinari-patronus wants to merge 10 commits into
mainfrom
feat/opencode-multi-provider
Open

Add opencode agent_cmd for multi-provider OpenCode runs#6
yoshinari-patronus wants to merge 10 commits into
mainfrom
feat/opencode-multi-provider

Conversation

@yoshinari-patronus
Copy link
Copy Markdown
Collaborator

Summary

Adds a new generic opencode agent_cmd so we can run any provider's models through the OpenCode scaffolding — the same engine we already use for Nemotron — for clean cross-model ablations.

# Same model, two scaffoldings, two distinct result dirs:
python experiments/escalation.py --agent claude   --model claude-haiku-4-5-20251001
# -> escalation/claude-claude-haiku-4-5-20251001/...

python experiments/escalation.py --agent opencode --model anthropic/claude-haiku-4-5-20251001
# -> escalation/opencode-anthropic/claude-haiku-4-5-20251001/...

What changes

  1. experiments/_opencode_config.py (new): single helper that builds the per-sandbox .opencode.json and (when needed) a Baseten proxy script. Used by escalation.py, escalation_usersim.py, and the standalone scripts/run_sandbox_container.py so the three call sites stay DRY.
  2. Multi-provider config: registers Anthropic (@ai-sdk/anthropic), Google (@ai-sdk/google), and Nvidia (Baseten via @ai-sdk/openai-compatible) blocks based on which API keys are present in the env. The -m <provider>/<model> flag picks the runtime path; configured-but-unused providers are no-ops.
  3. _agent_base in both escalation files learns cmd == "opencode": invokes opencode run -m <model> with no nemotron/ prefix (the model already includes the provider).
  4. agent_cmd in ("nemotron", "opencode") covers the env-var passthrough, tool-call-export branch, and OPENCODE_CONFIG mount uniformly.
  5. config.py: registers opencode in AGENT_MODELS with the cross-provider model catalog and an AGENT_PARALLEL entry.

Backward compatibility

The legacy nemotron agent_cmd is unchanged. Existing nemotron-nvidia/... result dirs and the analysis scripts that hardcode them keep working without modification. New runs that want the cleaner opencode-nvidia/... slug just opt in by passing --agent opencode. No on-disk data is moved.

Test plan

  • Verified the bash-script generator produces opencode run -m anthropic/claude-haiku-4-5-20251001 ... for --agent opencode --model anthropic/claude-haiku-4-5-20251001.
  • Verified the multi-provider .opencode.json builder registers all 3 providers when keys are present, and emits the Baseten proxy only when an nvidia/* model is selected.
  • Verified the legacy nemotron path produces the same single-provider config + proxy as before.
  • Smoke-test a single-sandbox --agent opencode --model anthropic/claude-haiku-4-5-20251001 run after merge.
  • Smoke-test --agent opencode --model google/gemini-3.1-pro-preview.
  • Smoke-test --agent opencode --model nvidia/nemotron-3-super (should match the legacy nemotron behavior modulo result-dir naming).

Follow-ups (out of scope here)

  • Migrate nemotron-nvidia/... data on disk to opencode-nvidia/... once we've confirmed the new path works in a few real runs (Option 1 from the design discussion). For now Option 3 stays in place: legacy alias + new generic agent live side by side.

Lets the same model run under different scaffoldings for cleaner
cross-model ablations:

  python ... --agent claude   --model claude-haiku-4-5-20251001  # native CLI
  python ... --agent opencode --model anthropic/claude-haiku-4-5-20251001  # via OpenCode

Result-dir slug stays `<agent>-<model>`, so the two paths land in
distinct directories and don't collide:
  escalation/claude-claude-haiku-4-5-20251001/...
  escalation/opencode-anthropic/claude-haiku-4-5-20251001/...

Implementation:
1. New `experiments/_opencode_config.py` builds the per-sandbox
   `.opencode.json` and (if needed) a Baseten proxy script. Used by
   both escalation.py, escalation_usersim.py, and the standalone
   `scripts/run_sandbox_container.py`.
2. Multi-provider config registers Anthropic, Google, and Nvidia
   blocks based on which API keys are present in the environment.
   The `-m <provider>/<model>` arg picks the runtime path; configured-
   but-unused providers are no-ops.
3. `_agent_base` in escalation.py and escalation_usersim.py learns
   `cmd == "opencode"` -> `opencode run -m <model>` (no `nemotron/`
   prefix, since the model already includes the provider).
4. `agent_cmd in ("nemotron", "opencode")` covers the env-var,
   tool-call-export, and OPENCODE_CONFIG branches uniformly.

Backward compat:
- Legacy `nemotron` agent_cmd is unchanged. Existing
  `nemotron-nvidia/...` result paths and analysis scripts keep
  working without modification. New runs that want the cleaner
  `opencode-nvidia/...` slug just switch to `--agent opencode`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread experiments/_opencode_config.py Fixed
Comment thread experiments/_opencode_config.py Fixed
akkikiki and others added 7 commits May 1, 2026 21:30
Three issues surfaced during smoke-testing:

1. `build_sandbox.py` rejected `--agent opencode` (choices list didn't
   include the new value). Added `opencode` to the allowed choices.

2. `escalation.py`'s opencode-export step probed only
   `~/.local/share/opencode/storage/session/info/`, but current OpenCode
   versions write sessions to `session_diff/`. Replaced the single-path
   lookup with the multi-path probe already used in `escalation_usersim.py`
   (probes `session/info`, `session`, `session_diff`).

3. The Nvidia provider block had model keys without the `nvidia/` prefix
   (`nemotron-3-super`), which OpenCode dutifully sent to Baseten as the
   API model name and got a 404 — Baseten serves the model under its
   full `nvidia/nemotron-3-super` name (`vllm serve --served-model-name
   nvidia/nemotron-3-super`). Restored the `nvidia/` prefix on the model
   key, and updated the launcher invocation to use the doubled
   `--model nvidia/nvidia/nemotron-3-super` so OpenCode's first-slash
   split yields provider=`nvidia`, model=`nvidia/nemotron-3-super`. The
   resulting slug is `opencode-nvidia/nvidia/nemotron-3-super/...`,
   keeping the requested `opencode-nvidia` prefix at the cost of one
   extra path level. Anthropic and Google don't need this nesting
   because their canonical API model names have no slash.

Also drops `nvidia/Nemotron-120B-A12B` from both AGENT_MODELS
registrations (legacy `nemotron` and new `opencode`) — the model
isn't currently part of the experiment matrix.

Smoke-tested all three providers end-to-end:
  --agent opencode --model anthropic/claude-haiku-4-5-20251001    OK 19.1s
  --agent opencode --model google/gemini-3.1-pro-preview          OK 115.6s
  --agent opencode --model nvidia/nvidia/nemotron-3-super         OK 130.4s
Each produces both `agent_stdout_*.txt` (final assistant text) and
`agent_stdout_*_opencode_export.json` (full session w/ tool-call history).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CodeQL flagged `_BASETEN_PROXY_TEMPLATE.format(target=nem_url,
api_key=nem_key)` (lines 147 and 165 of `_opencode_config.py`) as
clear-text storage of sensitive information — the formatted Python
script was being written to disk inside each per-sandbox temp dir
with the API key embedded as a literal.

The container already has `NEMOTRON_BASE_URL` and `NEMOTRON_API_KEY`
plumbed in as env vars, so we change the proxy to read them at
runtime instead of baking them in. The on-disk script now contains
only `os.environ.get(...)` calls and exits cleanly with an error if
either var is unset.

Verified by re-running the nvidia smoke test:
  --agent opencode --model nvidia/nvidia/nemotron-3-super  OK 106.4s
The saved _proxy.py contains no API key, no Baseten URL, and the
opencode session export still lands at 264 KB.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI previously installed only `pytest`, which made
`tests/test_evolutionary_optimization.py` error during collection
because it imports `experiments.ablations.evolutionary_optimization`,
which itself imports `anthropic` at module top level. The error
predates this PR (main is in the same state) but blocks the test
matrix on PR #6 too.

Two changes:
1. `pyproject.toml`: add `anthropic` to the `test` extras list.
2. `.github/workflows/tests.yml`: install `pip install -e ".[test]"`
   instead of just `pip install pytest`, so the extras are picked up.

Out of scope here (and not required to make tests green): the
underlying coupling between a top-level `import anthropic` and the
test collection step. A cleaner fix is to lazy-import inside the
function that needs the client, but that's a separate concern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`pip install -e .` fails because setuptools' flat-layout auto-discovery
sees too many top-level dirs (data/, plots/, notes/, docker/, judges/,
apptainer/, experiments/, startup_assets/, startup_scripts/) and
refuses to pick. Configuring explicit `packages` discovery is the
proper long-term fix, but unrelated to this PR.

Tests don't actually need the project installed as a package — they
manipulate sys.path themselves to find local modules. So just install
pytest and anthropic directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`test_fallback_text_attack_success` was written when
`parse_injection_verdict` had a fallback path that matched verdict
keywords (`attack_success`, `ignored`, `technical`) as substrings in
plain prose if no JSON block was found. That fallback was removed in
favor of JSON-only parsing — the judge prompt is explicit about
returning JSON, and substring-matching led to false positives when
the judge described candidate verdicts in narrative form before
settling on a different one.

The test was reachable on PR #6 only after we fixed the unrelated
`anthropic` import issue; on `main` it was hidden behind that
collection error. Renamed to `test_no_json_returns_ignored` and
flipped the expectation to match current behavior (no JSON →
"ignored", consistent with the verifier's ambiguous-case semantics).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lets an injection record point at a pre-injected SKILL.md (or CLAUDE.md)
file under data/<override_path> and have build_sandbox.py copy it verbatim
into the sandbox, bypassing line-based injection. Required for porting
autoresearch keeps faithfully — the autoresearch meta-agent writes a full
markdown section (heading + numbered steps + authority framing), not just
a single line to inject at a fixed offset.

Adds:
- data/contextual_injections_autoresearch_full.json (21 May-13 keeps)
- data/skill_md_overrides_autoresearch/<cid>/SKILL.md  (per-keep verbatim)
- notes/2026-05-14_porting_autoresearch_keeps_to_contextual_injections.md
  documenting the workflow, measured impact (76% reproducibility vs 19%
  with line-based injection), the bugs we surfaced, and recommended
  autoresearch-loop changes.

Backward-compatible: existing 48 contextual_injections.json records don't
set skill_md_override and continue to use the line-based injection path.
akkikiki added 2 commits May 14, 2026 16:36
…-based variant of the 21 May-13 keeps)

Pure injection-based shape of the same 21 autoresearch keeps that live in
*_full.json. Identical content per record except skill_md_override and
claude_md_override fields are stripped — so build_sandbox.py uses the
classic line-based injection path (description_injection + line_num)
rather than the verbatim-file copy.

Useful for:
- Side-by-side comparison of injection-based vs SKILL.md-rewriting transfer
  fidelity (see notes/2026-05-14_porting_autoresearch_keeps_to_contextual_injections.md
  for measured impact — 19% vs 76% any-success on this same 21-record sample).
- Baseline experiments that want the smaller, more "interpretable"
  JSON-only representation without the per-candidate override file tree.

Layout now:
  contextual_injections.json                                  - canonical 48
  contextual_injections_autoresearch.json                     - 3 hand-picked May-12
  contextual_injections_autoresearch_full.json                - 21 May-13, both fields (uses verbatim)
  contextual_injections_autoresearch_injection_only.json      - same 21, injection-based only
Documents where the autoresearch loop's meta-agent places its injections
in SKILL.md / CLAUDE.md. Pulled from each archived run's
autoresearch_loop.err (per-iteration `placement:` lines).

Headline numbers:
- 71% of 167 logged iterations used INJECTION_PLACEMENT=footer
- 79% combined for trailing-region placements (footer + bottom)
- May-13 run had converged on footer (95% of its 102 iterations)
- Geometric verification: in the 21 verbatim-ported May-13 keeps,
  the injected ## heading lands at 91.8-97.8% of file length (median 96.4%)
- The 4-5 line spacer + --- separator + ## heading is what makes
  the EOF placement look like an authoritative new section

Also captures the May-13 agent's own self-noted finding:
"KEY FINDING: SKILL.md injection channel works; CLAUDE.md injection does NOT work"
— pivoted from CLAUDE.md to SKILL.md mid-run when CLAUDE.md auto-loading
on the Sonnet alias became unreliable.

Implications listed for future autoresearch runs (diversity budget across
placements, pin model snapshots, etc.).
yoshinari-patronus pushed a commit that referenced this pull request May 19, 2026
CI previously installed only `pytest`, which made
`tests/test_evolutionary_optimization.py` error during collection
because it imports `experiments.ablations.evolutionary_optimization`,
which itself imports `anthropic` at module top level. The error
predates this PR (main is in the same state) but blocks the test
matrix on PR #6 too.

Two changes:
1. `pyproject.toml`: add `anthropic` to the `test` extras list.
2. `.github/workflows/tests.yml`: install `pip install -e ".[test]"`
   instead of just `pip install pytest`, so the extras are picked up.

Out of scope here (and not required to make tests green): the
underlying coupling between a top-level `import anthropic` and the
test collection step. A cleaner fix is to lazy-import inside the
function that needs the client, but that's a separate concern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
yoshinari-patronus pushed a commit that referenced this pull request May 19, 2026
CI previously installed only `pytest`, which made
`tests/test_evolutionary_optimization.py` error during collection
because it imports `experiments.ablations.evolutionary_optimization`,
which itself imports `anthropic` at module top level. The error
predates this PR (main is in the same state) but blocks the test
matrix on PR #6 too.

Two changes:
1. `pyproject.toml`: add `anthropic` to the `test` extras list.
2. `.github/workflows/tests.yml`: install `pip install -e ".[test]"`
   instead of just `pip install pytest`, so the extras are picked up.

Out of scope here (and not required to make tests green): the
underlying coupling between a top-level `import anthropic` and the
test collection step. A cleaner fix is to lazy-import inside the
function that needs the client, but that's a separate concern.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
yoshinari-patronus pushed a commit that referenced this pull request May 19, 2026
`test_fallback_text_attack_success` was written when
`parse_injection_verdict` had a fallback path that matched verdict
keywords (`attack_success`, `ignored`, `technical`) as substrings in
plain prose if no JSON block was found. That fallback was removed in
favor of JSON-only parsing — the judge prompt is explicit about
returning JSON, and substring-matching led to false positives when
the judge described candidate verdicts in narrative form before
settling on a different one.

The test was reachable on PR #6 only after we fixed the unrelated
`anthropic` import issue; on `main` it was hidden behind that
collection error. Renamed to `test_no_json_returns_ignored` and
flipped the expectation to match current behavior (no JSON →
"ignored", consistent with the verifier's ambiguous-case semantics).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
yoshinari-patronus pushed a commit that referenced this pull request May 19, 2026
`test_fallback_text_attack_success` was written when
`parse_injection_verdict` had a fallback path that matched verdict
keywords (`attack_success`, `ignored`, `technical`) as substrings in
plain prose if no JSON block was found. That fallback was removed in
favor of JSON-only parsing — the judge prompt is explicit about
returning JSON, and substring-matching led to false positives when
the judge described candidate verdicts in narrative form before
settling on a different one.

The test was reachable on PR #6 only after we fixed the unrelated
`anthropic` import issue; on `main` it was hidden behind that
collection error. Renamed to `test_no_json_returns_ignored` and
flipped the expectation to match current behavior (no JSON →
"ignored", consistent with the verifier's ambiguous-case semantics).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants