fix(bedrock): fail fast for Bedrock models on unreachable remote sandboxes#329
Conversation
…boxes PR #327 fixed `ensure_usage_proxy_runtime` to skip the host-side usage telemetry proxy on remote sandboxes (daytona/modal) where the host's 127.0.0.1 is unreachable from the agent. Its structural twin, `ensure_bedrock_proxy_runtime`, was not patched: a Bedrock-routed model on daytona/modal still injected an unreachable 127.0.0.1 base URL and failed mid-run with an opaque `ACP error -32603 ... ECONNREFUSED`. Unlike the usage proxy (pure telemetry, safe to skip), the Bedrock proxy is load-bearing: it translates Anthropic/OpenAI requests into AWS Bedrock Converse calls and signs them with host AWS credentials. There is no path for the agent to reach Bedrock without it, so silently skipping would just move the failure. Instead `ensure_bedrock_proxy_runtime` now gates on `host_proxy_reachable_from_agent` and raises a clear, actionable error at setup time ("Bedrock-routed models are not supported on the 'daytona' sandbox ... re-run with --sandbox docker, or select a non-Bedrock model"). Consolidation: `host_proxy_reachable_from_agent` is now the single source of truth for "is the host reachable from the agent". `_bedrock_proxy_command` asserts the precondition and is only ever reached for reachable environments. Honest allow-list: `_LOCAL_REACHABLE_ENVIRONMENTS = {docker,local,host,""}` is replaced by `_REMOTE_UNREACHABLE_ENVIRONMENTS = {daytona,modal}`. The phantom values local/host/"" were never produced by the runtime (canonical env set is {docker,daytona,modal}); unknown environments are now treated conservatively as reachable. Adds regression tests covering fail-fast on daytona/modal, the actionable error text, non-Bedrock noop, stale-runtime teardown, and the _bedrock_proxy_command precondition.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8802c894fe
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| """The Bedrock proxy is load-bearing; on a remote sandbox where the host | ||
| proxy is unreachable the run must fail fast rather than inject an | ||
| unreachable 127.0.0.1 base URL (the Daytona telemetry-proxy twin bug).""" |
There was a problem hiding this comment.
Include guarded PR/commit in regression test docstring
The new regression test class docstring describes the Daytona/Modal Bedrock proxy regression but does not name the PR or commit it guards, which violates the repository rule in /workspace/benchflow/AGENTS.md (Regression tests must name the PR/commit they guard). This makes future triage and intentional test-retirement decisions harder because maintainers cannot quickly map the test to its historical regression source.
Useful? React with 👍 / 👎.
| assert host_proxy_reachable_from_agent("docker") is True | ||
| assert host_proxy_reachable_from_agent("local") is True | ||
| assert host_proxy_reachable_from_agent("host") is True | ||
| assert host_proxy_reachable_from_agent("") is True | ||
| # Remote cloud sandboxes run the agent on a separate machine. | ||
| assert host_proxy_reachable_from_agent("daytona") is False | ||
| assert host_proxy_reachable_from_agent("modal") is False | ||
| # Unrecognized environments are treated conservatively as reachable. | ||
| assert host_proxy_reachable_from_agent("") is True | ||
| assert host_proxy_reachable_from_agent("some-future-local-env") is True |
There was a problem hiding this comment.
🔴 Passing test assertions for "local" and "host" environments removed in violation of AGENTS.md
The AGENTS.md rule states: "Don't rewrite passing tests to match new behavior." The test test_host_proxy_reachable_only_for_local_environments removed the assertions assert host_proxy_reachable_from_agent("local") is True and assert host_proxy_reachable_from_agent("host") is True. These assertions still pass under the new blacklist-based implementation ("local" and "host" are not in _REMOTE_UNREACHABLE_ENVIRONMENTS at src/benchflow/providers/runtime.py:123, so host_proxy_reachable_from_agent returns True for both). The assertions were passing tests that were removed, which violates the repository convention.
| assert host_proxy_reachable_from_agent("docker") is True | |
| assert host_proxy_reachable_from_agent("local") is True | |
| assert host_proxy_reachable_from_agent("host") is True | |
| assert host_proxy_reachable_from_agent("") is True | |
| # Remote cloud sandboxes run the agent on a separate machine. | |
| assert host_proxy_reachable_from_agent("daytona") is False | |
| assert host_proxy_reachable_from_agent("modal") is False | |
| # Unrecognized environments are treated conservatively as reachable. | |
| assert host_proxy_reachable_from_agent("") is True | |
| assert host_proxy_reachable_from_agent("some-future-local-env") is True | |
| # docker shares the host's network namespace via the docker bridge. | |
| assert host_proxy_reachable_from_agent("docker") is True | |
| assert host_proxy_reachable_from_agent("local") is True | |
| assert host_proxy_reachable_from_agent("host") is True | |
| # Remote cloud sandboxes run the agent on a separate machine. | |
| assert host_proxy_reachable_from_agent("daytona") is False | |
| assert host_proxy_reachable_from_agent("modal") is False | |
| # Unrecognized environments are treated conservatively as reachable. | |
| assert host_proxy_reachable_from_agent("") is True | |
| assert host_proxy_reachable_from_agent("some-future-local-env") is True |
Was this helpful? React with 👍 or 👎 to provide feedback.
The twin bug
PR #327 fixed
ensure_usage_proxy_runtimeso the host-side usage telemetry proxy is skipped on remote sandboxes (daytona/modal) where the host's127.0.0.1is unreachable from the agent. Butsrc/benchflow/providers/runtime.pyhas a second, structurally identical host-side proxy entry point —ensure_bedrock_proxy_runtime— that PR #327 did not patch.It builds
ProviderRuntime(kind="aws-bedrock", host=_bedrock_proxy_command(environment=environment), ...), and_bedrock_proxy_commandreturnsBEDROCK_PROXY_LOCAL_HOST = "127.0.0.1"for any non-dockerenvironment. So a Bedrock-routed model (needs_provider_runtime(model) is True) on a Daytona or Modal sandbox injected an unreachable127.0.0.1base URL into the agent and failed with the same opaqueACP error -32603 ... ECONNREFUSEDmid-run.Chosen behavior for Bedrock-on-remote: fail fast (option b) — and why
Unlike the usage proxy (pure telemetry — safe to skip), the Bedrock proxy is load-bearing. Reading
bedrock_proxy.py: it translates Anthropic Messages / OpenAI Responses requests into AWS Bedrock Converse API calls and signs them with host AWS credentials (AWS_BEARER_TOKEN_BEDROCK,AWS_REGION) pulled fromruntime_env. There is no path for the agent to reach Bedrock without it — silently skipping would just move the failure (the agent would get no base-URL rewrite and try to talk Bedrock-style endpoints, or hit the default provider URL with a Bedrock model name).So
ensure_bedrock_proxy_runtimenow gates on thehost_proxy_reachable_from_agent(environment)predicate (added by #327) and raises a clear, actionableRuntimeErrorat setup time instead of injecting an unreachable URL:Any stale runtime carried over from a prior role/env is stopped before raising.
Consolidation (verifier Finding 2)
_bedrock_proxy_commandandhost_proxy_reachable_from_agentboth encoded "is the host reachable from the agent."host_proxy_reachable_from_agentis now the single source of truth: it is the gate, and_bedrock_proxy_commandasserts the precondition (host_proxy_reachable_from_agent(environment)is True) so it is only ever reached for environments already known reachable.Honest allow-list (verifier Finding 3)
_LOCAL_REACHABLE_ENVIRONMENTS = {"docker","local","host",""}is replaced by_REMOTE_UNREACHABLE_ENVIRONMENTS = {"daytona","modal"}. The phantom valueslocal/host/""were never produced by the runtime — the canonical environment set is{docker, daytona, modal}. The predicate is now "unreachable iff in the explicit remote set"; unknown environments are treated conservatively as reachable (still wire up the proxy rather than silently skip), with an explicit comment.Tests
tests/test_provider_runtime.py— newTestBedrockProxyRemoteSandbox: fail-fast on daytona/modal, actionable error text, non-Bedrock noop on remote, stale-runtime teardown,_bedrock_proxy_commandprecondition assertion.tests/test_usage_proxy.py— reachability test updated to the honest domain.CI gate all green:
ruff format --check,ruff check,ty check src/,pytest tests/(1307 passed, 27 skipped).Note
Medium Risk
Changes Bedrock model startup behavior to raise a hard error on
daytona/modalinstead of attempting to use an unreachable host proxy, which may break previously (already failing) remote-sandbox runs but improves diagnosability. Touches provider-runtime wiring used during rollout connect, so regressions would block Bedrock-routed executions.Overview
Bedrock-routed models now fail fast on remote sandboxes.
ensure_bedrock_proxy_runtimecheckshost_proxy_reachable_from_agentand raises an actionableRuntimeErrorondaytona/modal(stopping any stale runtime first) rather than injecting an unreachable127.0.0.1base URL.Reachability logic is tightened and centralized. The reachability predicate is inverted to an explicit remote-unreachable set (
daytona,modal), treats unknown environments as reachable, and_bedrock_proxy_commandnow asserts it’s only called when the host proxy is reachable.Tests add coverage for the new fail-fast behavior/actionable messaging, stale runtime cleanup, and updated reachability expectations.
Reviewed by Cursor Bugbot for commit 8802c89. Bugbot is set up for automated code reviews on this repo. Configure here.