E2B backend #36
Open
Bautista1999 wants to merge 18 commits into
Adds `ManagedSandboxBackend::try_resume` so callers can ask backends to find a sandbox previously created for a given `SandboxKey`: start it if stopped, attach if running, otherwise return `None`. Implements it for Docker / AppleContainer via label filter + state inspection + start; LocalProcess returns `None`.
Changes `Drop` on `CliContainerSandboxBackend` from `docker rm -f` to `docker stop -t 0` so warm containers survive process exit and the next exo invocation can `try_resume` them by label. Cross-process idle TTL is enforced lazily in `try_resume` via `docker inspect .State.FinishedAt`.
Wires `run_in_sandbox` to call `try_resume` on an in-memory cache miss, so a fresh exo process can pick up where the previous one left off.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
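The resume contract above (attach if running, start if stopped, otherwise `None`, with the idle TTL checked lazily against `.State.FinishedAt`) reduces to a small decision over the inspected container. A minimal sketch, assuming a hypothetical `ContainerState` that stands in for parsed `docker inspect` output with `FinishedAt` already converted to Unix seconds; the names are illustrative, not the PR's code:

```rust
use std::time::Duration;

/// Stand-in for what `docker inspect` reports about the labelled container.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContainerState {
    Running,
    /// Stopped; `finished_at_secs` is `.State.FinishedAt` as Unix seconds.
    Exited { finished_at_secs: u64 },
}

/// What the resume lookup decides to do.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResumeDecision {
    /// Running: attach as-is.
    Attach,
    /// Stopped within the idle TTL: start it, then attach.
    Start,
    /// Stopped longer than the idle TTL: treat as gone.
    Expired,
    /// No container carries the sandbox's label.
    NotFound,
}

/// Decide how to resume, enforcing the cross-process idle TTL lazily.
pub fn decide_resume(
    found: Option<ContainerState>,
    now_secs: u64,
    idle_ttl: Duration,
) -> ResumeDecision {
    match found {
        None => ResumeDecision::NotFound,
        Some(ContainerState::Running) => ResumeDecision::Attach,
        Some(ContainerState::Exited { finished_at_secs }) => {
            // saturating_sub avoids underflow if FinishedAt is ahead of our clock.
            let idle = Duration::from_secs(now_secs.saturating_sub(finished_at_secs));
            if idle > idle_ttl {
                ResumeDecision::Expired
            } else {
                ResumeDecision::Start
            }
        }
    }
}
```

Both `NotFound` and `Expired` correspond to `try_resume` returning `None`; keeping them distinct in the sketch just lets a caller clean up an expired container before creating a new one.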
When the conversation has a recorded sandbox_id and the spec still
matches, walk the resume fallback chain:
1. Tier 1: healthcheck via run_in_sandbox -> backend.try_resume
(handles "running -> attach" and "stopped -> start" cases).
2. Tier 2: if the container is truly gone (TTL-expired, manually
deleted, server-side auto-cleanup), look up the latest
SandboxSnapshotted event for this sandbox and start_sandbox from
it.
3. Tier 3: nothing reusable -> create a fresh sandbox.
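A sketch of the three-tier chain, assuming a pared-down event log and a `try_resume` callback that reports whether Tier 1 found a container. `SandboxSource`, `Event`, and `choose_sandbox` are hypothetical names; the real harness goes on to call `start_sandbox` / `create_sandbox`, which are omitted here:

```rust
/// Where this turn's sandbox comes from (hypothetical names).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SandboxSource {
    Resumed { sandbox_id: String },
    RestoredFromSnapshot { sandbox_id: String, snapshot: String },
    Fresh,
}

/// Minimal event model: only the kinds the chain cares about.
#[derive(Debug, Clone)]
pub enum Event {
    SandboxCreated { sandbox_id: String },
    SandboxSnapshotted { sandbox_id: String, snapshot: String },
}

/// Latest SandboxSnapshotted event for `sandbox_id`, scanning newest-first.
fn latest_snapshot_for<'a>(events: &'a [Event], sandbox_id: &str) -> Option<&'a str> {
    events.iter().rev().find_map(|e| match e {
        Event::SandboxSnapshotted { sandbox_id: id, snapshot } if id == sandbox_id => {
            Some(snapshot.as_str())
        }
        _ => None,
    })
}

/// Walk Tier 1 -> Tier 2 -> Tier 3. `try_resume` stands in for the backend call.
pub fn choose_sandbox(
    recorded_id: Option<&str>,
    spec_matches: bool,
    events: &[Event],
    try_resume: impl Fn(&str) -> bool,
) -> SandboxSource {
    if let (Some(id), true) = (recorded_id, spec_matches) {
        // Tier 1: the container still exists (running or stopped).
        if try_resume(id) {
            return SandboxSource::Resumed { sandbox_id: id.to_string() };
        }
        // Tier 2: the container is gone, but a snapshot was recorded for it.
        if let Some(snap) = latest_snapshot_for(events, id) {
            return SandboxSource::RestoredFromSnapshot {
                sandbox_id: id.to_string(),
                snapshot: snap.to_string(),
            };
        }
    }
    // Tier 3: nothing reusable.
    SandboxSource::Fresh
}
```

Scanning the log newest-first means a sandbox that was snapshotted several times restores from its most recent snapshot.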
Adds an integration test that runs two cross-process `chat send`
invocations against the same conversation and asserts exactly one
docker container exists afterwards (resume reused it, didn't create
a new one). Updates the existing integration test's container-state
assertion to reflect the new stop-on-exit behavior.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three new integration tests, one per tier of the 3-tier fallback chain
in ensure_shell_sandbox:
tier_1_stopped_container_is_resumed_same_id
Drop the harness (PR ankrgyl#21's Drop stops, doesn't rm). Container
survives on the host in Exited state. Second harness's try_resume
finds it by label, docker-starts it, attaches. Same container ID,
same sandbox_id, marker file persists across the stop/start cycle.
tier_2_gone_container_with_snapshot_restores
First harness takes a snapshot of the live sandbox (PR ankrgyl#20 API).
Drop the harness; `docker rm -f` the container (simulates idle-TTL
expiry / external cleanup). Second harness's try_resume misses,
falls through to Tier 2, finds the snapshot in the event log, and
calls start_sandbox -> acquire_from_snapshot. A NEW container id is
materialised, but the sandbox_id is reused and the marker is
restored from the snapshot — proving the snapshot path actually
fires, not just resume.
tier_3_gone_container_without_snapshot_creates_fresh
Same setup as tier 2 minus the snapshot. Second harness misses
Tier 1 (no container) and Tier 2 (no snapshot), so falls through
to create_sandbox. A new sandbox_id is generated; the conversation
log now has two SandboxCreated events; the previous marker is gone
from the fresh container.
Each test simulates the "two exo processes" boundary by dropping the
first BasicExoHarness and constructing a new one from the same root
dir. Library-API driven (no LLM mock, no binary spawn) — the harness's
3-tier behaviour is the only thing under test here.
Wired into integration.yml as a third --test target alongside
integration_chat and snapshot_round_trip. Self-skips on non-docker
matrix cells via preflight().
All three pass locally in ~3s against real Docker; self-skip path
runs in 50ms.
- docker ps: switch from tab-template to --format '{{json .}}' + serde
(DockerPsItem + parse_docker_labels). Same swap in the integration
test helper. Unit test for the label parser.
- introduce exoharness::first_matching_event helper for the recurring
'find the latest event of kind K, decoded via predicate P' pattern.
Three callers refactored: latest_snapshot_for_sandbox (the one in
the comment), latest_shell_sandbox, and tui::latest_sandbox_id.
- fix integration test: cross_process_send_resumes_the_same_sandbox_container
was still using the pre-rebase 'chat send' subcommand. Main's PR ankrgyl#4
renamed it to 'conversation send'.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
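With `--format '{{json .}}'`, each `docker ps` row is one JSON object whose `Labels` field is a single comma-separated `key=value` string, so it still needs splitting after deserialisation. A minimal label parser, assuming the row's `Labels` string has already been extracted (the PR uses serde for that step); the signature is illustrative, not the PR's `parse_docker_labels`:

```rust
use std::collections::BTreeMap;

/// Parse the `Labels` string from one `docker ps --format '{{json .}}'` row,
/// e.g. "exo.sandbox.key=abc,exo.sandbox.spec-hash=123".
pub fn parse_docker_labels(raw: &str) -> BTreeMap<String, String> {
    raw.split(',')
        .filter(|pair| !pair.is_empty())
        .filter_map(|pair| {
            // Split on the first '=' only, so values containing '=' survive.
            let (k, v) = pair.split_once('=')?;
            Some((k.to_string(), v.to_string()))
        })
        .collect()
}
```

A value containing `,` would still be split, since Docker does not appear to escape commas in this field; the `exo.sandbox.*` keys and hash values don't contain one.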
Adds a Daytona-cloud-backed implementation of `ManagedSandboxBackend`. The REST client lives in `crates/exoharness/src/daytona.rs` and speaks Daytona's API directly via `reqwest`.
Lifecycle model:
- `acquire` -> POST /api/sandbox, returns a fresh sandbox.
- `try_resume` -> label-based lookup against Daytona's control plane, `start` if stopped. This is what makes cross-process resume work for free on Daytona: `stop()` preserves filesystem state server-side, so the next exo invocation finds the labelled sandbox and resumes it.
- `acquire_from_snapshot` -> POST /api/sandbox with `snapshot: <name>`.
- `snapshot` -> save-as-snapshot. The payload bytes are a small JSON manifest pointing at the named Daytona snapshot, not the filesystem itself: bytes-by-reference, not bytes-by-value.
Adds `SnapshotKind::DaytonaSnapshot` and an explicit kind-mismatch error in the CLI backend, so a Daytona-produced snapshot under `--sandbox-backend docker` fails clearly.
Config via env: `DAYTONA_API_KEY` (required), `DAYTONA_API_URL`, `DAYTONA_TOOLBOX_URL`, `DAYTONA_TARGET`, `DAYTONA_ORGANIZATION_ID`.
Exec, fs, and per-sandbox operations go through Daytona's toolbox proxy (`proxy.app.daytona.io/toolbox/<id>/...`) rather than the control plane.
Mounts: rejected up front. Daytona has no host filesystem to bind into; a follow-up workspace provisioner (git clone or Daytona Volume) will own the "where does the workspace come from" question.
Streaming `start_process`: synchronous-exec bridge for now (REST exec isn't streaming-shaped). Buffers stdout/stderr into AsyncRead-friendly cursors so the rest of the harness is unchanged.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
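The kind check is what lets a Daytona-produced snapshot fail clearly under `--sandbox-backend docker` (and vice versa) before any API call. A sketch, assuming a hypothetical `SnapshotPayload` that pairs a `SnapshotKind` with its bytes and a hypothetical `expect_kind` helper; for `DaytonaSnapshot` those bytes are only the small JSON manifest, not filesystem contents:

```rust
/// Snapshot payload kinds named in this PR (E2B adds its own later).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SnapshotKind {
    DockerImageTar,
    DaytonaSnapshot,
}

/// Hypothetical payload: an image tarball for Docker, but only a manifest
/// naming the server-side snapshot for Daytona (bytes-by-reference).
pub struct SnapshotPayload {
    pub kind: SnapshotKind,
    pub bytes: Vec<u8>,
}

/// Reject a payload produced by a different backend before touching any API.
pub fn expect_kind(payload: &SnapshotPayload, expected: SnapshotKind) -> Result<&[u8], String> {
    if payload.kind != expected {
        return Err(format!(
            "snapshot kind mismatch: backend expects {:?}, payload is {:?}",
            expected, payload.kind
        ));
    }
    Ok(payload.bytes.as_slice())
}
```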
Wires the new Daytona backend through the executor re-exports and the exo CLI. Adds the `daytona` value to `--sandbox-backend` (env var: `EXO_SANDBOX_BACKEND=daytona`). Config is read from environment via `DaytonaConfig::from_env()` at backend construction time. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
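A sketch of env-driven construction in the spirit of `DaytonaConfig::from_env()`. The field names, the `from_lookup` helper (added so the logic can be exercised without touching the process environment), and the default URLs (taken from the `app.daytona.io/api` and `proxy.app.daytona.io/toolbox` hosts mentioned in the mock-test notes) are all assumptions:

```rust
/// Hypothetical config shape; field names and defaults are illustrative.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DaytonaConfig {
    pub api_key: String,
    pub api_url: String,
    pub toolbox_url: String,
    pub target: Option<String>,
    pub organization_id: Option<String>,
}

impl DaytonaConfig {
    /// Build from any key lookup, so tests can inject values.
    pub fn from_lookup(get: impl Fn(&str) -> Option<String>) -> Result<Self, String> {
        // Treat empty strings as unset so `DAYTONA_API_KEY=` fails fast.
        let get = |k: &str| get(k).filter(|v| !v.is_empty());
        let api_key = get("DAYTONA_API_KEY").ok_or("DAYTONA_API_KEY is required")?;
        Ok(Self {
            api_key,
            api_url: get("DAYTONA_API_URL")
                .unwrap_or_else(|| "https://app.daytona.io/api".into()),
            toolbox_url: get("DAYTONA_TOOLBOX_URL")
                .unwrap_or_else(|| "https://proxy.app.daytona.io/toolbox".into()),
            target: get("DAYTONA_TARGET"),
            organization_id: get("DAYTONA_ORGANIZATION_ID"),
        })
    }

    /// Read from the real process environment at backend construction time.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|k| std::env::var(k).ok())
    }
}
```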
Adds crates/cli/tests/daytona_backend.rs — 11 tests covering each
public method on DaytonaSandboxBackend by standing up an in-process
wiremock server that pretends to be Daytona's REST API.
Strategy: don't hit real Daytona in CI. It would require credentials,
network egress, real sandbox provisioning ($), and cleanup that's
fragile if a test panics mid-flight. Mocks let us assert on the things
that actually matter for this layer:
- which endpoint is hit (control plane vs the separate toolbox host)
- what the request body/query shape looks like (translating
SandboxRequest etc. into Daytona's wire format)
- how the backend interprets canned responses (state-machine
transitions, payload encoding)
Drift between mock and reality is a real risk but bounded by the PR
description's documentation of the actual API shapes. Catching drift
is the job of the live test plan in the PR; catching code defects in
the translation layer is the job of these mock tests.
Coverage:
acquire_posts_to_sandbox_endpoint_with_labels
Verifies POST /sandbox carries the two warm-sandbox labels and
does NOT set `snapshot` on a fresh acquire (Daytona would
interpret that as a pre-registered named snapshot, which it isn't).
acquire_rejects_host_mounts
Daytona has no host filesystem to bind-mount; the backend must
reject mounts up front, before hitting the API.
try_resume_finds_running_sandbox_without_starting
try_resume_starts_stopped_sandbox
try_resume_returns_none_when_no_match
try_resume_filters_by_label_as_single_json_query_param
The four cases of the cross-process resume lookup. The label-query-
shape test in particular catches the easy-to-get-wrong case of
sending `?labels=k=v&labels=k=v` instead of one JSON-encoded
`?labels={...}`.
stop_calls_stop_endpoint_not_delete
Asserts the resume contract: stop must NOT DELETE, since the next
process needs to be able to resume by label.
snapshot_returns_daytona_snapshot_payload_with_manifest
Verifies the payload kind is DaytonaSnapshot (not DockerImageTar)
and that the manifest carries both snapshot_name and base_image.
acquire_from_snapshot_passes_snapshot_name_in_create_body
Verifies the inverse: when restoring, the manifest's snapshot_name
threads into Daytona's create-body `snapshot` field.
acquire_from_snapshot_rejects_wrong_kind
A DockerImageTar payload handed to the Daytona backend must error
cleanly — and must NOT hit the API in the meantime.
exec_uses_toolbox_url_not_api_url
The Daytona quirk that per-sandbox operations live on a different
host (`proxy.app.daytona.io/toolbox/...`) than the control plane
(`app.daytona.io/api`). Routing the exec call to the wrong host
would 404 in production, but in tests both hosts are the same wiremock
server, where a 404 alone doesn't reveal which host was targeted, so we
additionally assert the path the request hit.
Not #[ignore]'d, no docker required — these run in regular per-PR CI
via `cargo test --workspace`. Workspace test count goes from 51 to 62.
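The shape checked by `try_resume_filters_by_label_as_single_json_query_param` is one `labels` parameter whose value is a JSON object. A std-only sketch of building it, with hand-rolled JSON escaping and percent-encoding; in the real backend `serde_json` and the HTTP client's query builder would do this work, and `labels_query` is a hypothetical helper:

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// Minimal JSON string escaping: quotes, backslashes, control characters.
fn json_string(s: &str, out: &mut String) {
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => {
                write!(out, "\\u{:04x}", c as u32).unwrap();
            }
            c => out.push(c),
        }
    }
    out.push('"');
}

/// Percent-encode everything except RFC 3986 unreserved characters.
fn percent_encode(s: &str) -> String {
    let mut out = String::new();
    for b in s.bytes() {
        if b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_' | b'~') {
            out.push(b as char);
        } else {
            write!(out, "%{:02X}", b).unwrap();
        }
    }
    out
}

/// One `labels=` parameter holding a JSON object, not repeated `labels=k=v`.
pub fn labels_query(labels: &BTreeMap<&str, &str>) -> String {
    let mut json = String::from("{");
    for (i, (k, v)) in labels.iter().enumerate() {
        if i > 0 {
            json.push(',');
        }
        json_string(k, &mut json);
        json.push(':');
        json_string(v, &mut json);
    }
    json.push('}');
    format!("labels={}", percent_encode(&json))
}
```

`BTreeMap` keeps key order deterministic, which makes the resulting query string easy to assert on against a mock.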
- drop unused `toolbox_endpoint` method on `DaytonaSandboxBackend`
- collapse redundant `|ttl| idle_ttl_to_minutes(ttl)` closure
- rustfmt nits from main's PR ankrgyl#26
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…f command execution protocol) Daytona uses a plain REST JSON API, but command execution in an E2B sandbox goes over the Connect protocol on HTTP (https://connectrpc.com/), which is how exo asks the remote sandbox's environment daemon (envd) to run a command. To make this work, we encode the request in Connect's framing and send it over that protocol.
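In Connect's streaming format (content type `application/connect+json`), each message travels in a 5-byte envelope: one flags byte followed by a big-endian `u32` payload length. Flag `0x02` marks the final end-of-stream message, whose JSON carries any error and trailing metadata. A std-only sketch of the framing; the function names are hypothetical and the `/process.Process/Start` request body is not shown:

```rust
/// Envelope flag bits from the Connect protocol.
pub const FLAG_COMPRESSED: u8 = 0b0000_0001;
pub const FLAG_END_STREAM: u8 = 0b0000_0010;

/// Frame one message: [flags: u8][len: u32 big-endian][payload].
pub fn encode_envelope(flags: u8, payload: &[u8]) -> Vec<u8> {
    assert!(payload.len() <= u32::MAX as usize, "message too large for a Connect envelope");
    let len = payload.len() as u32;
    let mut out = Vec::with_capacity(5 + payload.len());
    out.push(flags);
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(payload);
    out
}

/// Split a response body into (flags, payload) frames, erroring on a
/// truncated frame instead of silently dropping bytes.
pub fn decode_envelopes(mut body: &[u8]) -> Result<Vec<(u8, &[u8])>, String> {
    let mut frames = Vec::new();
    while !body.is_empty() {
        if body.len() < 5 {
            return Err(format!("truncated envelope header: {} bytes", body.len()));
        }
        let flags = body[0];
        let len = u32::from_be_bytes([body[1], body[2], body[3], body[4]]) as usize;
        let rest = &body[5..];
        if rest.len() < len {
            return Err(format!("truncated envelope payload: need {}, have {}", len, rest.len()));
        }
        frames.push((flags, &rest[..len]));
        body = &rest[len..];
    }
    Ok(frames)
}
```

A client sends its single request as one envelope, then reads frames from the response until it sees one with `FLAG_END_STREAM` set.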
E2B sandbox backend
Summary
Adds an E2B sandbox backend so exo can provision and reuse remote sandboxes on E2B the same way it does for Daytona in #22. The backend implements `ManagedSandboxBackend` and plugs into the existing harness path (`ensure_shell_sandbox`, `run_in_sandbox` → `try_resume`, snapshot restore). This branch is intended to stack on the Daytona work from #22; it does not replace it.
What is included
- `crates/exoharness/src/e2b.rs`: `E2bSandboxBackend`, lifecycle (create, list/resume by metadata, connect, pause), command execution via envd, snapshots (`SnapshotKind::E2bSnapshot`).
- `basic.rs`, `sandbox.rs`, `executor`, and CLI (`--sandbox-backend e2b`).
- `crates/cli/tests/e2b_backend.rs`: wiremock coverage of API shapes, resume, exec (Connect envelopes), pause, snapshots.
- `crates/cli/tests/e2b_resume_harness.rs`: cross-process resume; a second harness on the same `--root` must not emit another `sandbox_created`.
- `crates/cli/tests/e2b_live.rs`: optional live smoke tests behind `#[ignore]` (requires `E2B_API_KEY`).
Cross-process identity uses the same metadata keys as Docker/Daytona (`exo.sandbox.key`, `exo.sandbox.spec-hash`). Host bind mounts are rejected; use the docker or local backends if you need mounts.
How this differs from Daytona
Both backends sit behind the same exo abstractions. The differences are in what the remote product exposes:

| | Daytona | E2B |
|---|---|---|
| API endpoint | `DAYTONA_API_URL` | `api.e2b.app` |
| Command execution | toolbox REST (`/toolbox/{id}/process/execute`) | envd over Connect (`/process.Process/Start`, `application/connect+json`) |
| Image | | `sandbox_image` maps to an E2B template id (e.g. `base`) |
| Stop / resume | `stop` preserves VM; resume by labels | `pause` on stop; resume via `GET /v2/sandboxes` metadata + `connect` if paused |
| Snapshots | `DaytonaSnapshot` manifest (snapshot name in Daytona) | `E2bSnapshot` manifest (E2B snapshot template id from `POST .../snapshots`) |

Daytona exposes two HTTP surfaces (API + toolbox). E2B has two as well (platform API + per-sandbox envd). Only the command path uses Connect framing; lifecycle calls are plain REST.
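The E2B resume lookup matches listed sandboxes on both identity keys and then branches on state: paused sandboxes need `connect`, running ones can be reused directly. A sketch, assuming a hypothetical, trimmed `ListedSandbox` view of a `GET /v2/sandboxes` entry with running/paused states; the real response fields are not spelled out in this PR:

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum E2bState {
    Running,
    Paused,
}

/// Hypothetical, trimmed view of one entry from the sandbox listing.
#[derive(Debug, Clone)]
pub struct ListedSandbox {
    pub sandbox_id: String,
    pub state: E2bState,
    pub metadata: HashMap<String, String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum E2bResume<'a> {
    /// Running: reuse directly.
    Attach(&'a str),
    /// Paused: call `connect` to resume it first.
    Connect(&'a str),
}

/// Pick the sandbox whose metadata matches both identity keys.
pub fn pick_resumable<'a>(
    listed: &'a [ListedSandbox],
    key: &str,
    spec_hash: &str,
) -> Option<E2bResume<'a>> {
    let s = listed.iter().find(|s| {
        s.metadata.get("exo.sandbox.key").map(String::as_str) == Some(key)
            && s.metadata.get("exo.sandbox.spec-hash").map(String::as_str) == Some(spec_hash)
    })?;
    Some(match s.state {
        E2bState::Running => E2bResume::Attach(s.sandbox_id.as_str()),
        E2bState::Paused => E2bResume::Connect(s.sandbox_id.as_str()),
    })
}
```

Requiring the spec hash as well as the key means a conversation whose sandbox spec changed will not silently reattach to an old sandbox.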
Usage
Build with Rust 1.95+ (same as the rest of the repo).
Environment:
Run exo with the E2B backend (required on macOS, where the default sandbox backend is still apple-container):
The agent should use an E2B template id for `sandbox_image` (e.g. `exo agent create ... --sandbox-image base`), not a Docker image reference. Shell tools must be enabled on the conversation/agent as today.
Verify resume across a REPL exit: install something in the sandbox, `/quit`, then start `repl` again with the same `--root`, agent, and conversation; the same `sandbox_created` event should be reused (no second create if resume succeeds). For a clean check, use a new conversation slug so that old failed runs with multiple `sandbox_created` events do not confuse the test.
Tests:
Relation to #22
This PR assumes the Daytona sandbox integration from #22 is already in the tree: `ManagedSandboxBackend`, `try_resume`, the executor/harness shell sandbox lifecycle, and CLI `--sandbox-backend`. No new harness-level concepts are introduced; E2B is another `SandboxBackendChoice` variant alongside Daytona.
Dependencies added for E2B are the same class as Daytona's remote backend: `reqwest`, `base64`, and `url` on `exoharness` (already pulled in by `basic-backend`). There is no separate E2B SDK crate; the backend speaks HTTP directly, matching the style of `daytona.rs`.
Notes for reviewers