E2b backend by Bautista1999 · Pull Request #36 · ankrgyl/exo

Bautista1999 · 2026-06-02T00:14:32Z

E2B sandbox backend

Summary

Adds an E2B sandbox backend so exo can provision and reuse remote sandboxes on E2B the same way it does for Daytona in #22. The backend implements ManagedSandboxBackend and plugs into the existing harness path (ensure_shell_sandbox, run_in_sandbox → try_resume, snapshot restore). This branch is intended to stack on the Daytona work from #22; it does not replace it.

What is included

crates/exoharness/src/e2b.rs — E2bSandboxBackend, lifecycle (create, list/resume by metadata, connect, pause), command execution via envd, snapshots (SnapshotKind::E2bSnapshot).
Wiring in basic.rs, sandbox.rs, executor, and CLI (--sandbox-backend e2b).
Tests:
- crates/cli/tests/e2b_backend.rs — wiremock coverage of API shapes, resume, exec (Connect envelopes), pause, snapshots.
- crates/cli/tests/e2b_resume_harness.rs — cross-process resume: second harness on the same --root must not emit another sandbox_created.
- crates/cli/tests/e2b_live.rs — optional live smoke tests behind #[ignore] (requires E2B_API_KEY).

Cross-process identity uses the same metadata keys as Docker/Daytona (exo.sandbox.key, exo.sandbox.spec-hash). Host bind mounts are rejected; use docker or local backends if you need mounts.
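
A minimal sketch of the metadata-based identity check described above, assuming a listing call has already returned candidate sandboxes. `ListedSandbox` and `find_by_identity` are hypothetical names for illustration and are not taken from e2b.rs; only the two metadata keys come from this PR.

```rust
use std::collections::HashMap;

/// Hypothetical view of one entry returned by a sandbox listing call.
#[derive(Debug, Clone, PartialEq)]
pub struct ListedSandbox {
    pub id: String,
    pub metadata: HashMap<String, String>,
}

// Metadata keys named in this PR for cross-process identity.
pub const KEY_LABEL: &str = "exo.sandbox.key";
pub const SPEC_HASH_LABEL: &str = "exo.sandbox.spec-hash";

/// Returns the first sandbox whose metadata carries both identity keys
/// with the expected values, or None if nothing matches.
pub fn find_by_identity<'a>(
    listed: &'a [ListedSandbox],
    key: &str,
    spec_hash: &str,
) -> Option<&'a ListedSandbox> {
    listed.iter().find(|sb| {
        sb.metadata.get(KEY_LABEL).map(String::as_str) == Some(key)
            && sb.metadata.get(SPEC_HASH_LABEL).map(String::as_str) == Some(spec_hash)
    })
}
```

Matching on the spec hash as well as the key means a sandbox created for an older spec is not picked up after the spec changes.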

How this differs from Daytona

Both backends sit behind the same exo abstractions. The differences are in what the remote product exposes:

| Concern | Daytona (#22) | E2B (this PR) |
| --- | --- | --- |
| Control plane | REST on `DAYTONA_API_URL` | REST on `api.e2b.app` |
| Run command | REST JSON on toolbox (`/toolbox/{id}/process/execute`) | Connect RPC on envd (`/process.Process/Start`, `application/connect+json`) |
| Image vs template | Snapshot/image oriented; toolbox exec | Agent `sandbox_image` maps to an E2B template id (e.g. `base`) |
| Stop / reuse | `stop` preserves VM; resume by labels | `pause` on stop; resume via `GET /v2/sandboxes` metadata + `connect` if paused |
| Snapshots | `DaytonaSnapshot` manifest (snapshot name in Daytona) | `E2bSnapshot` manifest (E2B snapshot template id from `POST .../snapshots`) |

Daytona is two HTTP surfaces (API + toolbox). E2B is two as well (platform API + per-sandbox envd). Only the command path uses Connect framing; lifecycle calls are normal REST.

Usage

Build with Rust 1.95+ (same as the rest of the repo).

Environment:

export E2B_API_KEY=...
export E2B_TEMPLATE_ID=base    # optional; defaults to "base"
# optional overrides:
# E2B_API_URL=https://api.e2b.app
# E2B_SECURE=0                 # default in exo; set 1 if sandboxes require envdAccessToken on exec
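
A sketch of how these variables and their defaults could be resolved, assuming `E2B_API_KEY` is mandatory and `E2B_SECURE` is on only when set to `1`. `E2bSettings` and `settings_from` are hypothetical names, not the types in e2b.rs; the lookup is passed in as a closure so the logic can be exercised without touching the real process environment.

```rust
/// Hypothetical settings struct; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct E2bSettings {
    pub api_key: String,
    pub template_id: String,
    pub api_url: String,
    pub secure: bool,
}

/// Resolves settings through `lookup`, applying the defaults listed above.
/// Assumption: a missing or empty E2B_API_KEY is an error.
pub fn settings_from(lookup: impl Fn(&str) -> Option<String>) -> Result<E2bSettings, String> {
    let api_key = lookup("E2B_API_KEY")
        .filter(|v| !v.is_empty())
        .ok_or_else(|| "E2B_API_KEY is required".to_string())?;
    Ok(E2bSettings {
        api_key,
        template_id: lookup("E2B_TEMPLATE_ID").unwrap_or_else(|| "base".to_string()),
        api_url: lookup("E2B_API_URL").unwrap_or_else(|| "https://api.e2b.app".to_string()),
        // Assumption: only the literal value "1" turns secure mode on.
        secure: lookup("E2B_SECURE").as_deref() == Some("1"),
    })
}
```

In a real process the lookup would be `settings_from(|k| std::env::var(k).ok())`.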

Run exo with the E2B backend. The --sandbox-backend flag must be passed explicitly on macOS, where the default sandbox backend is still apple-container:

exo --sandbox-backend e2b --root "$EXO_ROOT" repl --agent <agent> --conversation <slug>

The agent should use an E2B template id for sandbox_image (e.g. exo agent create ... --sandbox-image base), not a Docker image reference. Shell tools must be enabled on the conversation/agent as today.

Verify resume across a REPL exit: install something in the sandbox, /quit, then start repl again with the same --root, agent, and conversation. If resume succeeds, the same sandbox is reused and no second sandbox_created event is emitted. For a clean check, use a new conversation slug so that earlier failed runs with multiple sandbox_created events do not confuse the result.

Tests:

cargo test --package exo --test e2b_backend
cargo test --package exo --test e2b_resume_harness

# optional, bills E2B:
export E2B_API_KEY=...
cargo test --package exo --test e2b_live -- --ignored

Relation to #22

This PR assumes the Daytona sandbox integration from #22 is already in the tree: ManagedSandboxBackend, try_resume, the executor/harness shell sandbox lifecycle, and CLI --sandbox-backend. No new harness-level concepts are introduced; E2B is another SandboxBackendChoice variant alongside Daytona.

Dependencies added for E2B are the same class as Daytona's remote backend: reqwest, base64, and url on exoharness (already pulled in by basic-backend). There is no separate E2B SDK crate; the backend speaks HTTP directly, matching the style of daytona.rs.

Notes for reviewers

Command execution implements Connect (https://connectrpc.com/) envelope encode/decode for envd; plain JSON posts will not work against envd.
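
For reviewers unfamiliar with the framing: a Connect streaming envelope is one flag byte, a 4-byte big-endian payload length, then the payload, and flag bit `0x02` marks the end-of-stream envelope. The sketch below shows only that framing; the function names are hypothetical and the JSON payloads envd expects are not modelled here.

```rust
/// Flag bit marking the final (end-of-stream) envelope in Connect streaming.
pub const FLAG_END_STREAM: u8 = 0b0000_0010;

/// Wraps one message in an envelope: flag byte, big-endian u32 length, payload.
pub fn encode_envelope(flags: u8, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(5 + payload.len());
    out.push(flags);
    out.extend_from_slice(&(payload.len() as u32).to_be_bytes());
    out.extend_from_slice(payload);
    out
}

/// Splits a buffer into (flags, payload) pairs.
/// Returns None if the buffer ends in the middle of an envelope.
pub fn decode_envelopes(mut buf: &[u8]) -> Option<Vec<(u8, Vec<u8>)>> {
    let mut frames = Vec::new();
    while !buf.is_empty() {
        if buf.len() < 5 {
            return None; // header is incomplete
        }
        let flags = buf[0];
        let len = u32::from_be_bytes([buf[1], buf[2], buf[3], buf[4]]) as usize;
        let rest = &buf[5..];
        if rest.len() < len {
            return None; // payload is incomplete
        }
        frames.push((flags, rest[..len].to_vec()));
        buf = &rest[len..];
    }
    Some(frames)
}
```

A response body is typically several data envelopes followed by one with `FLAG_END_STREAM` set, so the decoder returns all of them and leaves interpretation to the caller.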

Adds ManagedSandboxBackend::try_resume so callers can ask backends to find a sandbox previously created for a given SandboxKey, start it if stopped, attach if running, otherwise return None.

- Implements it for Docker / AppleContainer via label filter + state inspection + start; LocalProcess returns None.
- Changes Drop on CliContainerSandboxBackend from `docker rm -f` to `docker stop -t 0` so warm containers survive process exit and the next exo invocation can try_resume them by label.
- Cross-process idle TTL is enforced lazily in try_resume via `docker inspect .State.FinishedAt`.
- Wires run_in_sandbox to call try_resume on in-memory cache miss, so a fresh exo process can pick up where the previous one left off.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
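
The try_resume decision described in this commit can be sketched as a pure function over the observed container state. The types below are hypothetical and stand in for whatever the backend derives from the label filter and `docker inspect`; `GiveUp` corresponds to returning None.

```rust
use std::time::Duration;

/// Hypothetical summary of what the label filter and state inspection found.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ObservedState {
    Running,
    /// Stopped, with the time elapsed since it finished.
    Stopped { idle_for: Duration },
    Missing,
}

#[derive(Debug, PartialEq)]
pub enum ResumeAction {
    Attach,
    Start,
    /// Corresponds to try_resume returning None.
    GiveUp,
}

/// Running -> attach; stopped within the TTL -> start; otherwise give up.
/// The idle TTL is checked only here, i.e. lazily at resume time.
pub fn plan_resume(state: ObservedState, idle_ttl: Option<Duration>) -> ResumeAction {
    match state {
        ObservedState::Running => ResumeAction::Attach,
        ObservedState::Stopped { idle_for } => match idle_ttl {
            Some(ttl) if idle_for > ttl => ResumeAction::GiveUp,
            _ => ResumeAction::Start,
        },
        ObservedState::Missing => ResumeAction::GiveUp,
    }
}
```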

When the conversation has a recorded sandbox_id and the spec still matches, walk the resume fallback chain:

1. Tier 1: healthcheck via run_in_sandbox -> backend.try_resume (handles "running -> attach" and "stopped -> start" cases).
2. Tier 2: if the container is truly gone (TTL-expired, manually deleted, server-side auto-cleanup), look up the latest SandboxSnapshotted event for this sandbox and start_sandbox from it.
3. Tier 3: nothing reusable -> create a fresh sandbox.

Adds an integration test that runs two cross-process `chat send` invocations against the same conversation and asserts exactly one docker container exists afterwards (resume reused it, didn't create a new one). Updates the existing integration test's container-state assertion to reflect the new stop-on-exit behavior.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
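
The fallback chain can be read as a small decision function. This is a sketch under the assumption that the resume attempt and the snapshot lookup have already been evaluated; `Provisioned` and `choose_tier` are hypothetical names and do not mirror the actual ensure_shell_sandbox signature.

```rust
/// Hypothetical outcome of walking the three tiers.
#[derive(Debug, PartialEq)]
pub enum Provisioned {
    /// Tier 1: the existing sandbox was attached or restarted.
    Resumed { sandbox_id: String },
    /// Tier 2: a new container was materialised from a snapshot, keeping the sandbox_id.
    RestoredFromSnapshot { sandbox_id: String, snapshot: String },
    /// Tier 3: nothing reusable; the caller creates a sandbox with a new id.
    Fresh,
}

pub fn choose_tier(
    recorded_sandbox_id: Option<&str>,
    spec_matches: bool,
    resume_ok: bool,
    latest_snapshot: Option<&str>,
) -> Provisioned {
    if let (Some(id), true) = (recorded_sandbox_id, spec_matches) {
        if resume_ok {
            return Provisioned::Resumed { sandbox_id: id.to_string() };
        }
        if let Some(snapshot) = latest_snapshot {
            return Provisioned::RestoredFromSnapshot {
                sandbox_id: id.to_string(),
                snapshot: snapshot.to_string(),
            };
        }
    }
    Provisioned::Fresh
}
```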

Three new integration tests, one per tier of the 3-tier fallback chain in ensure_shell_sandbox:

tier_1_stopped_container_is_resumed_same_id
Drop the harness (PR ankrgyl#21's Drop stops, doesn't rm). Container survives on the host in Exited state. Second harness's try_resume finds it by label, docker-starts it, attaches. Same container ID, same sandbox_id, marker file persists across the stop/start cycle.

tier_2_gone_container_with_snapshot_restores
First harness takes a snapshot of the live sandbox (PR ankrgyl#20 API). Drop the harness; `docker rm -f` the container (simulates idle-TTL expiry / external cleanup). Second harness's try_resume misses, falls through to Tier 2, finds the snapshot in the event log, and calls start_sandbox -> acquire_from_snapshot. A NEW container id is materialised, but the sandbox_id is reused and the marker is restored from the snapshot, proving the snapshot path actually fires, not just resume.

tier_3_gone_container_without_snapshot_creates_fresh
Same setup as tier 2 minus the snapshot. Second harness misses Tier 1 (no container) and Tier 2 (no snapshot), so falls through to create_sandbox. A new sandbox_id is generated; the conversation log now has two SandboxCreated events; the previous marker is gone from the fresh container.

Each test simulates the "two exo processes" boundary by dropping the first BasicExoHarness and constructing a new one from the same root dir. Library-API driven (no LLM mock, no binary spawn); the harness's 3-tier behaviour is the only thing under test here.

Wired into integration.yml as a third --test target alongside integration_chat and snapshot_round_trip. Self-skips on non-docker matrix cells via preflight(). All three pass locally in ~3s against real Docker; self-skip path runs in 50ms.

- docker ps: switch from tab-template to --format '{{json .}}' + serde (DockerPsItem + parse_docker_labels). Same swap in the integration test helper. Unit test for the label parser.
- introduce exoharness::first_matching_event helper for the recurring 'find the latest event of kind K, decoded via predicate P' pattern. Three callers refactored: latest_snapshot_for_sandbox (the one in the comment), latest_shell_sandbox, and tui::latest_sandbox_id.
- fix integration test: cross_process_send_resumes_the_same_sandbox_container was still using the pre-rebase 'chat send' subcommand. Main's PR ankrgyl#4 renamed it to 'conversation send'.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
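
A sketch of the 'find the latest event, decoded via a predicate' pattern, assuming events are stored oldest-first so the search walks the log in reverse. The `Event` enum and both function names are simplified stand-ins, not the real exoharness types or the actual first_matching_event signature.

```rust
/// Simplified stand-in for the conversation event log entries.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    SandboxCreated { sandbox_id: String },
    SandboxSnapshotted { sandbox_id: String, snapshot: String },
    Other,
}

/// Walks the log newest-first and returns the first value `decode` accepts.
pub fn latest_matching<T>(events: &[Event], decode: impl Fn(&Event) -> Option<T>) -> Option<T> {
    events.iter().rev().find_map(decode)
}

/// Example caller: the most recent snapshot recorded for one sandbox.
pub fn latest_snapshot_for(events: &[Event], wanted: &str) -> Option<String> {
    latest_matching(events, |e| match e {
        Event::SandboxSnapshotted { sandbox_id, snapshot } if sandbox_id.as_str() == wanted => {
            Some(snapshot.clone())
        }
        _ => None,
    })
}
```

Keeping the kind match and the decoding in one closure lets each caller state only what it is looking for.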

Adds a Daytona-cloud-backed implementation of `ManagedSandboxBackend`. REST client lives in `crates/exoharness/src/daytona.rs` and speaks Daytona's API directly via `reqwest`.

Lifecycle model:
- `acquire` -> POST /api/sandbox, returns a fresh sandbox.
- `try_resume` -> label-based lookup against Daytona's control plane, `start` if stopped. This is what makes cross-process resume work for free on Daytona: `stop()` preserves filesystem state server-side, so the next exo invocation finds the labelled sandbox and resumes it.
- `acquire_from_snapshot` -> POST /api/sandbox with `snapshot: <name>`.
- `snapshot` -> save-as-snapshot. The payload bytes are a small JSON manifest pointing at the named Daytona snapshot, not the filesystem itself; bytes-by-reference, not bytes-by-value.

Adds `SnapshotKind::DaytonaSnapshot` and an explicit kind-mismatch error in the CLI backend so a Daytona-produced snapshot under `--sandbox-backend docker` fails clearly.

Config via env: `DAYTONA_API_KEY` (required), `DAYTONA_API_URL`, `DAYTONA_TOOLBOX_URL`, `DAYTONA_TARGET`, `DAYTONA_ORGANIZATION_ID`. Exec, fs, and per-sandbox operations go through Daytona's toolbox proxy (`proxy.app.daytona.io/toolbox/<id>/...`) rather than the control plane.

Mounts: rejected up front. Daytona has no host filesystem to bind into; a follow-up workspace provisioner (git clone or Daytona Volume) will own the "where does the workspace come from" question.

Streaming `start_process`: synchronous-exec bridge for now (REST exec isn't streaming-shaped). Buffers stdout/stderr into AsyncRead-friendly cursors so the rest of the harness is unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
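
The kind-mismatch guard can be sketched as a check that runs before any bytes are interpreted or any API call is made. The variant names come from this PR series; `SnapshotPayload` and `manifest_for_restore` are hypothetical, and the error text is illustrative only.

```rust
/// Snapshot kinds mentioned across the backends in this PR series.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SnapshotKind {
    DockerImageTar,
    DaytonaSnapshot,
    E2bSnapshot,
}

/// Hypothetical payload: for remote backends `bytes` is a small manifest
/// that references a server-side snapshot, not the filesystem contents.
#[derive(Debug, Clone, PartialEq)]
pub struct SnapshotPayload {
    pub kind: SnapshotKind,
    pub bytes: Vec<u8>,
}

/// Returns the manifest bytes only if the payload kind matches the backend.
pub fn manifest_for_restore(
    payload: &SnapshotPayload,
    expected: SnapshotKind,
) -> Result<&[u8], String> {
    if payload.kind != expected {
        return Err(format!(
            "snapshot kind mismatch: backend expects {:?}, payload is {:?}",
            expected, payload.kind
        ));
    }
    Ok(&payload.bytes)
}
```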

Wires the new Daytona backend through the executor re-exports and the exo CLI. Adds the `daytona` value to `--sandbox-backend` (env var: `EXO_SANDBOX_BACKEND=daytona`). Config is read from environment via `DaytonaConfig::from_env()` at backend construction time. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds crates/cli/tests/daytona_backend.rs: 11 tests covering each public method on DaytonaSandboxBackend by standing up an in-process wiremock server that pretends to be Daytona's REST API.

Strategy: don't hit real Daytona in CI. It would require credentials, network egress, real sandbox provisioning ($), and cleanup that's fragile if a test panics mid-flight. Mocks let us assert on the things that actually matter for this layer:
- which endpoint is hit (control plane vs the separate toolbox host)
- what the request body/query shape looks like (translating SandboxRequest etc. into Daytona's wire format)
- how the backend interprets canned responses (state-machine transitions, payload encoding)

Drift between mock and reality is a real risk but bounded by the PR description's documentation of the actual API shapes. Catching drift is the job of the live test plan in the PR; catching code defects in the translation layer is the job of these mock tests.

Coverage:

acquire_posts_to_sandbox_endpoint_with_labels
Verifies POST /sandbox carries the two warm-sandbox labels and does NOT set `snapshot` on a fresh acquire (Daytona would interpret that as a pre-registered named snapshot, which it isn't).

acquire_rejects_host_mounts
Daytona has no host filesystem to bind-mount; the backend must reject mounts up front, before hitting the API.

try_resume_finds_running_sandbox_without_starting
try_resume_starts_stopped_sandbox
try_resume_returns_none_when_no_match
try_resume_filters_by_label_as_single_json_query_param
The four cases of the cross-process resume lookup. The label-query-shape test in particular catches the easy-to-get-wrong case of sending `?labels=k=v&labels=k=v` instead of one JSON-encoded `?labels={...}`.

stop_calls_stop_endpoint_not_delete
Asserts the resume contract: stop must NOT DELETE, since the next process needs to be able to resume by label.

snapshot_returns_daytona_snapshot_payload_with_manifest
Verifies the payload kind is DaytonaSnapshot (not DockerImageTar) and that the manifest carries both snapshot_name and base_image.

acquire_from_snapshot_passes_snapshot_name_in_create_body
Verifies the inverse: when restoring, the manifest's snapshot_name threads into Daytona's create-body `snapshot` field.

acquire_from_snapshot_rejects_wrong_kind
A DockerImageTar payload handed to the Daytona backend must error cleanly, and must NOT hit the API in the meantime.

exec_uses_toolbox_url_not_api_url
The Daytona quirk that per-sandbox operations live on a different host (`proxy.app.daytona.io/toolbox/...`) than the control plane (`app.daytona.io/api`). Routing the exec call to the wrong host would 404 in production but `404` against the same wiremock host in tests, so we additionally assert the path the request hit.

Not #[ignore]'d, no docker required; these run in regular per-PR CI via `cargo test --workspace`. Workspace test count goes from 51 to 62.
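
To illustrate the label-query shape that try_resume_filters_by_label_as_single_json_query_param guards, here is a std-only sketch that emits one JSON-encoded, percent-encoded `labels` parameter. `labels_query` and `json_string` are hypothetical helpers; the real backend most likely relies on `reqwest` and a JSON library for this, which is an assumption on my part.

```rust
use std::collections::BTreeMap;

/// Minimal JSON string escaping (quotes, backslashes, control characters).
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Builds a single `labels=<json object>` query parameter, percent-encoding
/// every byte outside the RFC 3986 unreserved set.
pub fn labels_query(labels: &BTreeMap<String, String>) -> String {
    let pairs: Vec<String> = labels
        .iter()
        .map(|(k, v)| format!("{}:{}", json_string(k), json_string(v)))
        .collect();
    let json = format!("{{{}}}", pairs.join(","));
    let mut out = String::from("labels=");
    for b in json.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}
```

However many labels are passed, the output contains `labels=` exactly once, which is the property the test name describes.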

- drop unused toolbox_endpoint method on DaytonaSandboxBackend
- collapse redundant |ttl| idle_ttl_to_minutes(ttl) closure
- rustfmt nits from main's PR ankrgyl#26

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…f command execution protocol) Daytona uses a plain REST JSON API, but command execution in an E2B sandbox goes through the Connect protocol over HTTP (https://connectrpc.com/), which is how command requests reach the environment daemon (envd) of the remote sandbox. To make this work, the backend encodes each request in Connect envelopes before sending it.

akrentsel and others added 18 commits June 1, 2026 02:16

daytona: clippy + fmt fixups for main's stricter CI

9d75b32


Adding E2B to sandbox.rs

b01df46

exposing E2B parameters

3328e6e

Exposing E2B as an option for sandbox integration

88dc48b

Update lib.rs

6470583

Update Cargo.toml

f8d7110

unit tests for E2B integration: both mocked and non-mocked

3e3dbfc

Update Cargo.toml

ad5e842

Update Cargo.lock

60c0b93

Exposing E2B for unit tests

8c103be

Bautista1999 mentioned this pull request Jun 2, 2026

Sprites backend - 2 #40

Open

akrentsel force-pushed the daytona-backend-v2 branch from 9d75b32 to 02292c3 Compare June 3, 2026 03:24

Bautista1999 wants to merge 18 commits into ankrgyl:daytona-backend-v2 from Bautista1999:e2b-backend
