E2b backend #36

Open
Bautista1999 wants to merge 18 commits into ankrgyl:daytona-backend-v2 from Bautista1999:e2b-backend

@Bautista1999

E2B sandbox backend

Summary

Adds an E2B sandbox backend so exo can provision and reuse remote sandboxes on E2B the same way it does for Daytona in #22. The backend implements ManagedSandboxBackend and plugs into the existing harness path (ensure_shell_sandbox, run_in_sandbox -> try_resume, snapshot restore). This branch is intended to stack on the Daytona work from #22; it does not replace it.

What is included

  • crates/exoharness/src/e2b.rs: E2bSandboxBackend, with lifecycle (create, list/resume by metadata, connect, pause), command execution via envd, and snapshots (SnapshotKind::E2bSnapshot).
  • Wiring in basic.rs, sandbox.rs, executor, and CLI (--sandbox-backend e2b).
  • Tests:
    • crates/cli/tests/e2b_backend.rs — wiremock coverage of API shapes, resume, exec (Connect envelopes), pause, snapshots.
    • crates/cli/tests/e2b_resume_harness.rs — cross-process resume: second harness on the same --root must not emit another sandbox_created.
    • crates/cli/tests/e2b_live.rs — optional live smoke tests behind #[ignore] (requires E2B_API_KEY).

Cross-process identity uses the same metadata keys as Docker/Daytona (exo.sandbox.key, exo.sandbox.spec-hash). Host bind mounts are rejected; use docker or local backends if you need mounts.
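As a rough sketch of the metadata-based identity check, assuming a hypothetical `ListedSandbox` type standing in for one entry of the E2B list response (only the two metadata keys come from this PR), matching a previously created sandbox could look like this:

```rust
use std::collections::HashMap;

// Hypothetical shape of one entry from the E2B sandbox list; the real
// response fields may differ.
#[derive(Debug, Clone)]
pub struct ListedSandbox {
    pub sandbox_id: String,
    pub metadata: HashMap<String, String>,
}

// Metadata keys shared with the Docker/Daytona backends.
pub const KEY_LABEL: &str = "exo.sandbox.key";
pub const SPEC_HASH_LABEL: &str = "exo.sandbox.spec-hash";

/// Returns the first listed sandbox whose metadata carries both the expected
/// sandbox key and spec hash. A spec-hash mismatch means the sandbox was
/// created from a different spec, so it is not reused.
pub fn find_by_identity<'a>(
    sandboxes: &'a [ListedSandbox],
    key: &str,
    spec_hash: &str,
) -> Option<&'a ListedSandbox> {
    sandboxes.iter().find(|s| {
        s.metadata.get(KEY_LABEL).map(String::as_str) == Some(key)
            && s.metadata.get(SPEC_HASH_LABEL).map(String::as_str) == Some(spec_hash)
    })
}

/// Builds the metadata attached at create time so a later exo process can
/// find this sandbox again.
pub fn identity_metadata(key: &str, spec_hash: &str) -> HashMap<String, String> {
    let mut m = HashMap::new();
    m.insert(KEY_LABEL.to_string(), key.to_string());
    m.insert(SPEC_HASH_LABEL.to_string(), spec_hash.to_string());
    m
}
```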

How this differs from Daytona

Both backends sit behind the same exo abstractions. The differences are in what the remote product exposes:

| Concern | Daytona (#22) | E2B (this PR) |
| --- | --- | --- |
| Control plane | REST on DAYTONA_API_URL | REST on api.e2b.app |
| Run command | REST JSON on toolbox (/toolbox/{id}/process/execute) | Connect RPC on envd (/process.Process/Start, application/connect+json) |
| Image vs template | Snapshot/image oriented; toolbox exec | Agent sandbox_image maps to an E2B template id (e.g. base) |
| Stop / reuse | stop preserves VM; resume by labels | pause on stop; resume via GET /v2/sandboxes metadata + connect if paused |
| Snapshots | DaytonaSnapshot manifest (snapshot name in Daytona) | E2bSnapshot manifest (E2B snapshot template id from POST .../snapshots) |
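A minimal sketch of the E2B "Stop / reuse" row, assuming the list response reports states as the strings `running` and `paused` (an assumption to check against the E2B API docs); `ResumeAction` and `resume_action` are hypothetical names:

```rust
/// What the backend would do with a sandbox found by metadata.
#[derive(Debug, PartialEq)]
pub enum ResumeAction {
    /// Sandbox is already running: attach and use it directly.
    Attach(String),
    /// Sandbox is paused: call connect, which resumes it, then attach.
    ConnectToResume(String),
}

/// Maps an (assumed) E2B state string to a resume action.
pub fn resume_action(sandbox_id: &str, state: &str) -> Option<ResumeAction> {
    match state {
        "running" => Some(ResumeAction::Attach(sandbox_id.to_string())),
        "paused" => Some(ResumeAction::ConnectToResume(sandbox_id.to_string())),
        // Unknown or terminal states are not resumable; the caller falls back
        // to snapshot restore or a fresh create.
        _ => None,
    }
}
```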

Daytona exposes two HTTP surfaces (API + toolbox), and so does E2B (platform API + per-sandbox envd). Only the command path uses Connect framing; lifecycle calls are plain REST.
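On the Connect-framed command path, a request body is a sequence of envelopes: one flag byte, a 4-byte big-endian message length, then the message bytes. The sketch below encodes one JSON message that way; `encode_envelope` is an illustrative helper, not necessarily the name used in e2b.rs.

```rust
/// Connect envelope flag bits, per the Connect protocol.
pub const FLAG_COMPRESSED: u8 = 0b0000_0001;
pub const FLAG_END_STREAM: u8 = 0b0000_0010;

/// Wraps one message in a Connect envelope:
/// 1 flag byte, 4-byte big-endian length, then the message bytes.
pub fn encode_envelope(flags: u8, message: &[u8]) -> Vec<u8> {
    let len = u32::try_from(message.len()).expect("message larger than u32::MAX bytes");
    let mut out = Vec::with_capacity(5 + message.len());
    out.push(flags);
    out.extend_from_slice(&len.to_be_bytes());
    out.extend_from_slice(message);
    out
}
```

The framed bytes would then be POSTed to envd's /process.Process/Start with `Content-Type: application/connect+json`. A body such as `{"process":{"cmd":"/bin/bash","args":["-l","-c","ls"]}}` is only an assumed shape for envd's start request, not something taken from this PR.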

Usage

Build with Rust 1.95+ (same as the rest of the repo).

Environment:

export E2B_API_KEY=...
export E2B_TEMPLATE_ID=base    # optional; defaults to "base"
# optional overrides:
# E2B_API_URL=https://api.e2b.app
# E2B_SECURE=0                 # default in exo; set 1 if sandboxes require envdAccessToken on exec
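A hedged sketch of reading these variables with the defaults described above. `E2bConfig` and `from_lookup` are hypothetical (the Daytona side uses `DaytonaConfig::from_env()`); taking a lookup closure keeps the parsing testable without mutating the process environment.

```rust
/// Hypothetical config mirroring the environment variables listed above.
#[derive(Debug, PartialEq)]
pub struct E2bConfig {
    pub api_key: String,
    pub template_id: String,
    pub api_url: String,
    pub secure: bool,
}

impl E2bConfig {
    /// Reads from the real process environment.
    pub fn from_env() -> Result<Self, String> {
        Self::from_lookup(|k| std::env::var(k).ok())
    }

    /// Testable core: `lookup` returns the value of a variable, if set.
    pub fn from_lookup<F: Fn(&str) -> Option<String>>(lookup: F) -> Result<Self, String> {
        let api_key = lookup("E2B_API_KEY")
            .filter(|v| !v.is_empty())
            .ok_or_else(|| "E2B_API_KEY is required".to_string())?;
        Ok(E2bConfig {
            api_key,
            template_id: lookup("E2B_TEMPLATE_ID").unwrap_or_else(|| "base".to_string()),
            api_url: lookup("E2B_API_URL").unwrap_or_else(|| "https://api.e2b.app".to_string()),
            // Only "1" enables secure mode; unset or any other value keeps the default (off).
            secure: lookup("E2B_SECURE").as_deref() == Some("1"),
        })
    }
}
```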

Run exo with the E2B backend (required on macOS, where the default sandbox backend is still apple-container):

exo --sandbox-backend e2b --root "$EXO_ROOT" repl --agent <agent> --conversation <slug>

The agent should use an E2B template id for sandbox_image (e.g. exo agent create ... --sandbox-image base), not a Docker image reference. Shell tools must be enabled on the conversation/agent as today.

Verify resume across a REPL exit: install something in the sandbox, run /quit, then start the REPL again with the same --root, agent, and conversation. If resume succeeds, the original sandbox is reused and no second sandbox_created event is emitted. For a clean check, use a new conversation slug so that earlier failed runs with multiple sandbox_created events do not confuse the test.

Tests:

cargo test --package exo --test e2b_backend
cargo test --package exo --test e2b_resume_harness

# optional, bills E2B:
export E2B_API_KEY=...
cargo test --package exo --test e2b_live -- --ignored

Relation to #22

This PR assumes the Daytona sandbox integration from #22 is already in the tree: ManagedSandboxBackend, try_resume, the executor/harness shell sandbox lifecycle, and CLI --sandbox-backend. No new harness-level concepts are introduced; E2B is another SandboxBackendChoice variant alongside Daytona.

Dependencies added for E2B are the same kind as those of Daytona's remote backend: reqwest, base64, and url on exoharness (already pulled in by basic-backend). There is no separate E2B SDK crate; like daytona.rs, the backend speaks HTTP directly.

Notes for reviewers

  • Command execution implements Connect (https://connectrpc.com/) envelope encode/decode for envd; plain JSON posts will not work against envd.
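On the response side, envd streams envelopes back; the last one has the end-stream flag (0x02) set and carries a JSON end-of-stream message (error and trailers) instead of a regular message. A sketch of splitting a fully buffered response body, with hypothetical names; a streaming decoder would also have to handle envelopes split across reads:

```rust
/// One decoded Connect envelope.
#[derive(Debug, PartialEq)]
pub struct Envelope {
    pub flags: u8,
    pub message: Vec<u8>,
}

const END_STREAM: u8 = 0b0000_0010;

impl Envelope {
    /// True for the final envelope, whose message is the end-of-stream JSON.
    pub fn is_end_stream(&self) -> bool {
        self.flags & END_STREAM != 0
    }
}

/// Splits a complete response body into envelopes. Returns an error on a
/// truncated header or body instead of silently dropping bytes.
pub fn decode_envelopes(mut buf: &[u8]) -> Result<Vec<Envelope>, String> {
    let mut out = Vec::new();
    while !buf.is_empty() {
        if buf.len() < 5 {
            return Err(format!("truncated envelope header: {} bytes", buf.len()));
        }
        let flags = buf[0];
        let len = u32::from_be_bytes([buf[1], buf[2], buf[3], buf[4]]) as usize;
        let rest = &buf[5..];
        if rest.len() < len {
            return Err(format!("truncated envelope: want {len} bytes, have {}", rest.len()));
        }
        out.push(Envelope { flags, message: rest[..len].to_vec() });
        buf = &rest[len..];
    }
    Ok(out)
}
```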

akrentsel and others added 18 commits June 1, 2026 02:16
Adds ManagedSandboxBackend::try_resume so callers can ask backends to
find a sandbox previously created for a given SandboxKey, start it if
stopped, attach if running, otherwise return None. Implements it for
Docker / AppleContainer via label filter + state inspection + start;
LocalProcess returns None.

Changes Drop on CliContainerSandboxBackend from `docker rm -f` to
`docker stop -t 0` so warm containers survive process exit and the next
exo invocation can try_resume them by label. Cross-process idle TTL is
enforced lazily in try_resume via `docker inspect .State.FinishedAt`.

Wires run_in_sandbox to call try_resume on in-memory cache miss, so a
fresh exo process can pick up where the previous one left off.
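The lazy TTL check described above can be sketched as a pure decision function. This assumes `docker inspect` output has already been parsed into a running flag and the `.State.FinishedAt` timestamp; `DockerResume` and `classify` are hypothetical names.

```rust
use std::time::{Duration, SystemTime};

/// What try_resume would do with a container found by label.
#[derive(Debug, PartialEq)]
pub enum DockerResume {
    /// Running: attach to it.
    Attach,
    /// Stopped within the idle TTL: `docker start` it.
    Start,
    /// Stopped longer than the idle TTL: not reused (try_resume returns None).
    Expired,
}

/// `running` and `finished_at` stand for .State.Running and .State.FinishedAt.
pub fn classify(running: bool, finished_at: SystemTime, now: SystemTime, idle_ttl: Duration) -> DockerResume {
    if running {
        return DockerResume::Attach;
    }
    // duration_since fails if finished_at is in the future (clock skew);
    // treat that as "just stopped" rather than expired.
    let idle = now.duration_since(finished_at).unwrap_or(Duration::ZERO);
    if idle > idle_ttl {
        DockerResume::Expired
    } else {
        DockerResume::Start
    }
}
```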

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When the conversation has a recorded sandbox_id and the spec still
matches, walk the resume fallback chain:

  1. Tier 1: healthcheck via run_in_sandbox -> backend.try_resume
     (handles "running -> attach" and "stopped -> start" cases).
  2. Tier 2: if the container is truly gone (TTL-expired, manually
     deleted, server-side auto-cleanup), look up the latest
     SandboxSnapshotted event for this sandbox and start_sandbox from
     it.
  3. Tier 3: nothing reusable -> create a fresh sandbox.
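The three tiers reduce to a short chain of fallbacks. In this sketch each tier is a closure standing in for the real harness calls (`try_resume`, the snapshot lookup, `start_sandbox`, `create_sandbox`); `ensure_sandbox` and `Outcome` are hypothetical. One design choice to note: an error in Tier 1 or Tier 2 is returned rather than falling through to the next tier, which may differ from the harness.

```rust
/// Which tier produced the sandbox.
#[derive(Debug, PartialEq)]
pub enum Outcome<S> {
    Resumed(S),
    RestoredFromSnapshot(S),
    CreatedFresh(S),
}

pub fn ensure_sandbox<S, E>(
    try_resume: impl FnOnce() -> Result<Option<S>, E>,
    latest_snapshot: impl FnOnce() -> Option<String>,
    start_from_snapshot: impl FnOnce(&str) -> Result<S, E>,
    create_fresh: impl FnOnce() -> Result<S, E>,
) -> Result<Outcome<S>, E> {
    // Tier 1: the backend attaches to a running sandbox or starts a stopped one.
    if let Some(s) = try_resume()? {
        return Ok(Outcome::Resumed(s));
    }
    // Tier 2: the sandbox is gone; restore from the latest recorded snapshot.
    if let Some(snapshot) = latest_snapshot() {
        return start_from_snapshot(snapshot.as_str()).map(Outcome::RestoredFromSnapshot);
    }
    // Tier 3: nothing reusable, so create a fresh sandbox.
    create_fresh().map(Outcome::CreatedFresh)
}
```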

Adds an integration test that runs two cross-process `chat send`
invocations against the same conversation and asserts exactly one
docker container exists afterwards (resume reused it, didn't create
a new one). Updates the existing integration test's container-state
assertion to reflect the new stop-on-exit behavior.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three new integration tests, one per tier of the 3-tier fallback chain
in ensure_shell_sandbox:

  tier_1_stopped_container_is_resumed_same_id
    Drop the harness (PR ankrgyl#21's Drop stops, doesn't rm). Container
    survives on the host in Exited state. Second harness's try_resume
    finds it by label, docker-starts it, attaches. Same container ID,
    same sandbox_id, marker file persists across the stop/start cycle.

  tier_2_gone_container_with_snapshot_restores
    First harness takes a snapshot of the live sandbox (PR ankrgyl#20 API).
    Drop the harness; `docker rm -f` the container (simulates idle-TTL
    expiry / external cleanup). Second harness's try_resume misses,
    falls through to Tier 2, finds the snapshot in the event log, and
    calls start_sandbox -> acquire_from_snapshot. A NEW container id is
    materialised, but the sandbox_id is reused and the marker is
    restored from the snapshot — proving the snapshot path actually
    fires, not just resume.

  tier_3_gone_container_without_snapshot_creates_fresh
    Same setup as tier 2 minus the snapshot. Second harness misses
    Tier 1 (no container) and Tier 2 (no snapshot), so falls through
    to create_sandbox. A new sandbox_id is generated; the conversation
    log now has two SandboxCreated events; the previous marker is gone
    from the fresh container.

Each test simulates the "two exo processes" boundary by dropping the
first BasicExoHarness and constructing a new one from the same root
dir. Library-API driven (no LLM mock, no binary spawn) — the harness's
3-tier behaviour is the only thing under test here.

Wired into integration.yml as a third --test target alongside
integration_chat and snapshot_round_trip. Self-skips on non-docker
matrix cells via preflight().

All three pass locally in ~3s against real Docker; self-skip path
runs in 50ms.
- docker ps: switch from tab-template to --format '{{json .}}' + serde
  (DockerPsItem + parse_docker_labels). Same swap in the integration
  test helper. Unit test for the label parser.

- introduce exoharness::first_matching_event helper for the recurring
  'find the latest event of kind K, decoded via predicate P' pattern.
  Three callers refactored: latest_snapshot_for_sandbox (the one in
  the comment), latest_shell_sandbox, and tui::latest_sandbox_id.

- fix integration test: cross_process_send_resumes_the_same_sandbox_container
  was still using the pre-rebase 'chat send' subcommand. Main's PR ankrgyl#4
  renamed it to 'conversation send'.
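Both helpers are small enough to sketch with the standard library. The `Event` type below is a hypothetical stand-in for conversation log entries, and events are assumed to be stored oldest first, so "latest" means scanning in reverse. The label parser assumes `docker ps --format '{{json .}}'` renders `Labels` as a single comma-separated `k=v` string; in this sketch, a label value containing a comma would be mis-split.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a logged event: a kind tag plus a raw payload.
pub struct Event {
    pub kind: String,
    pub payload: String,
}

/// Scans newest-to-oldest and returns the first event of `kind` that
/// `decode` accepts.
pub fn first_matching_event<T>(
    events: &[Event],
    kind: &str,
    decode: impl Fn(&Event) -> Option<T>,
) -> Option<T> {
    events.iter().rev().filter(|e| e.kind == kind).find_map(|e| decode(e))
}

/// Parses a "k1=v1,k2=v2" Labels string into a map. Splitting on the first
/// '=' keeps any '=' characters inside values intact.
pub fn parse_docker_labels(raw: &str) -> HashMap<String, String> {
    raw.split(',')
        .filter(|pair| !pair.is_empty())
        .filter_map(|pair| pair.split_once('='))
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}
```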

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds a Daytona-cloud-backed implementation of `ManagedSandboxBackend`.
REST client lives in `crates/exoharness/src/daytona.rs` and speaks
Daytona's API directly via `reqwest`.

Lifecycle model:
- `acquire` -> POST /api/sandbox, returns a fresh sandbox.
- `try_resume` -> label-based lookup against Daytona's control plane,
  `start` if stopped. This is what makes cross-process resume work for
  free on Daytona: `stop()` preserves filesystem state server-side, so
  the next exo invocation finds the labelled sandbox and resumes it.
- `acquire_from_snapshot` -> POST /api/sandbox with `snapshot: <name>`.
- `snapshot` -> save-as-snapshot. The payload bytes are a small JSON
  manifest pointing at the named Daytona snapshot, not the filesystem
  itself; bytes-by-reference, not bytes-by-value.

Adds `SnapshotKind::DaytonaSnapshot` and an explicit kind-mismatch error
in the CLI backend so a Daytona-produced snapshot under
`--sandbox-backend docker` fails clearly.
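The kind-mismatch guard amounts to comparing the payload's kind with the kind the backend expects, before any side effect. The `SnapshotKind` variants mirror the ones named in this PR; `check_kind` and `KindMismatch` are hypothetical.

```rust
use std::fmt;

/// Snapshot kinds named in this PR (the real enum may have more variants).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SnapshotKind {
    DockerImageTar,
    DaytonaSnapshot,
    E2bSnapshot,
}

#[derive(Debug, PartialEq)]
pub struct KindMismatch {
    pub backend: &'static str,
    pub expected: SnapshotKind,
    pub got: SnapshotKind,
}

impl fmt::Display for KindMismatch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "{} backend cannot restore a {:?} snapshot (expected {:?})",
            self.backend, self.got, self.expected
        )
    }
}

/// Called before any network or docker call, so a mismatched payload fails
/// fast without side effects.
pub fn check_kind(backend: &'static str, expected: SnapshotKind, got: SnapshotKind) -> Result<(), KindMismatch> {
    if got == expected {
        Ok(())
    } else {
        Err(KindMismatch { backend, expected, got })
    }
}
```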

Config via env: `DAYTONA_API_KEY` (required), `DAYTONA_API_URL`,
`DAYTONA_TOOLBOX_URL`, `DAYTONA_TARGET`, `DAYTONA_ORGANIZATION_ID`.

Exec, fs, and per-sandbox operations go through Daytona's toolbox
proxy (`proxy.app.daytona.io/toolbox/<id>/...`) rather than the control
plane.

Mounts: rejected up front. Daytona has no host filesystem to bind into;
a follow-up workspace provisioner (git clone or Daytona Volume) will
own the "where does the workspace come from" question.

Streaming `start_process`: synchronous-exec bridge for now (REST exec
isn't streaming-shaped). Buffers stdout/stderr into AsyncRead-friendly
cursors so the rest of the harness is unchanged.
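A sketch of the synchronous-exec bridge, assuming a hypothetical `ExecResult` that holds the buffered output of one REST exec call. It uses `std::io::Read` so it runs without tokio; tokio also implements `AsyncRead` for `std::io::Cursor`, which is one way the same buffers could be handed to async callers.

```rust
use std::io::{Cursor, Read};

/// Hypothetical result of one non-streaming exec call.
pub struct ExecResult {
    pub exit_code: i32,
    pub stdout: Vec<u8>,
    pub stderr: Vec<u8>,
}

/// Presents a finished exec as if it were a running process: readers over
/// the buffered output plus the already-known exit code.
pub struct BufferedProcess {
    pub stdout: Cursor<Vec<u8>>,
    pub stderr: Cursor<Vec<u8>>,
    pub exit_code: i32,
}

impl From<ExecResult> for BufferedProcess {
    fn from(r: ExecResult) -> Self {
        BufferedProcess {
            stdout: Cursor::new(r.stdout),
            stderr: Cursor::new(r.stderr),
            exit_code: r.exit_code,
        }
    }
}

/// Drains a reader into a String (invalid UTF-8 is replaced lossily).
pub fn read_all(r: &mut impl Read) -> std::io::Result<String> {
    let mut buf = Vec::new();
    r.read_to_end(&mut buf)?;
    Ok(String::from_utf8_lossy(&buf).into_owned())
}
```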

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Wires the new Daytona backend through the executor re-exports and the
exo CLI. Adds the `daytona` value to `--sandbox-backend` (env var:
`EXO_SANDBOX_BACKEND=daytona`). Config is read from environment via
`DaytonaConfig::from_env()` at backend construction time.
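A minimal sketch of parsing the backend choice. The variant list and spellings (`docker`, `apple-container`, `local`, `daytona`, `e2b`) are inferred from values mentioned in this PR, and the flag-over-environment precedence in the hypothetical `resolve_backend` is an assumption about how the CLI resolves `--sandbox-backend` versus `EXO_SANDBOX_BACKEND`.

```rust
use std::str::FromStr;

/// Hypothetical mirror of SandboxBackendChoice; the real enum may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SandboxBackendChoice {
    Docker,
    AppleContainer,
    Local,
    Daytona,
    E2b,
}

impl FromStr for SandboxBackendChoice {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "docker" => Ok(Self::Docker),
            "apple-container" => Ok(Self::AppleContainer),
            "local" => Ok(Self::Local),
            "daytona" => Ok(Self::Daytona),
            "e2b" => Ok(Self::E2b),
            other => Err(format!("unknown sandbox backend: {other}")),
        }
    }
}

/// The flag wins over the env var; `default` applies when neither is set.
pub fn resolve_backend(
    flag: Option<&str>,
    env: Option<&str>,
    default: SandboxBackendChoice,
) -> Result<SandboxBackendChoice, String> {
    match flag.or(env) {
        Some(v) => v.parse(),
        None => Ok(default),
    }
}
```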

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds crates/cli/tests/daytona_backend.rs — 11 tests covering each
public method on DaytonaSandboxBackend by standing up an in-process
wiremock server that pretends to be Daytona's REST API.

Strategy: don't hit real Daytona in CI. It would require credentials,
network egress, real sandbox provisioning ($), and cleanup that's
fragile if a test panics mid-flight. Mocks let us assert on the things
that actually matter for this layer:

  - which endpoint is hit (control plane vs the separate toolbox host)
  - what the request body/query shape looks like (translating
    SandboxRequest etc. into Daytona's wire format)
  - how the backend interprets canned responses (state-machine
    transitions, payload encoding)

Drift between mock and reality is a real risk but bounded by the PR
description's documentation of the actual API shapes. Catching drift
is the job of the live test plan in the PR; catching code defects in
the translation layer is the job of these mock tests.

Coverage:

  acquire_posts_to_sandbox_endpoint_with_labels
    Verifies POST /sandbox carries the two warm-sandbox labels and
    does NOT set `snapshot` on a fresh acquire (Daytona would
    interpret that as a pre-registered named snapshot, which it isn't).

  acquire_rejects_host_mounts
    Daytona has no host filesystem to bind-mount; the backend must
    reject mounts up front, before hitting the API.

  try_resume_finds_running_sandbox_without_starting
  try_resume_starts_stopped_sandbox
  try_resume_returns_none_when_no_match
  try_resume_filters_by_label_as_single_json_query_param
    The four cases of the cross-process resume lookup. The label-query-
    shape test in particular catches the easy-to-get-wrong case of
    sending `?labels=k=v&labels=k=v` instead of one JSON-encoded
    `?labels={...}`.

  stop_calls_stop_endpoint_not_delete
    Asserts the resume contract: stop must NOT DELETE, since the next
    process needs to be able to resume by label.

  snapshot_returns_daytona_snapshot_payload_with_manifest
    Verifies the payload kind is DaytonaSnapshot (not DockerImageTar)
    and that the manifest carries both snapshot_name and base_image.

  acquire_from_snapshot_passes_snapshot_name_in_create_body
    Verifies the inverse: when restoring, the manifest's snapshot_name
    threads into Daytona's create-body `snapshot` field.

  acquire_from_snapshot_rejects_wrong_kind
    A DockerImageTar payload handed to the Daytona backend must error
    cleanly — and must NOT hit the API in the meantime.

  exec_uses_toolbox_url_not_api_url
    The Daytona quirk that per-sandbox operations live on a different
    host (`proxy.app.daytona.io/toolbox/...`) than the control plane
    (`app.daytona.io/api`). Routing the exec call to the wrong host
    would 404 in production, but in tests both hosts are the same wiremock
    server, so we additionally assert the path the request hit.

Not #[ignore]'d, no docker required — these run in regular per-PR CI
via `cargo test --workspace`. Workspace test count goes from 51 to 62.
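To make the single-JSON-parameter shape concrete, here is a standard-library-only sketch that builds `labels=<percent-encoded JSON object>` from a label map. `labels_query` and its helpers are hypothetical; a real client would more likely lean on serde_json and the url crate. The escaper only covers quotes, backslashes, and control characters.

```rust
use std::collections::BTreeMap;

/// Minimal JSON string escaper, enough for label keys and values.
fn json_str(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Percent-encodes every byte except RFC 3986 unreserved characters.
fn percent_encode(s: &str) -> String {
    let mut out = String::new();
    for b in s.bytes() {
        if b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_' | b'~') {
            out.push(b as char);
        } else {
            out.push_str(&format!("%{:02X}", b));
        }
    }
    out
}

/// Builds one `labels=<json>` pair instead of repeating `labels=k=v`.
/// BTreeMap keeps key order deterministic.
pub fn labels_query(labels: &BTreeMap<&str, &str>) -> String {
    let fields: Vec<String> = labels
        .iter()
        .map(|(k, v)| format!("{}:{}", json_str(k), json_str(v)))
        .collect();
    format!("labels={}", percent_encode(&format!("{{{}}}", fields.join(","))))
}
```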
- drop unused toolbox_endpoint method on DaytonaSandboxBackend
- collapse redundant |ttl| idle_ttl_to_minutes(ttl) closure
- rustfmt nits from main's PR ankrgyl#26

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…f command execution protocol)

Daytona uses a plain REST JSON API, but command execution in an E2B sandbox goes through the Connect protocol over HTTP (https://connectrpc.com/), which is how requests to run commands reach the remote sandbox's environment daemon (envd). To make this work, we encode each request in Connect's envelope format and send it over that protocol.
akrentsel force-pushed the daytona-backend-v2 branch from 9d75b32 to 02292c3 on June 3, 2026 03:24