Daytona remote-container sandbox backend#22

Closed. akrentsel wants to merge 5 commits into cross-process-resume from daytona-backend-v2

@akrentsel
Collaborator

@akrentsel akrentsel commented May 25, 2026

A new sandbox backend that runs containers on Daytona instead of locally. Selected via --sandbox-backend daytona (env: EXO_SANDBOX_BACKEND=daytona). All the harness machinery — tool dispatch, snapshot/rewind, the cross-process resume contract from #21 — works against it transparently.

export DAYTONA_API_KEY=...
export DAYTONA_ORGANIZATION_ID=...
exo --sandbox-backend daytona chat repl my-agent my-convo

Inheriting the resume contract

The lifecycle work in #21 was designed so Daytona slots in cleanly:

  • ManagedSandboxBackend::acquire → POST /api/sandbox (create fresh).
  • ManagedSandboxBackend::try_resume → label-based lookup against Daytona's control plane, POST /api/sandbox/{id}/start if stopped. Daytona's server-side stop preserves filesystem state, so resume comes essentially free.
  • ManagedSandboxHandle::stop → POST /api/sandbox/{id}/stop (not delete).
  • ManagedSandboxHandle::snapshot → POST /api/sandbox/{id}/snapshot. Returns a SnapshotKind::DaytonaSnapshot payload whose bytes are a small JSON manifest pointing at the named Daytona snapshot (bytes-by-reference, in contrast to Docker's DockerImageTar bytes-by-value). No multi-MB tarball ever crosses the wire.
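
To make the bytes-by-reference idea concrete, here is a minimal std-only sketch of what such a manifest payload could look like. The struct name `DaytonaSnapshotManifest` and the method `to_payload_bytes` are hypothetical; the two field names come from the mock-test description in this PR, and the real code presumably serializes with serde rather than by hand.

```rust
/// Hypothetical manifest stored as the snapshot payload. Field names follow
/// the ones mentioned in this PR (`snapshot_name`, `base_image`).
#[derive(Debug, Clone, PartialEq)]
pub struct DaytonaSnapshotManifest {
    pub snapshot_name: String,
    pub base_image: String,
}

/// Escape a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

impl DaytonaSnapshotManifest {
    /// Serialize to the small JSON document that becomes the payload bytes.
    /// Only names cross the wire, never filesystem contents.
    pub fn to_payload_bytes(&self) -> Vec<u8> {
        format!(
            "{{\"snapshot_name\":\"{}\",\"base_image\":\"{}\"}}",
            json_escape(&self.snapshot_name),
            json_escape(&self.base_image)
        )
        .into_bytes()
    }
}
```

The payload stays a few dozen bytes regardless of how large the sandbox filesystem is, which is the contrast with a DockerImageTar payload.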

End-to-end verified live: two chat send invocations against the same conversation, in separate exo processes, write a marker file in send 1, read it back in send 2 — same Daytona sandbox UUID, file persists.

Snapshot status: code is in, endpoint is not

Discovered while live-testing: the snapshot path is upstream-unavailable at the moment, not a code defect.

The code follows Daytona's published OpenAPI spec exactly:

  • POST /api/sandbox/{sandboxIdOrName}/snapshot, body {"name": "..."} — matches CreateSandboxSnapshot in their api-json.
  • The experimental status is consistent with the SDK naming: the TypeScript SDK calls it _experimental_createSnapshot (note the leading underscore).

But the endpoint returns 404 Cannot POST on app.daytona.io. Cross-checked three ways, all hit the same wall:

  1. Raw curl with the spec-matching body shape → 404.
  2. exo /snapshot slash command → same 404 surfaced through to the user.
  3. Daytona's official Python SDK calling sandbox._experimental_create_snapshot(...) → DaytonaNotFoundError: Failed to create snapshot: Cannot POST /api/sandbox/<id>/snapshot.

So the operation is documented + present in SDKs but not actually deployed on the public API host yet. When Daytona enables it our code will Just Work without modification.

Related harness gap noticed while testing: ConversationHandle::snapshot_sandbox requires the sandbox to be in this process's running_sandboxes HashMap, so calling /snapshot cold (before any tool call has bound a handle in-process) errors with sandbox is not running. The fix is the same try_resume-on-miss pattern PR #21 added to run_in_sandbox; extending it to snapshot_sandbox is a small follow-up orthogonal to this PR.
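
A rough sketch of the try_resume-on-miss pattern described above, under simplifying assumptions: `Handle`, `handle_or_resume`, and the closure-based `try_resume` argument are hypothetical stand-ins, and the real `running_sandboxes` map and backend trait are richer than this.

```rust
use std::collections::HashMap;

/// Stand-in for a live sandbox handle (hypothetical; the real one is richer).
#[derive(Debug, Clone, PartialEq)]
pub struct Handle {
    pub sandbox_id: String,
}

/// Look up a handle bound in this process. On a miss, ask the backend to
/// resume one and cache it so later calls hit the map directly.
pub fn handle_or_resume<F>(
    running: &mut HashMap<String, Handle>,
    key: &str,
    try_resume: F,
) -> Result<Handle, String>
where
    F: FnOnce(&str) -> Option<Handle>,
{
    if let Some(handle) = running.get(key) {
        return Ok(handle.clone());
    }
    match try_resume(key) {
        Some(handle) => {
            running.insert(key.to_string(), handle.clone());
            Ok(handle)
        }
        // Only error once both the in-process map and the backend miss.
        None => Err(format!("sandbox {} is not running", key)),
    }
}
```

With this shape, a cold /snapshot call would only fail with "sandbox is not running" when the backend also has nothing to resume.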

What changed

| File | Change |
| --- | --- |
| crates/exoharness/src/daytona.rs (new, ~600 LOC) | REST client + DaytonaSandboxBackend + DaytonaConfig. Implements acquire, try_resume, acquire_from_snapshot. Synchronous-exec bridge for start_process. |
| crates/exoharness/src/sandbox.rs | New SnapshotKind::DaytonaSnapshot variant + a (_, DaytonaSnapshot) arm in the CLI backend's acquire_from_snapshot so the wrong-backend case errors clearly. Constants & spec-hash helper exposed pub(crate). |
| crates/exoharness/src/basic.rs | SandboxBackendChoice::Daytona(DaytonaConfig) variant + build_sandbox_backend returns Result. |
| crates/exoharness/src/lib.rs, crates/executor/src/lib.rs | Module registration + re-exports. |
| crates/cli/src/main.rs | --sandbox-backend daytona + DaytonaConfig::from_env() at CLI parse time. |

Daytona-specific knobs

The REST API has a few non-obvious shapes; documented inline. Highlights surfaced during live debugging:

  • labels is one JSON-encoded query param, not multiple repeated ?labels=k=v params.
  • List response is {items, nextCursor}, not a bare array.
  • env on create is an object, not an array of {name, value}.
  • snapshot on create refers to a pre-registered snapshot name in the user's Daytona dashboard, not an arbitrary docker image ref. We omit it on fresh creates.
  • Per-sandbox operations (exec, fs) live on a different host (proxy.app.daytona.io/toolbox/<id>/...) than the control plane (app.daytona.io/api). Both configurable via DAYTONA_API_URL / DAYTONA_TOOLBOX_URL.
  • API keys belong to one organization. Required: X-Daytona-Organization-ID header (set via DAYTONA_ORGANIZATION_ID env var).
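
As an illustration of the first bullet, the sketch below builds the single JSON-encoded `labels` query parameter by hand using only std. The helper names are hypothetical, and the label key used in the usage note is borrowed from the Docker side of #21 as an assumption. The actual code goes through reqwest's `.query(...)`, which handles the percent-encoding itself.

```rust
use std::collections::BTreeMap;

/// Quote and escape a string as a JSON string literal.
fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Encode all labels as one JSON object, e.g. {"k":"v"}.
pub fn labels_json(labels: &BTreeMap<String, String>) -> String {
    let fields: Vec<String> = labels
        .iter()
        .map(|(k, v)| format!("{}:{}", json_string(k), json_string(v)))
        .collect();
    format!("{{{}}}", fields.join(","))
}

/// Percent-encode every byte except RFC 3986 unreserved characters.
pub fn percent_encode(s: &str) -> String {
    let mut out = String::new();
    for b in s.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// One `labels=` parameter carrying the whole object, not one per label.
pub fn labels_query(labels: &BTreeMap<String, String>) -> String {
    format!("labels={}", percent_encode(&labels_json(labels)))
}
```

For a single label `exo.sandbox.key = convo-1`, this yields `labels=%7B%22exo.sandbox.key%22%3A%22convo-1%22%7D`, as opposed to the repeated `?labels=k=v` form that Daytona does not accept.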

Out of scope (deliberate)

  • Mounts. Daytona has no host filesystem to bind-mount. Mounts in SandboxSpec are rejected with a clear error. The longer-term answer is a workspace provisioner abstraction at the conversation level — separate effort.
  • Streaming start_process. Daytona's exec endpoint is request/response, not streaming. We buffer stdout/stderr into in-memory Cursors.
  • Cross-backend snapshot interchange. A DaytonaSnapshot payload can't be restored under Docker and vice versa — kind-mismatch errors clearly.
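
The buffering approach for start_process can be sketched roughly as follows. `ExecResult` and `BufferedProcess` are hypothetical names, and this version uses the blocking `std::io::Read` trait for simplicity, whereas the PR describes AsyncRead-friendly cursors.

```rust
use std::io::{Cursor, Read};

/// Result of a completed request/response exec call (hypothetical shape).
pub struct ExecResult {
    pub stdout: Vec<u8>,
    pub stderr: Vec<u8>,
    pub exit_code: i32,
}

/// Process-like view whose output "streams" are in-memory buffers that were
/// fully populated before the caller ever reads from them.
pub struct BufferedProcess {
    pub stdout: Cursor<Vec<u8>>,
    pub stderr: Cursor<Vec<u8>>,
    pub exit_code: i32,
}

impl From<ExecResult> for BufferedProcess {
    fn from(r: ExecResult) -> Self {
        BufferedProcess {
            stdout: Cursor::new(r.stdout),
            stderr: Cursor::new(r.stderr),
            exit_code: r.exit_code,
        }
    }
}

/// Drain a reader to a string, the way a caller expecting a stream would.
pub fn read_all(mut reader: impl Read) -> std::io::Result<String> {
    let mut s = String::new();
    reader.read_to_string(&mut s)?;
    Ok(s)
}
```

The trade-off is that callers see no output until the remote command has finished, and the whole output is held in memory.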

Stack

main
  └── #20  sandbox-snapshots-filesystem    Docker snapshot + rewind
        └── #21  cross-process-resume      try_resume contract + 3-tier fallback
              └── #22  this PR             Daytona backend (slots into the contract)

Test plan

  • cargo build --workspace clean.
  • cargo test --workspace — 51 unit tests pass; existing integration tests still pass.
  • Live end-to-end against real Daytona: chat send creates a real sandbox, executes via the toolbox proxy, returns output.
  • Cross-process resume verified live: two chat send invocations in fresh exo processes hit the same Daytona sandbox UUID; marker file written in send 1 is readable in send 2.
  • Cleanup via DELETE /api/sandbox/{id}?force=true works; no orphaned sandboxes after testing.
  • /snapshot reaches Daytona and returns a clear error (endpoint not currently deployed upstream — see "Snapshot status" above). Restore path is unreachable until snapshot is available upstream.

🤖 Generated with Claude Code

@akrentsel
Collaborator Author

Some testing – it's working!
[screenshot: daytona-resume]

@akrentsel akrentsel force-pushed the cross-process-resume branch from a3d2914 to 19ec3ab Compare May 25, 2026 07:56
@akrentsel akrentsel force-pushed the daytona-backend-v2 branch from 056471f to 4a85b52 Compare May 25, 2026 08:19
@@ -0,0 +1,610 @@
//! Daytona remote-container sandbox backend.
Owner

we should probably move this into exoharness/src/sandbox_provider (or something like that). I expect to have like 20 of these and we don't want them to pollute the top-level directory

Comment on lines +51 to +59
pub fn from_env() -> Result<Self> {
    let api_key = std::env::var("DAYTONA_API_KEY")
        .map_err(|_| anyhow!("DAYTONA_API_KEY is not set; required for the Daytona sandbox backend"))?;
    let api_url =
        std::env::var("DAYTONA_API_URL").unwrap_or_else(|_| DEFAULT_DAYTONA_API_URL.to_string());
    let toolbox_url = std::env::var("DAYTONA_TOOLBOX_URL")
        .unwrap_or_else(|_| DEFAULT_DAYTONA_TOOLBOX_URL.to_string());
    let target = std::env::var("DAYTONA_TARGET").ok();
    let organization_id = std::env::var("DAYTONA_ORGANIZATION_ID").ok();
Owner

a few principles on arg parsing:

  • arguments should always come through clap so they can be provided on the CLI or env
  • the exoharness should read 0 environment variables. it eventually becomes a security issue (we want it to be very explicit and trivial to reason about what secrets / config it uses)
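
One possible shape for the second principle, as a sketch only: the harness-side config takes every value as an explicit argument, and the CLI layer (clap or otherwise) is the only place that knows about flags and environment variables. The constructor signature and the `https://` default URLs are assumptions; only the host names come from this PR.

```rust
/// Configuration handed in explicitly by the caller, for example the CLI
/// layer after argument parsing. Nothing in here reads the environment.
#[derive(Debug, Clone, PartialEq)]
pub struct DaytonaConfig {
    pub api_key: String,
    pub api_url: String,
    pub toolbox_url: String,
    pub target: Option<String>,
    pub organization_id: Option<String>,
}

// Assumed defaults: hosts are from the PR description, the scheme is a guess.
pub const DEFAULT_API_URL: &str = "https://app.daytona.io/api";
pub const DEFAULT_TOOLBOX_URL: &str = "https://proxy.app.daytona.io/toolbox";

impl DaytonaConfig {
    /// Build a config from already-parsed values, applying defaults for the
    /// optional URLs and rejecting a missing or empty API key.
    pub fn new(
        api_key: Option<String>,
        api_url: Option<String>,
        toolbox_url: Option<String>,
        target: Option<String>,
        organization_id: Option<String>,
    ) -> Result<Self, String> {
        let api_key = api_key
            .filter(|k| !k.is_empty())
            .ok_or_else(|| "Daytona API key is required for the Daytona sandbox backend".to_string())?;
        Ok(Self {
            api_key,
            api_url: api_url.unwrap_or_else(|| DEFAULT_API_URL.to_string()),
            toolbox_url: toolbox_url.unwrap_or_else(|| DEFAULT_TOOLBOX_URL.to_string()),
            target,
            organization_id,
        })
    }
}
```

This keeps the set of secrets and settings the harness depends on visible in one function signature.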

Comment thread crates/exoharness/src/daytona.rs Outdated
Comment on lines +52 to +53
let api_key = std::env::var("DAYTONA_API_KEY")
    .map_err(|_| anyhow!("DAYTONA_API_KEY is not set; required for the Daytona sandbox backend"))?;
Owner

this should be registered as a secret

let value = HeaderValue::from_str(org_id).context(
    "DAYTONA_ORGANIZATION_ID contains characters that aren't valid in an HTTP header",
)?;
headers.insert("X-Daytona-Organization-ID", value);
Owner

good practice to link to docs from daytona here

Comment on lines +103 to +106
let client = reqwest::Client::builder()
    .default_headers(headers)
    .build()
    .context("building Daytona HTTP client")?;
Owner

not necessary yet but we should eventually factor this in somewhere common because there are a lot of tuning knobs / proxying parameters we'll want to standardize across all http calls (like # of connections in the pool)

Comment on lines +115 to +117
fn api_endpoint(&self, path: &str) -> String {
    format!("{}{}", self.api_url, path)
}
Owner

i would create a url utils module and add some of this stuff there. also i see you normalize the url above to remove the slash. if i'm reading correctly does that require path starts with a /? I think it's a general best practice to strip leading and trailing slashes and then re-insert them when combining path pieces. Every project i work on has a custom urljoin function lol
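
A minimal sketch of the kind of join helper described in this comment, assuming a hypothetical `url_join` function that strips leading and trailing slashes from each piece and re-inserts exactly one separator:

```rust
/// Join a base URL and path pieces with exactly one `/` between them,
/// regardless of whether callers included leading or trailing slashes.
pub fn url_join(base: &str, pieces: &[&str]) -> String {
    let mut out = base.trim_end_matches('/').to_string();
    for piece in pieces {
        let trimmed = piece.trim_matches('/');
        if trimmed.is_empty() {
            // Skip empty pieces so they do not introduce a double slash.
            continue;
        }
        out.push('/');
        out.push_str(trimmed);
    }
    out
}
```

With this, `url_join("https://app.daytona.io/api/", &["/sandbox"])` and `url_join("https://app.daytona.io/api", &["sandbox"])` produce the same URL, so callers no longer need to remember whether `path` must start with a slash.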

Comment on lines +136 to +148
let response = self
    .client
    .get(self.api_endpoint("/sandbox"))
    .query(&[("labels", labels_filter.to_string())])
    .send()
    .await
    .context("listing Daytona sandboxes")?
    .error_for_status()
    .context("Daytona list sandboxes returned an error status")?;
let list: DaytonaSandboxList = response
    .json()
    .await
    .context("decoding Daytona sandbox list response")?;
Owner

we should either use or create a helper get_json type method that we can use across a bunch of APIs

.join(" ")
}

fn shell_quote(arg: &str) -> String {
Owner

this should prob also be in a generic utils module
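
For reference, a generic version of such a helper might look like the sketch below. This is not the PR's implementation, whose body is not shown in the excerpt. It uses the common POSIX approach of wrapping in single quotes and rewriting embedded single quotes as `'\''`, and `shell_join` is a hypothetical companion function.

```rust
/// Quote one argument for a POSIX shell. Arguments made only of clearly safe
/// characters pass through unchanged; everything else is single-quoted.
pub fn shell_quote(arg: &str) -> String {
    if arg.is_empty() {
        return "''".to_string();
    }
    let safe = arg
        .bytes()
        .all(|b| b.is_ascii_alphanumeric() || b"_-./=:@%+,".contains(&b));
    if safe {
        return arg.to_string();
    }
    // Inside single quotes nothing is special except the quote itself, so
    // close the quote, emit an escaped quote, and reopen.
    format!("'{}'", arg.replace('\'', "'\\''"))
}

/// Quote every argument and join with spaces to form one command line.
pub fn shell_join(args: &[&str]) -> String {
    args.iter()
        .map(|a| shell_quote(a))
        .collect::<Vec<_>>()
        .join(" ")
}
```

For example, `shell_join(&["echo", "a b"])` gives `echo 'a b'`.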

#[derive(Debug, Deserialize)]
struct DaytonaSandbox {
    id: String,
    state: String,
Owner

would be better to have an enum of known states here if it's documented by daytona.
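
A sketch of what that could look like, with a catch-all variant so unknown states do not break decoding. The state strings `"started"` and `"stopped"` are assumptions here and would need checking against Daytona's documentation; the enum and method names are hypothetical.

```rust
/// Sandbox lifecycle state as reported by the control plane.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SandboxState {
    Started,
    Stopped,
    /// Any state string this client does not model yet.
    Other(String),
}

impl SandboxState {
    /// Parse the raw `state` field. The literal strings are assumed values.
    pub fn parse(raw: &str) -> Self {
        match raw {
            "started" => SandboxState::Started,
            "stopped" => SandboxState::Stopped,
            other => SandboxState::Other(other.to_string()),
        }
    }

    /// Whether a resume would need to call the start endpoint first.
    pub fn needs_start(&self) -> bool {
        matches!(self, SandboxState::Stopped)
    }
}
```

Keeping the `Other` variant means a newly introduced upstream state shows up as data to handle rather than as a deserialization failure.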

}

#[derive(Debug, Serialize)]
struct DaytonaCreateRequest {
Owner

i want to be careful about supply chain stuff and going crazy with dependencies, but people have created daytona SDKs (https://github.com/krzysztofwos/daytona-client) and they also publish an openapi spec (https://www.daytona.io/docs/en/tools/api/#daytona/tag/config) which we could auto generate types from

@akrentsel akrentsel force-pushed the cross-process-resume branch from 74a3ca6 to 82f48d7 Compare June 1, 2026 02:18
@akrentsel akrentsel force-pushed the daytona-backend-v2 branch 4 times, most recently from deabcc9 to 9d75b32 Compare June 1, 2026 04:05
    /// Image the snapshotted sandbox was originally created from. Used as a
    /// fallback if the snapshot itself doesn't carry enough info to recreate.
    base_image: String,
}

Random (and dumb) question: If Daytona already persists the filesystem in the snapshot, why do we need the base_image? Wouldn't the snapshot_name or id be enough to get the snapshot back on a new sandbox?

This was referenced Jun 2, 2026
@akrentsel akrentsel force-pushed the cross-process-resume branch 4 times, most recently from db1aa77 to 0dc5d90 Compare June 3, 2026 03:17
Adds cross-process sandbox resume to exoharness. When a conversation
acquires a sandbox, the harness first tries to reattach to a labelled
container left running or stopped by a previous exo process; if that
misses, it restores from the latest `SandboxSnapshotted` event for
that sandbox; otherwise it creates fresh.
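
The three tiers can be sketched as a small generic function. This is a simplification under stated assumptions: `acquire_with_fallback` and `AcquiredVia` are hypothetical names, the tiers are passed as closures, and error handling is omitted.

```rust
/// Which tier produced the sandbox. Useful for logging and tests.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AcquiredVia {
    Resumed,
    RestoredFromSnapshot,
    CreatedFresh,
}

/// Try each tier in order and stop at the first one that yields a sandbox.
/// Later closures are never called once an earlier tier succeeds.
pub fn acquire_with_fallback<S>(
    try_resume: impl FnOnce() -> Option<S>,
    restore_latest_snapshot: impl FnOnce() -> Option<S>,
    create_fresh: impl FnOnce() -> S,
) -> (S, AcquiredVia) {
    if let Some(sandbox) = try_resume() {
        return (sandbox, AcquiredVia::Resumed);
    }
    if let Some(sandbox) = restore_latest_snapshot() {
        return (sandbox, AcquiredVia::RestoredFromSnapshot);
    }
    (create_fresh(), AcquiredVia::CreatedFresh)
}
```
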

## What changed

- exoharness::sandbox: `ManagedSandboxBackend::try_resume` looks up
  containers by the `exo.sandbox.key` label, validates spec-hash,
  enforces a cross-process idle TTL, and reaps stale or expired
  matches. `Drop` stops warm containers (`docker stop -t 0`)
  instead of deleting them so the next process can find them.
- executor::harness_tool: `ensure_shell_sandbox` runs the 3-tier
  fallback (try_resume → snapshot restore → create fresh).
- Adds `exoharness::first_matching_event` helper for the recurring
  "latest event of kind K matching predicate P" pattern; three
  call sites converted.
- Switches docker queries to `--format '{{json .}}'` + serde-decoded
  `DockerPsItem` (Apple still uses its own JSON shape).
- New integration test `cross_process_send_resumes_…` (real exo
  binary + wiremock) and per-tier docker tests in
  `crates/cli/tests/lifecycle_resume.rs`.

## User-visible behavior change

Exiting exo no longer `docker rm -f`'s warm containers; they survive
as `Exited` for cross-process resume and get cleaned up via the idle
TTL or the next provisioning attempt that no longer matches them.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@akrentsel akrentsel force-pushed the cross-process-resume branch from 0dc5d90 to 0f3977c Compare June 3, 2026 03:23
akrentsel and others added 4 commits June 3, 2026 03:24
Adds a Daytona-cloud-backed implementation of `ManagedSandboxBackend`.
REST client lives in `crates/exoharness/src/daytona.rs` and speaks
Daytona's API directly via `reqwest`.

Lifecycle model:
- `acquire` -> POST /api/sandbox, returns a fresh sandbox.
- `try_resume` -> label-based lookup against Daytona's control plane,
  `start` if stopped. This is what makes cross-process resume work for
  free on Daytona: `stop()` preserves filesystem state server-side, so
  the next exo invocation finds the labelled sandbox and resumes it.
- `acquire_from_snapshot` -> POST /api/sandbox with `snapshot: <name>`.
- `snapshot` -> save-as-snapshot. The payload bytes are a small JSON
  manifest pointing at the named Daytona snapshot, not the filesystem
  itself; bytes-by-reference, not bytes-by-value.

Adds `SnapshotKind::DaytonaSnapshot` and an explicit kind-mismatch error
in the CLI backend so a Daytona-produced snapshot under
`--sandbox-backend docker` fails clearly.
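
A minimal sketch of such a kind check, assuming a hypothetical `Backend` enum and `check_snapshot_kind` function. The two `SnapshotKind` variant names are the ones used in this PR, but the real enum may carry more variants.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SnapshotKind {
    DockerImageTar,
    DaytonaSnapshot,
}

/// Hypothetical tag for the backend attempting the restore.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Docker,
    Daytona,
}

/// Reject a payload whose kind the backend cannot restore, before any
/// network or docker call is made.
pub fn check_snapshot_kind(backend: Backend, kind: SnapshotKind) -> Result<(), String> {
    match (backend, kind) {
        (Backend::Docker, SnapshotKind::DockerImageTar)
        | (Backend::Daytona, SnapshotKind::DaytonaSnapshot) => Ok(()),
        (backend, kind) => Err(format!(
            "snapshot kind {:?} cannot be restored by the {:?} backend",
            kind, backend
        )),
    }
}
```
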

Config via env: `DAYTONA_API_KEY` (required), `DAYTONA_API_URL`,
`DAYTONA_TOOLBOX_URL`, `DAYTONA_TARGET`, `DAYTONA_ORGANIZATION_ID`.

Exec, fs, and per-sandbox operations go through Daytona's toolbox
proxy (`proxy.app.daytona.io/toolbox/<id>/...`) rather than the control
plane.

Mounts: rejected up front. Daytona has no host filesystem to bind into;
a follow-up workspace provisioner (git clone or Daytona Volume) will
own the "where does the workspace come from" question.

Streaming `start_process`: synchronous-exec bridge for now (REST exec
isn't streaming-shaped). Buffers stdout/stderr into AsyncRead-friendly
cursors so the rest of the harness is unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Wires the new Daytona backend through the executor re-exports and the
exo CLI. Adds the `daytona` value to `--sandbox-backend` (env var:
`EXO_SANDBOX_BACKEND=daytona`). Config is read from environment via
`DaytonaConfig::from_env()` at backend construction time.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Adds crates/cli/tests/daytona_backend.rs, with 11 tests covering each
public method on DaytonaSandboxBackend by standing up an in-process
wiremock server that pretends to be Daytona's REST API.

Strategy: don't hit real Daytona in CI. It would require credentials,
network egress, real sandbox provisioning ($), and cleanup that's
fragile if a test panics mid-flight. Mocks let us assert on the things
that actually matter for this layer:

  - which endpoint is hit (control plane vs the separate toolbox host)
  - what the request body/query shape looks like (translating
    SandboxRequest etc. into Daytona's wire format)
  - how the backend interprets canned responses (state-machine
    transitions, payload encoding)

Drift between mock and reality is a real risk but bounded by the PR
description's documentation of the actual API shapes. Catching drift
is the job of the live test plan in the PR; catching code defects in
the translation layer is the job of these mock tests.

Coverage:

  acquire_posts_to_sandbox_endpoint_with_labels
    Verifies POST /sandbox carries the two warm-sandbox labels and
    does NOT set `snapshot` on a fresh acquire (Daytona would
    interpret that as a pre-registered named snapshot, which it isn't).

  acquire_rejects_host_mounts
    Daytona has no host filesystem to bind-mount; the backend must
    reject mounts up front, before hitting the API.

  try_resume_finds_running_sandbox_without_starting
  try_resume_starts_stopped_sandbox
  try_resume_returns_none_when_no_match
  try_resume_filters_by_label_as_single_json_query_param
    The four cases of the cross-process resume lookup. The label-query-
    shape test in particular catches the easy-to-get-wrong case of
    sending `?labels=k=v&labels=k=v` instead of one JSON-encoded
    `?labels={...}`.

  stop_calls_stop_endpoint_not_delete
    Asserts the resume contract: stop must NOT DELETE, since the next
    process needs to be able to resume by label.

  snapshot_returns_daytona_snapshot_payload_with_manifest
    Verifies the payload kind is DaytonaSnapshot (not DockerImageTar)
    and that the manifest carries both snapshot_name and base_image.

  acquire_from_snapshot_passes_snapshot_name_in_create_body
    Verifies the inverse: when restoring, the manifest's snapshot_name
    threads into Daytona's create-body `snapshot` field.

  acquire_from_snapshot_rejects_wrong_kind
    A DockerImageTar payload handed to the Daytona backend must error
    cleanly — and must NOT hit the API in the meantime.

  exec_uses_toolbox_url_not_api_url
    The Daytona quirk that per-sandbox operations live on a different
    host (`proxy.app.daytona.io/toolbox/...`) than the control plane
    (`app.daytona.io/api`). Routing the exec call to the wrong host
    would 404 in production but `404` against the same wiremock host
    in tests, so we additionally assert the path the request hit.

Not #[ignore]'d, no docker required — these run in regular per-PR CI
via `cargo test --workspace`. Workspace test count goes from 51 to 62.

- drop unused toolbox_endpoint method on DaytonaSandboxBackend
- collapse redundant |ttl| idle_ttl_to_minutes(ttl) closure
- rustfmt nits from main's PR #26

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@akrentsel
Collaborator Author

Thanks for the comments @ankrgyl. I just went ahead and re-wrote this afresh (taking your comments into account, and your newly-merged #25). It was different enough that I figured I'd have a new PR for it, and I'll close this one.

#45 is the new one. I'm going to follow this up with some changes to adapt Juan's code to fit this pattern too.

@akrentsel akrentsel closed this Jun 4, 2026
3 participants