Daytona remote-container sandbox backend #22
Branch updated from `a3d2914` to `19ec3ab`.
Branch updated from `056471f` to `4a85b52`.
`@@ -0,0 +1,610 @@`

```rust
//! Daytona remote-container sandbox backend.
```
We should probably move this into `exoharness/src/sandbox_provider` (or something like that). I expect to have something like 20 of these, and we don't want them to pollute the top-level directory.
```rust
pub fn from_env() -> Result<Self> {
    let api_key = std::env::var("DAYTONA_API_KEY")
        .map_err(|_| anyhow!("DAYTONA_API_KEY is not set; required for the Daytona sandbox backend"))?;
    let api_url =
        std::env::var("DAYTONA_API_URL").unwrap_or_else(|_| DEFAULT_DAYTONA_API_URL.to_string());
    let toolbox_url = std::env::var("DAYTONA_TOOLBOX_URL")
        .unwrap_or_else(|_| DEFAULT_DAYTONA_TOOLBOX_URL.to_string());
    let target = std::env::var("DAYTONA_TARGET").ok();
    let organization_id = std::env::var("DAYTONA_ORGANIZATION_ID").ok();
```
A few principles on arg parsing:
- Arguments should always come through clap, so they can be provided on the CLI or via env.
- The exoharness should read zero environment variables. Reading env eventually becomes a security issue: we want it to be very explicit and trivial to reason about which secrets and config it uses.
```rust
let api_key = std::env::var("DAYTONA_API_KEY")
    .map_err(|_| anyhow!("DAYTONA_API_KEY is not set; required for the Daytona sandbox backend"))?;
```
This should be registered as a secret.
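A minimal sketch of one way to act on this, using a hypothetical `Secret` newtype (the harness's own secret-registration mechanism isn't visible in this diff, and `DaytonaConfig` is trimmed to two fields for illustration). The idea is that the key never appears in `Debug` output and every read goes through an explicit, greppable `expose()` call.

```rust
use std::fmt;

/// Hypothetical wrapper for a credential such as the Daytona API key.
/// `Debug` never prints the wrapped value, so `{:?}` logging of a config
/// struct that contains it cannot leak the key.
pub struct Secret(String);

impl Secret {
    pub fn new(value: impl Into<String>) -> Self {
        Secret(value.into())
    }

    /// The only way to read the value; easy to audit with grep.
    pub fn expose(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for Secret {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("Secret([REDACTED])")
    }
}

/// Illustrative, trimmed-down config: the derived `Debug` delegates to
/// `Secret`'s redacting impl for `api_key`.
#[derive(Debug)]
pub struct DaytonaConfig {
    pub api_url: String,
    pub api_key: Secret,
}
```

With this in place, `println!("{:?}", cfg)` prints the URL but shows `api_key: Secret([REDACTED])`.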
```rust
let value = HeaderValue::from_str(org_id).context(
    "DAYTONA_ORGANIZATION_ID contains characters that aren't valid in an HTTP header",
)?;
headers.insert("X-Daytona-Organization-ID", value);
```
Good practice to link to Daytona's docs here.
```rust
let client = reqwest::Client::builder()
    .default_headers(headers)
    .build()
    .context("building Daytona HTTP client")?;
```
Not necessary yet, but we should eventually factor this out into somewhere common, because there are a lot of tuning knobs and proxying parameters (like the number of connections in the pool) that we'll want to standardize across all HTTP calls.
```rust
fn api_endpoint(&self, path: &str) -> String {
    format!("{}{}", self.api_url, path)
}
```
I would create a URL utils module and put some of this there. Also, I see you normalize the URL above to remove the trailing slash. If I'm reading correctly, does that require `path` to start with a `/`? I think it's general best practice to strip leading and trailing slashes and then re-insert them when combining path pieces. Every project I work on has a custom `urljoin` function, lol.
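A minimal sketch of the `urljoin` the reviewer describes, assuming it would live in a new URL-utils module (the name `url_join` is hypothetical): trim slashes from every piece, skip empty pieces, and rejoin with exactly one `/`, so neither the base URL nor the path has to follow a slash convention.

```rust
/// Hypothetical helper: join a base URL and path segments with exactly one
/// `/` between pieces, regardless of leading/trailing slashes on the inputs.
/// Segments are not percent-encoded; callers must pass URL-safe pieces.
pub fn url_join(base: &str, segments: &[&str]) -> String {
    // Only trim the end of the base so the scheme's "//" is untouched.
    let mut url = base.trim_end_matches('/').to_string();
    for segment in segments {
        let segment = segment.trim_matches('/');
        if segment.is_empty() {
            continue; // tolerate "", "/" and similar
        }
        url.push('/');
        url.push_str(segment);
    }
    url
}
```

For example, `url_join("https://app.daytona.io/api/", &["/sandbox"])` and `url_join("https://app.daytona.io/api", &["sandbox"])` both yield `https://app.daytona.io/api/sandbox`. One reason projects grow their own helper: the `url` crate's `Url::join` follows RFC 3986 reference resolution, so joining `sandbox` onto `https://app.daytona.io/api` replaces `api` instead of appending to it.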
```rust
let response = self
    .client
    .get(self.api_endpoint("/sandbox"))
    .query(&[("labels", labels_filter.to_string())])
    .send()
    .await
    .context("listing Daytona sandboxes")?
    .error_for_status()
    .context("Daytona list sandboxes returned an error status")?;
let list: DaytonaSandboxList = response
    .json()
    .await
    .context("decoding Daytona sandbox list response")?;
```
We should either use or create a `get_json`-style helper method that we can reuse across a bunch of APIs.
```rust
        .join(" ")
}

fn shell_quote(arg: &str) -> String {
```
This should probably also live in a generic utils module.
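For reference, a sketch of the standard POSIX approach, assumed here because the PR's `shell_quote` body isn't shown in this hunk: wrap the argument in single quotes (inside which the shell treats nothing as special except `'` itself) and rewrite each embedded `'` as `'\''`. `shell_join` is a hypothetical companion that turns an argv into one command string.

```rust
/// POSIX-shell quoting: inside single quotes only the closing quote is
/// special, so each embedded `'` becomes `'\''`
/// (close quote, backslash-escaped quote, reopen quote).
pub fn shell_quote(arg: &str) -> String {
    let mut quoted = String::with_capacity(arg.len() + 2);
    quoted.push('\'');
    for ch in arg.chars() {
        if ch == '\'' {
            quoted.push_str("'\\''");
        } else {
            quoted.push(ch);
        }
    }
    quoted.push('\'');
    quoted
}

/// Hypothetical: quote every argument and join them with spaces.
pub fn shell_join(args: &[&str]) -> String {
    args.iter()
        .map(|a| shell_quote(a))
        .collect::<Vec<_>>()
        .join(" ")
}
```

Quoting every argument, even ones that look safe, keeps the function trivially auditable: `$HOME`, spaces, and globs all pass through literally.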
```rust
#[derive(Debug, Deserialize)]
struct DaytonaSandbox {
    id: String,
    state: String,
```
It would be better to have an enum of known states here, if Daytona documents them.
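A sketch of what that could look like. The variant list is an assumption (a handful of lifecycle states this backend cares about, in lowercase wire form) and should be checked against the state enum in Daytona's published OpenAPI spec. The `Unknown(String)` catch-all keeps a newly added server-side state from breaking decoding.

```rust
/// Hypothetical typed view of the `state` string returned by Daytona.
/// Variant names and wire strings are assumptions to verify against the spec.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SandboxState {
    Started,
    Starting,
    Stopped,
    Stopping,
    Error,
    /// Anything not modelled yet; keeps the raw value for diagnostics.
    Unknown(String),
}

impl From<&str> for SandboxState {
    fn from(raw: &str) -> Self {
        match raw {
            "started" => SandboxState::Started,
            "starting" => SandboxState::Starting,
            "stopped" => SandboxState::Stopped,
            "stopping" => SandboxState::Stopping,
            "error" => SandboxState::Error,
            other => SandboxState::Unknown(other.to_string()),
        }
    }
}

impl SandboxState {
    /// Whether a resume would need to issue a start call first.
    pub fn needs_start(&self) -> bool {
        matches!(self, SandboxState::Stopped)
    }
}
```

To keep the existing `Deserialize` derive on `DaytonaSandbox`, one option is a `From<String>` impl plus serde's `#[serde(from = "String")]` container attribute on the enum.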
```rust
}

#[derive(Debug, Serialize)]
struct DaytonaCreateRequest {
```
I want to be careful about supply-chain risk and going crazy with dependencies, but people have created Daytona SDKs (https://github.com/krzysztofwos/daytona-client), and Daytona also publishes an OpenAPI spec (https://www.daytona.io/docs/en/tools/api/#daytona/tag/config) that we could auto-generate types from.
Branch updated from `74a3ca6` to `82f48d7`.
Branch updated from `deabcc9` to `9d75b32`.
```rust
    /// Image the snapshotted sandbox was originally created from. Used as a
    /// fallback if the snapshot itself doesn't carry enough info to recreate.
    base_image: String,
}
```
Random (and dumb) question: if Daytona already persists the filesystem in the snapshot, why do we need `base_image`? Wouldn't the `snapshot_name` or ID be enough to get the snapshot back on a new sandbox?
Branch updated from `db1aa77` to `0dc5d90`.
Adds cross-process sandbox resume to exoharness. When a conversation
acquires a sandbox, the harness first tries to reattach to a labelled
container left running or stopped by a previous exo process; if that
misses, it restores from the latest `SandboxSnapshotted` event for
that sandbox; otherwise it creates fresh.
## What changed
- exoharness::sandbox: `ManagedSandboxBackend::try_resume` looks up
containers by the `exo.sandbox.key` label, validates spec-hash,
enforces a cross-process idle TTL, and reaps stale or expired
matches. `Drop` stops warm containers (`docker stop -t 0`)
instead of deleting them so the next process can find them.
- executor::harness_tool: `ensure_shell_sandbox` runs the 3-tier
fallback (try_resume → snapshot restore → create fresh).
- Adds `exoharness::first_matching_event` helper for the recurring
"latest event of kind K matching predicate P" pattern; three
call sites converted.
- Switches docker queries to `--format '{{json .}}'` + serde-decoded
`DockerPsItem` (Apple still uses its own JSON shape).
- New integration test `cross_process_send_resumes_…` (real exo
binary + wiremock) and per-tier docker tests in
`crates/cli/tests/lifecycle_resume.rs`.
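The three-tier fallback in `ensure_shell_sandbox` boils down to "the first tier that produces a sandbox wins." A simplified synchronous sketch: the real methods are async and take harness types, so `Sandbox`, `Tier`, `ensure_sandbox`, and the closure parameters are all hypothetical stand-ins, and `latest_snapshot` plays the role of the snapshot from the latest `SandboxSnapshotted` event.

```rust
/// Which tier produced the sandbox (handy for logging and metrics).
#[derive(Debug, PartialEq, Eq)]
pub enum Tier {
    Resumed,
    RestoredFromSnapshot,
    Fresh,
}

/// Hypothetical stand-in for a sandbox handle.
#[derive(Debug)]
pub struct Sandbox {
    pub id: String,
}

/// try_resume -> snapshot restore -> create fresh.
/// `Ok(None)` from `try_resume` means "miss, try the next tier".
/// In this sketch an `Err` from any tier is returned immediately.
pub fn ensure_sandbox<E>(
    try_resume: impl FnOnce() -> Result<Option<Sandbox>, E>,
    latest_snapshot: Option<String>,
    restore: impl FnOnce(&str) -> Result<Sandbox, E>,
    create_fresh: impl FnOnce() -> Result<Sandbox, E>,
) -> Result<(Sandbox, Tier), E> {
    if let Some(sandbox) = try_resume()? {
        return Ok((sandbox, Tier::Resumed));
    }
    if let Some(snapshot) = latest_snapshot.as_deref() {
        return Ok((restore(snapshot)?, Tier::RestoredFromSnapshot));
    }
    Ok((create_fresh()?, Tier::Fresh))
}
```

Whether a failed resume lookup should abort (as here) or degrade to a fresh create is a policy choice; the real code may decide differently.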
## User-visible behavior change
Exiting exo no longer `docker rm -f`'s warm containers; they survive
as `Exited` for cross-process resume and get cleaned up via the idle
TTL or the next provisioning attempt that no longer matches them.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
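The reuse-or-reap decision `try_resume` makes for a labelled container (spec-hash check plus a cross-process idle TTL) is a pure function of a few inputs, which makes it easy to unit-test in isolation. A minimal sketch; `WarmContainer`, its fields, `ResumeDecision`, and how `last_used` would be recorded are assumptions, not the PR's actual types.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical view of a labelled container found during resume.
pub struct WarmContainer {
    pub spec_hash: String,
    pub last_used: SystemTime,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ResumeDecision {
    Reuse,
    /// The sandbox spec changed since this container was created.
    ReapStale,
    /// Idle for longer than the TTL.
    ReapExpired,
}

pub fn decide(
    c: &WarmContainer,
    wanted_spec_hash: &str,
    idle_ttl: Duration,
    now: SystemTime,
) -> ResumeDecision {
    if c.spec_hash != wanted_spec_hash {
        return ResumeDecision::ReapStale;
    }
    // A `last_used` in the future (clock skew between processes) counts as
    // zero idle time rather than an error.
    let idle = now.duration_since(c.last_used).unwrap_or(Duration::ZERO);
    if idle > idle_ttl {
        ResumeDecision::ReapExpired
    } else {
        ResumeDecision::Reuse
    }
}
```

Passing `now` in explicitly, instead of calling `SystemTime::now()` inside, is what lets tests cover the expiry boundary without sleeping.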
Branch updated from `0dc5d90` to `0f3977c`.
Adds a Daytona-cloud-backed implementation of `ManagedSandboxBackend`. The REST client lives in `crates/exoharness/src/daytona.rs` and speaks Daytona's API directly via `reqwest`.

Lifecycle model:
- `acquire` -> POST /api/sandbox, returns a fresh sandbox.
- `try_resume` -> label-based lookup against Daytona's control plane, `start` if stopped. This is what makes cross-process resume work for free on Daytona: `stop()` preserves filesystem state server-side, so the next exo invocation finds the labelled sandbox and resumes it.
- `acquire_from_snapshot` -> POST /api/sandbox with `snapshot: <name>`.
- `snapshot` -> save-as-snapshot. The payload bytes are a small JSON manifest pointing at the named Daytona snapshot, not the filesystem itself: bytes-by-reference, not bytes-by-value.

Adds `SnapshotKind::DaytonaSnapshot` and an explicit kind-mismatch error in the CLI backend, so a Daytona-produced snapshot under `--sandbox-backend docker` fails clearly.

Config via env: `DAYTONA_API_KEY` (required), `DAYTONA_API_URL`, `DAYTONA_TOOLBOX_URL`, `DAYTONA_TARGET`, `DAYTONA_ORGANIZATION_ID`.

Exec, fs, and per-sandbox operations go through Daytona's toolbox proxy (`proxy.app.daytona.io/toolbox/<id>/...`) rather than the control plane.

Mounts: rejected up front. Daytona has no host filesystem to bind into; a follow-up workspace provisioner (git clone or Daytona Volume) will own the "where does the workspace come from" question.

Streaming `start_process`: synchronous-exec bridge for now (REST exec isn't streaming-shaped). Buffers stdout/stderr into AsyncRead-friendly cursors so the rest of the harness is unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
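The synchronous-exec bridge can be pictured as: run the request/response exec to completion, then hand the harness readers over the already-buffered output. A std-only sketch with a hypothetical `ExecResult` shape (Daytona's real response fields aren't shown here) and a hypothetical `BufferedProcess` type. In the async harness the same `std::io::Cursor<Vec<u8>>` can serve as the reader, since tokio implements `AsyncRead` for `Cursor<T>` where `T: AsRef<[u8]> + Unpin`.

```rust
use std::io::{Cursor, Read};

/// Hypothetical shape of a completed, non-streaming exec response.
pub struct ExecResult {
    pub exit_code: i32,
    pub stdout: Vec<u8>,
    pub stderr: Vec<u8>,
}

/// A "process" whose output is already complete; readers consume it
/// exactly as they would a live pipe.
pub struct BufferedProcess {
    pub stdout: Cursor<Vec<u8>>,
    pub stderr: Cursor<Vec<u8>>,
    exit_code: i32,
}

impl BufferedProcess {
    pub fn from_exec(result: ExecResult) -> Self {
        BufferedProcess {
            stdout: Cursor::new(result.stdout),
            stderr: Cursor::new(result.stderr),
            exit_code: result.exit_code,
        }
    }

    /// Nothing to wait for: the command finished before we returned.
    pub fn wait(&self) -> i32 {
        self.exit_code
    }
}

/// Caller side: generic over any reader, so it can't tell the difference.
pub fn read_all(reader: &mut impl Read) -> std::io::Result<String> {
    let mut out = String::new();
    reader.read_to_string(&mut out)?;
    Ok(out)
}
```

The trade-off, as the commit notes, is that callers see no output until the command exits, and the whole output is held in memory.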
Wires the new Daytona backend through the executor re-exports and the exo CLI. Adds the `daytona` value to `--sandbox-backend` (env var: `EXO_SANDBOX_BACKEND=daytona`). Config is read from the environment via `DaytonaConfig::from_env()` at backend construction time.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds crates/cli/tests/daytona_backend.rs — 11 tests covering each
public method on DaytonaSandboxBackend by standing up an in-process
wiremock server that pretends to be Daytona's REST API.
Strategy: don't hit real Daytona in CI. It would require credentials,
network egress, real sandbox provisioning ($), and cleanup that's
fragile if a test panics mid-flight. Mocks let us assert on the things
that actually matter for this layer:
- which endpoint is hit (control plane vs the separate toolbox host)
- what the request body/query shape looks like (translating
SandboxRequest etc. into Daytona's wire format)
- how the backend interprets canned responses (state-machine
transitions, payload encoding)
Drift between mock and reality is a real risk but bounded by the PR
description's documentation of the actual API shapes. Catching drift
is the job of the live test plan in the PR; catching code defects in
the translation layer is the job of these mock tests.
Coverage:
acquire_posts_to_sandbox_endpoint_with_labels
Verifies POST /sandbox carries the two warm-sandbox labels and
does NOT set `snapshot` on a fresh acquire (Daytona would
interpret that as a pre-registered named snapshot, which it isn't).
acquire_rejects_host_mounts
Daytona has no host filesystem to bind-mount; the backend must
reject mounts up front, before hitting the API.
try_resume_finds_running_sandbox_without_starting
try_resume_starts_stopped_sandbox
try_resume_returns_none_when_no_match
try_resume_filters_by_label_as_single_json_query_param
The four cases of the cross-process resume lookup. The label-query-
shape test in particular catches the easy-to-get-wrong case of
sending `?labels=k=v&labels=k=v` instead of one JSON-encoded
`?labels={...}`.
stop_calls_stop_endpoint_not_delete
Asserts the resume contract: stop must NOT DELETE, since the next
process needs to be able to resume by label.
snapshot_returns_daytona_snapshot_payload_with_manifest
Verifies the payload kind is DaytonaSnapshot (not DockerImageTar)
and that the manifest carries both snapshot_name and base_image.
acquire_from_snapshot_passes_snapshot_name_in_create_body
Verifies the inverse: when restoring, the manifest's snapshot_name
threads into Daytona's create-body `snapshot` field.
acquire_from_snapshot_rejects_wrong_kind
A DockerImageTar payload handed to the Daytona backend must error
cleanly — and must NOT hit the API in the meantime.
exec_uses_toolbox_url_not_api_url
The Daytona quirk that per-sandbox operations live on a different
host (`proxy.app.daytona.io/toolbox/...`) than the control plane
(`app.daytona.io/api`). Routing the exec call to the wrong host
would 404 in production, but in tests both hosts are the same wiremock
server, so we additionally assert the path the request hit.
Not #[ignore]'d, no docker required — these run in regular per-PR CI
via `cargo test --workspace`. Workspace test count goes from 51 to 62.
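The label-filter shape that `try_resume_filters_by_label_as_single_json_query_param` guards can be built without dependencies: serialize the labels as one JSON object, then percent-encode it into a single `labels=` parameter. The code under review gets there with `labels_filter.to_string()` passed to `reqwest`'s `.query()`; this std-only sketch (hypothetical function names, string-only labels assumed) just makes the wire shape explicit.

```rust
use std::collections::BTreeMap;

/// Escape a string as a JSON string literal (surrounding quotes included).
pub fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for ch in s.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Percent-encode every byte except RFC 3986 unreserved characters.
pub fn percent_encode(s: &str) -> String {
    let mut out = String::new();
    for b in s.bytes() {
        if b.is_ascii_alphanumeric() || matches!(b, b'-' | b'.' | b'_' | b'~') {
            out.push(b as char);
        } else {
            out.push_str(&format!("%{:02X}", b));
        }
    }
    out
}

/// One `labels=<json object>` parameter, NOT `labels=k=v&labels=k=v`.
/// `BTreeMap` keeps key order deterministic, which keeps tests stable.
pub fn labels_query(labels: &BTreeMap<&str, &str>) -> String {
    let body = labels
        .iter()
        .map(|(k, v)| format!("{}:{}", json_string(k), json_string(v)))
        .collect::<Vec<_>>()
        .join(",");
    format!("labels={}", percent_encode(&format!("{{{}}}", body)))
}
```

For `{"a": "b"}` this produces `labels=%7B%22a%22%3A%22b%22%7D`: a single parameter no matter how many labels are in the filter.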
- drop unused `toolbox_endpoint` method on `DaytonaSandboxBackend`
- collapse redundant `|ttl| idle_ttl_to_minutes(ttl)` closure
- rustfmt nits from main's PR #26

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Branch updated from `9d75b32` to `02292c3`.
Branch updated from `0f3977c` to `0c44587`.
Thanks for the comments @ankrgyl. I went ahead and rewrote this from scratch, taking your comments (and your newly merged #25) into account. It was different enough that I figured I'd open a new PR for it, so I'll close this one; #45 is the new one. I'm going to follow this up with some changes to adapt Juan's code to fit this pattern too.

A new sandbox backend that runs containers on Daytona instead of locally. Selected via `--sandbox-backend daytona` (env: `EXO_SANDBOX_BACKEND=daytona`). All the harness machinery (tool dispatch, snapshot/rewind, the cross-process resume contract from #21) works against it transparently.

## Inheriting the resume contract
The lifecycle work in #21 was designed so Daytona slots in cleanly:
- `ManagedSandboxBackend::acquire` → `POST /api/sandbox` (create fresh).
- `ManagedSandboxBackend::try_resume` → label-based lookup against Daytona's control plane, `POST /api/sandbox/{id}/start` if stopped. Daytona's server-side `stop` preserves filesystem state, so resume comes essentially free.
- `ManagedSandboxHandle::stop` → `POST /api/sandbox/{id}/stop` (not delete).
- `ManagedSandboxHandle::snapshot` → `POST /api/sandbox/{id}/snapshot`. Returns a `SnapshotKind::DaytonaSnapshot` payload whose bytes are a small JSON manifest pointing at the named Daytona snapshot (bytes-by-reference, in contrast to Docker's `DockerImageTar` bytes-by-value). No multi-MB tarball ever crosses the wire.

End-to-end verified live: two `chat send` invocations against the same conversation, in separate exo processes, write a marker file in send 1 and read it back in send 2. Same Daytona sandbox UUID; the file persists.

## Snapshot status: code is in, endpoint is not
Discovered while live-testing: the snapshot path is upstream-unavailable at the moment, not a code defect.
The code follows Daytona's published OpenAPI spec exactly:
- `POST /api/sandbox/{sandboxIdOrName}/snapshot`, body `{"name": "..."}`, matches `CreateSandboxSnapshot` in their `api-json`.
- `_experimental_createSnapshot` (note the leading underscore).

But the endpoint returns `404 Cannot POST` on `app.daytona.io`. Cross-checked three ways, all hit the same wall:
- `/snapshot` slash command → same 404 surfaced through to the user.
- `sandbox._experimental_create_snapshot(...)` → `DaytonaNotFoundError: Failed to create snapshot: Cannot POST /api/sandbox/<id>/snapshot`.

So the operation is documented and present in SDKs, but not actually deployed on the public API host yet. When Daytona enables it, our code will Just Work without modification.
Related harness gap noticed while testing:
`ConversationHandle::snapshot_sandbox` requires the sandbox to be in this process's `running_sandboxes` HashMap, so calling `/snapshot` cold (before any tool call has bound a handle in-process) errors with `sandbox is not running`. The fix is the same `try_resume`-on-miss pattern PR #21 added to `run_in_sandbox`; extending it to `snapshot_sandbox` is a small follow-up orthogonal to this PR.

## What changed
- `crates/exoharness/src/daytona.rs` (new, ~600 LOC): `DaytonaSandboxBackend` + `DaytonaConfig`. Implements `acquire`, `try_resume`, `acquire_from_snapshot`. Synchronous-exec bridge for `start_process`.
- `crates/exoharness/src/sandbox.rs`: `SnapshotKind::DaytonaSnapshot` variant + a `(_, DaytonaSnapshot)` arm in the CLI backend's `acquire_from_snapshot` so the wrong-backend case errors clearly. Constants & spec-hash helper exposed `pub(crate)`.
- `crates/exoharness/src/basic.rs`: `SandboxBackendChoice::Daytona(DaytonaConfig)` variant + `build_sandbox_backend` returns `Result`.
- `crates/exoharness/src/lib.rs`, `crates/executor/src/lib.rs`
- `crates/cli/src/main.rs`: `--sandbox-backend daytona` + `DaytonaConfig::from_env()` at CLI parse time.

## Daytona-specific knobs
The REST API has a few non-obvious shapes; documented inline. Highlights surfaced during live debugging:
- `labels` is one JSON-encoded query param, not multiple repeated `?labels=k=v` params.
- `{items, nextCursor}`, not a bare array.
- `env` on create is an object, not an array of `{name, value}`.
- `snapshot` on create refers to a pre-registered snapshot name in the user's Daytona dashboard, not an arbitrary Docker image ref. We omit it on fresh creates.
- Per-sandbox operations (exec, fs) live on a different host, `proxy.app.daytona.io/toolbox/<id>/...`, than the control plane (`app.daytona.io/api`). Both are configurable via `DAYTONA_API_URL` / `DAYTONA_TOOLBOX_URL`.
- `X-Daytona-Organization-ID` header (set via the `DAYTONA_ORGANIZATION_ID` env var).

## Out of scope (deliberate)
- Mounts. Host mounts in `SandboxSpec` are rejected with a clear error. The longer-term answer is a workspace provisioner abstraction at the conversation level, which is a separate effort.
- Streaming `start_process`. Daytona's exec endpoint is request/response, not streaming. We buffer stdout/stderr into in-memory `Cursor`s.
- Cross-backend restore. A `DaytonaSnapshot` payload can't be restored under Docker and vice versa; the kind mismatch errors clearly.

## Stack
## Test plan
- `cargo build --workspace` clean.
- `cargo test --workspace`: 51 unit tests pass; existing integration tests still pass.
- `chat send` creates a real sandbox, executes via the toolbox proxy, returns output.
- `chat send` invocations in fresh exo processes hit the same Daytona sandbox UUID; a marker file written in send 1 is readable in send 2.
- `DELETE /api/sandbox/{id}?force=true` works; no orphaned sandboxes after testing.
- `/snapshot` reaches Daytona and returns a clear error (endpoint not currently deployed upstream; see "Snapshot status" above). The restore path is unreachable until snapshot is available upstream.

🤖 Generated with Claude Code