Sandbox backend registry with lazy-secret Daytona provider#45

Open
akrentsel wants to merge 1 commit into main from sandbox-backend-registry
@akrentsel
Collaborator

Summary

Replaces the single-slot sandbox-backend model with a provider registry:

  • The harness starts with a set of supported providers plus one default. The CLI default registers the OS-local container backend (Apple on macOS, Docker on Linux), LocalProcess, and Daytona.
  • Per-agent/per-conversation selection picks a provider; unset → default; specified-but-unsupported → error at use time.
  • Two-tier secret model — prespecify the reference, deref lazily. Remote providers record only a secret-name reference at startup (touching zero keys). The credential is read/decrypted from the secret store on first use, then the backend is cached. Missing secret → clear error at use time. The harness reads zero environment variables.
  • Daytona is fully configurable via exo secret set (DAYTONA_API_KEY, DAYTONA_ORGANIZATION_ID, DAYTONA_TARGET) with no code/flag changes.
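A minimal sketch of the registry and two-tier secret model described in the bullets above, not the code in this PR. `SecretStore`, `ProviderSpec`, `Backend`, and `Registry` are hypothetical stand-ins (the real store is encrypted and the real backends are async); only the secret name `DAYTONA_API_KEY` comes from the PR.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical in-memory stand-in for the secret store.
#[derive(Default)]
pub struct SecretStore {
    values: Mutex<HashMap<String, String>>,
}

impl SecretStore {
    pub fn set(&self, name: &str, value: &str) {
        self.values.lock().unwrap().insert(name.to_string(), value.to_string());
    }
    pub fn get(&self, name: &str) -> Option<String> {
        self.values.lock().unwrap().get(name).cloned()
    }
}

/// What a provider records at startup: a secret *name*, never the value.
#[derive(Clone)]
pub enum ProviderSpec {
    Local,
    Remote { api_key_secret: String },
}

/// Hypothetical built backend; stands in for a real client.
#[derive(Debug, PartialEq)]
pub struct Backend {
    pub provider: String,
    pub api_key: Option<String>,
}

pub struct Registry {
    specs: HashMap<String, ProviderSpec>,
    default: String,
    secrets: Arc<SecretStore>,
    cache: Mutex<HashMap<String, Arc<Backend>>>,
}

impl Registry {
    /// Startup validation: rejects duplicates and an unsupported default.
    /// No secret is read here.
    pub fn new(
        providers: Vec<(String, ProviderSpec)>,
        default: &str,
        secrets: Arc<SecretStore>,
    ) -> Result<Self, String> {
        let mut specs = HashMap::new();
        for (name, spec) in providers {
            if specs.insert(name.clone(), spec).is_some() {
                return Err(format!("duplicate sandbox provider: {name}"));
            }
        }
        if !specs.contains_key(default) {
            return Err(format!("default provider {default} is not supported"));
        }
        Ok(Self { specs, default: default.to_string(), secrets, cache: Mutex::new(HashMap::new()) })
    }

    /// Use-time resolution: unset means default, unsupported is an error,
    /// and the secret is dereferenced only on first use.
    pub fn resolve(&self, requested: Option<&str>) -> Result<Arc<Backend>, String> {
        let name = requested.unwrap_or(self.default.as_str());
        let spec = self
            .specs
            .get(name)
            .ok_or_else(|| format!("unsupported sandbox provider: {name}"))?;
        if let Some(hit) = self.cache.lock().unwrap().get(name) {
            return Ok(Arc::clone(hit));
        }
        // Build without holding the cache lock; a failed build caches nothing.
        let api_key = match spec {
            ProviderSpec::Local => None,
            ProviderSpec::Remote { api_key_secret } => Some(
                self.secrets
                    .get(api_key_secret)
                    .ok_or_else(|| format!("missing secret {api_key_secret} for provider {name}"))?,
            ),
        };
        let built = Arc::new(Backend { provider: name.to_string(), api_key });
        // If another caller built the same provider concurrently, keep theirs.
        let mut cache = self.cache.lock().unwrap();
        Ok(Arc::clone(cache.entry(name.to_string()).or_insert(built)))
    }
}
```

In this sketch a missing secret leaves the cache untouched, so the same request succeeds once the secret is set, which mirrors the live end-to-end behavior listed in the test plan.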

Adds the Daytona REST backend under crates/exoharness/src/sandbox_provider/, with a typed sandbox-state enum (reuse a sandbox only when runnable; start only durably-stopped states; replace terminal/error ones). Backends build without holding the cache lock, and duplicate providers are rejected at startup.
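The reuse/start/replace rule for the typed state enum can be sketched roughly as follows. The variant names, the `Plan` type, and the `RetryLater` handling of mid-transition states are assumptions for illustration; they are neither Daytona's actual state list nor necessarily what this PR does. Only `is_reusable` and `needs_start` appear in the diff.

```rust
/// Illustrative states; variant names are assumptions, not Daytona's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SandboxState {
    Started,
    Stopped,
    Starting,
    Stopping,
    Error,
    Destroyed,
}

/// What find-or-create should do with an existing sandbox (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Plan {
    Reuse,
    StartThenReuse,
    Create,
    RetryLater,
}

impl SandboxState {
    /// Already able to run commands.
    pub fn is_runnable(self) -> bool {
        matches!(self, Self::Started)
    }
    /// Durably stopped, so issuing a start cannot race a transition.
    pub fn needs_start(self) -> bool {
        matches!(self, Self::Stopped)
    }
    /// Worth keeping: either runnable now or startable.
    pub fn is_reusable(self) -> bool {
        self.is_runnable() || self.needs_start()
    }
    /// Dead for good; must be replaced.
    pub fn is_terminal(self) -> bool {
        matches!(self, Self::Error | Self::Destroyed)
    }
}

pub fn plan(existing: Option<SandboxState>) -> Plan {
    match existing {
        Some(s) if s.is_runnable() => Plan::Reuse,
        Some(s) if s.needs_start() => Plan::StartThenReuse,
        Some(s) if s.is_terminal() => Plan::Create,
        // Mid-transition: starting it now would race (assumed handling).
        Some(_) => Plan::RetryLater,
        None => Plan::Create,
    }
}
```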

This supersedes #22, reworked to fit the registry paradigm and addressing its review feedback (secrets-not-env, move out of the crate root, typed states, doc links).

Test plan

  • cargo build --workspace --all-features clean; clippy clean.
  • cargo test --workspace — all unit/contract tests pass, including the lazy-secret resolution tests.
  • New hermetic cargo test -p exo --test daytona_backend (wiremock) — 12 tests covering find-or-create routing, label query shape, stop-not-delete, snapshot manifest, toolbox-vs-control-plane host routing, and transient/dead-state handling.
  • Live end-to-end against real Daytona: local sandbox runs a command; Daytona request with no secret errors cleanly; after exo secret set, the same conversation lazily resolves the secret and provisions a real remote sandbox.

🤖 Generated with Claude Code

Replace the single-slot sandbox backend with a provider registry: the
harness validates a supported set + a default at startup, and per-agent
selection errors if a requested provider isn't supported. Remote
providers record only a secret-name reference at startup and resolve the
credential from the secret store lazily on first use (Daytona reads
DAYTONA_API_KEY / DAYTONA_ORGANIZATION_ID / DAYTONA_TARGET), so the
harness reads zero environment variables. Backends build without holding
the cache lock, and duplicate providers are rejected at startup.

Adds the Daytona REST backend under sandbox_provider/ with a typed
sandbox-state enum (reuse a sandbox only when runnable; start only
durably-stopped states), plus hermetic wiremock tests for it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@akrentsel akrentsel requested a review from ankrgyl June 4, 2026 02:27
@@ -0,0 +1,7 @@
//! Per-provider [`crate::sandbox::ManagedSandboxBackend`] implementations,
Collaborator Author

I'm going to try adding support for my exe.dev VMs which I love so dearly, just to see how well the pattern fits. they're much more durable than all of the ephemeral sandbox providers, but it'd be great to be able to support that too.

Owner

this is the way

toolbox_url: crate::DEFAULT_DAYTONA_TOOLBOX_URL.to_string(),
api_key_secret: "DAYTONA_API_KEY".to_string(),
organization_id_secret: Some("DAYTONA_ORGANIZATION_ID".to_string()),
target_secret: Some("DAYTONA_TARGET".to_string()),
Collaborator Author

note: I realize these technically aren't "secrets", but it felt clean to have these come from the store so they are configured the same way. we can alternatively take them in some other form of configuration.

btw, we may want to have a config file eventually for exo, rather than command line flags for everything.

Owner

that's what bindings are for. we should add sandbox provider bindings if we need to

@akrentsel
Collaborator Author

btw – once PR #21 is reviewed (and approved), I'll separately work on reconciling the container lifecycles to be the same across the different backends.

there's a few common states to consider:
(1) running
(2) paused – costs some small amount of storage $ on Daytona, consumes RAM in docker
(3) deleted – no cost on either, but loss of data (unless snapshotted)

right now, the pattern is a little different for auto-stop interval. but I want to add snapshotting to daytona in a followup PR, and will also reconcile the behavior to match there.

once that is done, I'll add the adapter to make teleportation work!! (requires a small adapter b/c daytona doesn't natively produce docker tarballs from its snapshots)

Owner

@ankrgyl ankrgyl left a comment

did a first pass

// `snapshot` is a pre-registered snapshot name, not an image ref; only
// the restore path sets it, so fresh creates use Daytona's default image.
let body = DaytonaCreateRequest {
snapshot: snapshot_name.map(str::to_string),
Owner

i think if the snapshot is empty, we need to fall back to the request.spec.image right? Otherwise when is the image ever used?
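One possible shape for that fallback, as a sketch only: prefer a non-empty snapshot name, otherwise pass the spec's image, otherwise send neither so the provider default applies. `CreateBody`, its `image` field, and `create_body` are hypothetical; whether the create endpoint accepts an image reference in this form is an assumption that would need checking against Daytona's API.

```rust
/// Hypothetical request body; field names are illustrative only.
#[derive(Debug, PartialEq, Eq)]
pub struct CreateBody {
    pub snapshot: Option<String>,
    pub image: Option<String>,
}

/// Treat `None`, "", and whitespace-only values as "not set".
fn non_empty(value: Option<&str>) -> Option<String> {
    value.map(str::trim).filter(|v| !v.is_empty()).map(str::to_string)
}

/// Snapshot wins when present (restore path); otherwise fall back to the
/// image from the request spec; otherwise leave both unset.
pub fn create_body(snapshot_name: Option<&str>, spec_image: Option<&str>) -> CreateBody {
    let snapshot = non_empty(snapshot_name);
    let image = if snapshot.is_some() { None } else { non_empty(spec_image) };
    CreateBody { snapshot, image }
}
```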

// (starting a running/mid-transition sandbox would race).
Some(existing) if existing.is_reusable() => {
if existing.needs_start() {
self.start_sandbox(&existing.id).await?;
Owner

do we need to wait until it finishes starting?
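If waiting turns out to be needed, a bounded poll after the start call is one option. The sketch below is synchronous and takes the state fetch and the sleep as closures so it stays self-contained; the real backend is async and would poll the sandbox endpoint instead. `StartState` and `wait_until_started` are hypothetical names.

```rust
use std::time::Duration;

/// Hypothetical reduced view of the sandbox state while it starts.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StartState {
    Starting,
    Started,
    Failed,
}

/// Polls until the sandbox reports `Started`, giving up after `max_polls`.
/// Returns the number of polls it took on success.
pub fn wait_until_started<P, S>(
    mut poll: P,
    mut sleep: S,
    max_polls: u32,
    interval: Duration,
) -> Result<u32, String>
where
    P: FnMut() -> StartState,
    S: FnMut(Duration),
{
    for attempt in 1..=max_polls {
        match poll() {
            StartState::Started => return Ok(attempt),
            StartState::Failed => {
                return Err("sandbox entered a failed state while starting".to_string())
            }
            // Still in transition: back off, unless this was the last poll.
            StartState::Starting => {
                if attempt < max_polls {
                    sleep(interval);
                }
            }
        }
    }
    Err(format!("sandbox not started after {max_polls} polls"))
}
```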

Comment on lines +294 to +297
// Daytona's exec endpoint is request/response, not streaming: run the
// command, then hand back already-populated stdout/stderr/wait. stdin
// goes to a sink — this endpoint doesn't accept piped input.
let output =
Owner

maybe we throw a runtime error here instead? i'm neutral

}

async fn create_sandbox(&self, request: CreateSandboxRequest) -> Result<SandboxId> {
let _guard = self.harness.inner.write_lock.lock().await;
Owner

not new code but maybe we shouldn't lock the whole exoharness on this?