container lifecycle: resume sandboxes across exo processes #21

Open
akrentsel wants to merge 1 commit into main from cross-process-resume

@akrentsel akrentsel commented May 25, 2026

What this changes

Container lifecycle becomes resume-aware. Before this PR, every exo invocation created a fresh sandbox; after it, an agent continuing a conversation picks up the same sandbox it was using.

Lifecycle on every run_in_sandbox

when the conversation has a recorded sandbox_id:

    (a) container is running
            └─► attach and use it

    (b) container exists but is stopped
            └─► `docker start` it, then use it

    (c) container is gone (rm'd, TTL-expired, server-side auto-cleanup)
            │
            ├─ (i)  conversation log has a snapshot for this sandbox
            │       └─► restore the latest snapshot, use it
            │
            └─ (ii) no snapshot ever taken
                    └─► create a fresh container from base image

otherwise (no recorded sandbox_id):
    └─► create a fresh container from base image

Tiers (a) and (b) are unified inside backend.try_resume. Tier (c)(i) goes through the existing start_sandbox / acquire_from_snapshot path that #20 introduced. Tier (c)(ii) is the current create_sandbox path.
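
The decision chain above can be sketched as a pure function. This is a minimal illustration, not the PR's code: `ContainerState`, `Action`, and `decide` are hypothetical names, and the real logic is split between `backend.try_resume` and `ensure_shell_sandbox`.

```rust
/// State of the container recorded for a conversation, as seen on the host.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ContainerState {
    Running,
    Stopped,
    Gone,
}

/// What the lifecycle does on one `run_in_sandbox` call.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    /// Case (a): attach to the running container.
    Attach,
    /// Case (b): start the stopped container, then use it.
    StartThenAttach,
    /// Case (c)(i): restore the latest snapshot.
    RestoreSnapshot,
    /// Case (c)(ii), or no recorded sandbox_id: create from the base image.
    CreateFresh,
}

/// `recorded` is `None` when the conversation has no recorded sandbox_id.
/// `has_snapshot` says whether the conversation log holds a snapshot for it.
pub fn decide(recorded: Option<ContainerState>, has_snapshot: bool) -> Action {
    match recorded {
        Some(ContainerState::Running) => Action::Attach,
        Some(ContainerState::Stopped) => Action::StartThenAttach,
        Some(ContainerState::Gone) if has_snapshot => Action::RestoreSnapshot,
        Some(ContainerState::Gone) | None => Action::CreateFresh,
    }
}
```

Note that a snapshot only matters once the container is gone: a running or stopped container is always preferred over a restore.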

What changed

| Layer | Change |
| --- | --- |
| trait | New ManagedSandboxBackend::try_resume(req) -> Option<Handle>: "find the sandbox for this SandboxKey, start it if stopped, attach if running, return None otherwise." |
| Docker / AppleContainer | Implements try_resume via docker ps -a --filter label=exo.sandbox.key=… (and container list --format json for Apple), matches by spec-hash, reaps drifted entries, and enforces the cross-process idle TTL via docker inspect .State.FinishedAt. |
| LocalProcess | try_resume always returns Ok(None): there is no persistent identity to resume. |
| Drop | Drop for CliContainerSandboxBackend now runs docker stop -t 0 instead of docker rm -f. Containers survive process exit, labelled and ready for the next exo invocation to find. |
| run_in_sandbox | On an in-memory cache miss, calls backend.try_resume and binds the handle. The old error "sandbox is not active in this process" now fires only when the cross-process lookup also misses. |
| ensure_shell_sandbox | Runs the 3-tier chain above: try_resume (cases a and b), then snapshot restore (case c-i), then create fresh (case c-ii). Tier 2 (snapshot restore) walks SandboxSnapshotted events for the recorded sandbox_id. |
| tests | One new integration test: two cross-process chat send invocations against one conversation produce exactly one docker container. |
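
To make the Docker rows concrete, here is a small sketch of how the lookup and the stop-on-drop commands might be assembled. The `exo.sandbox.key` label, `docker ps -a --filter`, `--format '{{json .}}'`, and `docker stop -t 0` all appear in this PR's text; the function names and the choice to build argument vectors this way are assumptions for illustration, not the backend's actual code.

```rust
use std::process::Command;

/// Label key named in the PR description.
const KEY_LABEL: &str = "exo.sandbox.key";

/// Arguments for listing every container (running or stopped) that carries
/// the given sandbox key. Hypothetical helper.
pub fn lookup_args(sandbox_key: &str) -> Vec<String> {
    vec![
        "ps".to_string(),
        "-a".to_string(),
        "--filter".to_string(),
        format!("label={}={}", KEY_LABEL, sandbox_key),
        "--format".to_string(),
        // One JSON object per container, suitable for serde decoding.
        "{{json .}}".to_string(),
    ]
}

/// Arguments for stopping a container immediately without removing it,
/// so a later process can still find it. Hypothetical helper.
pub fn stop_args(container_id: &str) -> Vec<String> {
    vec![
        "stop".to_string(),
        "-t".to_string(),
        "0".to_string(),
        container_id.to_string(),
    ]
}

/// Wraps an argument vector in a `docker` command. The caller decides
/// whether and when to run it.
pub fn docker_command(args: &[String]) -> Command {
    let mut cmd = Command::new("docker");
    cmd.args(args);
    cmd
}
```

Keeping argument construction separate from process spawning makes the command lines checkable in unit tests that have no Docker daemon available.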

User-visible behavior change

  • Before: exit exo → all warm containers docker rm -f'd. docker ps -a was clean after exit.
  • After: exit exo → warm containers stopped (Exited state on host). docker ps -a shows them. A stopped container that has been idle past the TTL is cleaned up the next time exo looks it up (5 min default, the existing idle_seconds=300 from ensure_shell_sandbox).

If a user previously relied on "exit exo, zero residue on the host," this is a behavior change. The trade-off is what enables resume; manual docker rm still works for anyone who wants the old behavior on demand.
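
The idle-TTL rule can be sketched as a comparison between the container's finish time and the current time. This is an illustration under assumptions: it takes the finish time as an already-parsed `SystemTime` (the PR reads it from `docker inspect .State.FinishedAt`, and parsing that timestamp is out of scope here), and `idle_expired` and `DEFAULT_IDLE_TTL` are hypothetical names.

```rust
use std::time::{Duration, SystemTime};

/// Mirrors the idle_seconds=300 default mentioned above.
pub const DEFAULT_IDLE_TTL: Duration = Duration::from_secs(300);

/// Returns true when a stopped container has been idle longer than `ttl`.
/// A finish time in the future (for example from clock skew) is treated
/// as not expired, which is a choice made for this sketch.
pub fn idle_expired(finished_at: SystemTime, now: SystemTime, ttl: Duration) -> bool {
    match now.duration_since(finished_at) {
        Ok(idle) => idle > ttl,
        Err(_) => false,
    }
}
```

Passing `now` explicitly rather than calling `SystemTime::now()` inside the function keeps the check deterministic in tests.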

Stack

main
  └── #20  sandbox-snapshots-filesystem    Docker snapshot + rewind
        └── #21  this PR                   Cross-process resume + 3-tier fallback
              └── (follow-up)              Daytona backend, inherits the resume contract

Test plan

  • cargo test --workspace — 51 unit tests pass
  • EXO_TEST_SANDBOX_BACKEND=docker cargo test -- --ignored — both integration tests pass (existing one with updated container-state assertion, and the new cross-process resume test)
  • Manual end-to-end: chat send 'echo foo > /tmp/x.txt' then in a separate exo invocation chat send 'cat /tmp/x.txt' — the file persists, same container is resumed.

🤖 Generated with Claude Code

@akrentsel akrentsel force-pushed the cross-process-resume branch from a3d2914 to 19ec3ab Compare May 25, 2026 07:56
akrentsel added a commit that referenced this pull request May 25, 2026
Three new integration tests, one per tier of the 3-tier fallback chain
in ensure_shell_sandbox:

  tier_1_stopped_container_is_resumed_same_id
    Drop the harness (PR #21's Drop stops, doesn't rm). Container
    survives on the host in Exited state. Second harness's try_resume
    finds it by label, docker-starts it, attaches. Same container ID,
    same sandbox_id, marker file persists across the stop/start cycle.

  tier_2_gone_container_with_snapshot_restores
    First harness takes a snapshot of the live sandbox (PR #20 API).
    Drop the harness; `docker rm -f` the container (simulates idle-TTL
    expiry / external cleanup). Second harness's try_resume misses,
    falls through to Tier 2, finds the snapshot in the event log, and
    calls start_sandbox -> acquire_from_snapshot. A NEW container id is
    materialised, but the sandbox_id is reused and the marker is
    restored from the snapshot — proving the snapshot path actually
    fires, not just resume.

  tier_3_gone_container_without_snapshot_creates_fresh
    Same setup as tier 2 minus the snapshot. Second harness misses
    Tier 1 (no container) and Tier 2 (no snapshot), so falls through
    to create_sandbox. A new sandbox_id is generated; the conversation
    log now has two SandboxCreated events; the previous marker is gone
    from the fresh container.

Each test simulates the "two exo processes" boundary by dropping the
first BasicExoHarness and constructing a new one from the same root
dir. Library-API driven (no LLM mock, no binary spawn) — the harness's
3-tier behaviour is the only thing under test here.

Wired into integration.yml as a third --test target alongside
integration_chat and snapshot_round_trip. Self-skips on non-docker
matrix cells via preflight().

All three pass locally in ~3s against real Docker; self-skip path
runs in 50ms.
@akrentsel

Added tests, shown passing here: https://github.com/ankrgyl/exo/actions/runs/26390452231/job/77678665189

Comment thread crates/executor/src/harness_tool.rs Outdated
Comment thread crates/exoharness/src/sandbox.rs Outdated
@akrentsel akrentsel force-pushed the sandbox-snapshots-filesystem branch from 483a3bc to 36dfb8a Compare June 1, 2026 02:09
akrentsel added a commit that referenced this pull request Jun 1, 2026
@akrentsel akrentsel force-pushed the cross-process-resume branch from 74a3ca6 to 82f48d7 Compare June 1, 2026 02:18
Base automatically changed from sandbox-snapshots-filesystem to main June 1, 2026 02:18
akrentsel added a commit that referenced this pull request Jun 1, 2026
- docker ps: switch from tab-template to --format '{{json .}}' + serde
  (DockerPsItem + parse_docker_labels). Same swap in the integration
  test helper. Unit test for the label parser.

- introduce exoharness::first_matching_event helper for the recurring
  'find the latest event of kind K, decoded via predicate P' pattern.
  Three callers refactored: latest_snapshot_for_sandbox (the one in
  the comment), latest_shell_sandbox, and tui::latest_sandbox_id.

- fix integration test: cross_process_send_resumes_the_same_sandbox_container
  was still using the pre-rebase 'chat send' subcommand. Main's PR #4
  renamed it to 'conversation send'.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
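
The commit above mentions a label parser. With `--format '{{json .}}'`, docker reports a container's labels as one comma-separated string of `key=value` pairs, so a parser along these lines is plausible. This is only a sketch: the PR's `parse_docker_labels` is not shown here, `parse_label_string` is a hypothetical name, and label values that themselves contain commas cannot be recovered from this flattened form.

```rust
use std::collections::HashMap;

/// Splits a flattened label string such as "a=1,b=2" into a map.
/// A pair without '=' is kept as a key with an empty value.
pub fn parse_label_string(raw: &str) -> HashMap<String, String> {
    raw.split(',')
        .filter(|pair| !pair.is_empty())
        .map(|pair| match pair.split_once('=') {
            Some((key, value)) => (key.to_string(), value.to_string()),
            None => (pair.to_string(), String::new()),
        })
        .collect()
}
```

Only the first `=` in each pair separates key from value, so a value like `a=b` survives intact.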
akrentsel added a commit that referenced this pull request Jun 3, 2026
@akrentsel akrentsel force-pushed the cross-process-resume branch from 89c07cd to 482777a Compare June 3, 2026 02:45
akrentsel added a commit that referenced this pull request Jun 3, 2026
@akrentsel akrentsel force-pushed the cross-process-resume branch from 482777a to 9c9f420 Compare June 3, 2026 02:53
akrentsel added a commit that referenced this pull request Jun 3, 2026
@akrentsel akrentsel force-pushed the cross-process-resume branch from 9c9f420 to db1aa77 Compare June 3, 2026 03:14
akrentsel added a commit that referenced this pull request Jun 3, 2026
@akrentsel akrentsel force-pushed the cross-process-resume branch 2 times, most recently from 0dc5d90 to 0f3977c Compare June 3, 2026 03:23
Adds cross-process sandbox resume to exoharness. When a conversation
acquires a sandbox, the harness first tries to reattach to a labelled
container left running or stopped by a previous exo process; if that
misses, it restores from the latest `SandboxSnapshotted` event for
that sandbox; otherwise it creates fresh.

- exoharness::sandbox: `ManagedSandboxBackend::try_resume` looks up
  containers by the `exo.sandbox.key` label, validates spec-hash,
  enforces a cross-process idle TTL, and reaps stale or expired
  matches. `Drop` stops warm containers (`docker stop -t 0`)
  instead of deleting them so the next process can find them.
- executor::harness_tool: `ensure_shell_sandbox` runs the 3-tier
  fallback (try_resume → snapshot restore → create fresh).
- Adds `exoharness::first_matching_event` helper for the recurring
  "latest event of kind K matching predicate P" pattern; three
  call sites converted.
- Switches docker queries to `--format '{{json .}}'` + serde-decoded
  `DockerPsItem` (Apple still uses its own JSON shape).
- New integration test `cross_process_send_resumes_…` (real exo
  binary + wiremock) and per-tier docker tests in
  `crates/cli/tests/lifecycle_resume.rs`.

Exiting exo no longer `docker rm -f`'s warm containers; they survive
as `Exited` for cross-process resume and get cleaned up via the idle
TTL or the next provisioning attempt that no longer matches them.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
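
The "latest event of kind K matching predicate P" pattern can be sketched as a reverse scan with a decoding closure. The event names `SandboxCreated` and `SandboxSnapshotted` come from this PR; the `Event` fields, `latest_matching`, and `latest_snapshot_for` are hypothetical stand-ins, and the real `exoharness::first_matching_event` signature may differ.

```rust
/// Simplified conversation-log event. Field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    SandboxCreated { sandbox_id: String },
    SandboxSnapshotted { sandbox_id: String, snapshot_id: String },
}

/// Walks the log from newest to oldest and returns the first event that
/// `decode` accepts, already converted to the caller's type.
pub fn latest_matching<T>(
    events: &[Event],
    decode: impl Fn(&Event) -> Option<T>,
) -> Option<T> {
    events.iter().rev().find_map(decode)
}

/// Example caller: the most recent snapshot id taken for one sandbox.
pub fn latest_snapshot_for(events: &[Event], wanted: &str) -> Option<String> {
    latest_matching(events, |event| match event {
        Event::SandboxSnapshotted { sandbox_id, snapshot_id }
            if sandbox_id.as_str() == wanted =>
        {
            Some(snapshot_id.clone())
        }
        _ => None,
    })
}
```

Combining the kind check and the predicate in one closure lets each call site return exactly the field it needs instead of re-matching the event afterwards.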
@akrentsel akrentsel force-pushed the cross-process-resume branch from 0f3977c to 0c44587 Compare June 3, 2026 03:34
@akrentsel akrentsel marked this pull request as ready for review June 3, 2026 03:52
@akrentsel

Ready for review here: @ankrgyl
