container lifecycle: resume sandboxes across exo processes #21
Open
akrentsel wants to merge 1 commit into
a3d2914 to
19ec3ab
Compare
akrentsel
added a commit
that referenced
this pull request
May 25, 2026
Three new integration tests, one per tier of the 3-tier fallback chain
in ensure_shell_sandbox:
tier_1_stopped_container_is_resumed_same_id
Drop the harness (PR #21's Drop stops, doesn't rm). Container
survives on the host in Exited state. Second harness's try_resume
finds it by label, docker-starts it, attaches. Same container ID,
same sandbox_id, marker file persists across the stop/start cycle.
tier_2_gone_container_with_snapshot_restores
First harness takes a snapshot of the live sandbox (PR #20 API).
Drop the harness; `docker rm -f` the container (simulates idle-TTL
expiry / external cleanup). Second harness's try_resume misses,
falls through to Tier 2, finds the snapshot in the event log, and
calls start_sandbox -> acquire_from_snapshot. A NEW container id is
materialised, but the sandbox_id is reused and the marker is
restored from the snapshot — proving the snapshot path actually
fires, not just resume.
tier_3_gone_container_without_snapshot_creates_fresh
Same setup as tier 2 minus the snapshot. Second harness misses
Tier 1 (no container) and Tier 2 (no snapshot), so falls through
to create_sandbox. A new sandbox_id is generated; the conversation
log now has two SandboxCreated events; the previous marker is gone
from the fresh container.
Each test simulates the "two exo processes" boundary by dropping the
first BasicExoHarness and constructing a new one from the same root
dir. Library-API driven (no LLM mock, no binary spawn) — the harness's
3-tier behaviour is the only thing under test here.
Wired into integration.yml as a third --test target alongside
integration_chat and snapshot_round_trip. Self-skips on non-docker
matrix cells via preflight().
All three pass locally in ~3s against real Docker; self-skip path
runs in 50ms.
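The "two exo processes" boundary described above can be illustrated without Docker. The following is only a sketch under stated assumptions: `FakeHost`, `FakeHarness`, and `State` are hypothetical stand-ins invented for illustration (they are not `BasicExoHarness` or any exoharness type), and an in-memory map plays the role of the host's container list. It shows the one property the tier 1 test relies on: dropping the first harness stops the container instead of removing it, so a second harness opened against the same key gets the same container id.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Container state as the host would report it (stand-in for `docker ps -a`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Running,
    Exited,
}

/// Hypothetical in-memory "host": sandbox key label -> (container id, state).
pub type FakeHost = Rc<RefCell<HashMap<String, (u32, State)>>>;

/// Hypothetical stand-in for a harness that owns one warm container.
pub struct FakeHarness {
    host: FakeHost,
    key: String,
    pub container_id: u32,
}

impl FakeHarness {
    /// Reattach to the container labelled `key` if the host still has it;
    /// otherwise register a new one under the next free id.
    pub fn open(host: &FakeHost, key: &str) -> FakeHarness {
        let mut map = host.borrow_mut();
        let next_id = map.values().map(|(id, _)| *id).max().unwrap_or(0) + 1;
        let entry = map
            .entry(key.to_string())
            .or_insert((next_id, State::Running));
        // Resume path: the equivalent of starting a stopped container.
        entry.1 = State::Running;
        FakeHarness {
            host: Rc::clone(host),
            key: key.to_string(),
            container_id: entry.0,
        }
    }
}

impl Drop for FakeHarness {
    /// Stop, do not remove: the entry stays on the host as `Exited`.
    fn drop(&mut self) {
        let mut map = self.host.borrow_mut();
        if let Some(entry) = map.get_mut(&self.key) {
            entry.1 = State::Exited;
        }
    }
}
```

Opening a harness for a key, dropping it, and opening a second one for the same key leaves the container id unchanged while the state goes `Running`, `Exited`, `Running`. Removing the map entry between the two opens would model the tier 2 and tier 3 setups, where the container is gone.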
Added tests, shown passing here: https://github.com/ankrgyl/exo/actions/runs/26390452231/job/77678665189
ankrgyl
reviewed
May 25, 2026
483a3bc to
36dfb8a
Compare
74a3ca6 to
82f48d7
Compare
akrentsel
added a commit
that referenced
this pull request
Jun 1, 2026
- docker ps: switch from tab-template to --format '{{json .}}' + serde
(DockerPsItem + parse_docker_labels). Same swap in the integration
test helper. Unit test for the label parser.
- introduce exoharness::first_matching_event helper for the recurring
'find the latest event of kind K, decoded via predicate P' pattern.
Three callers refactored: latest_snapshot_for_sandbox (the one in
the comment), latest_shell_sandbox, and tui::latest_sandbox_id.
- fix integration test: cross_process_send_resumes_the_same_sandbox_container
was still using the pre-rebase 'chat send' subcommand. Main's PR #4
renamed it to 'conversation send'.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
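With `docker ps --format '{{json .}}'`, each container is printed as one JSON object per line, and the `Labels` field arrives as a single comma-separated `key=value` string rather than a nested object. A minimal sketch of the label-splitting step follows. The function name `parse_labels_sketch` and the label `example.other` are hypothetical, and this is not the PR's `parse_docker_labels`; JSON decoding of the surrounding object is left out so the block needs only the standard library.

```rust
use std::collections::HashMap;

/// Split a `Labels` string such as "a=1,b=2" into a key/value map.
///
/// Assumption: label values contain no commas. A value with an embedded
/// comma cannot be told apart from a label boundary in this flat format.
pub fn parse_labels_sketch(raw: &str) -> HashMap<String, String> {
    raw.split(',')
        .filter(|pair| !pair.is_empty())
        .map(|pair| match pair.split_once('=') {
            // Split on the first '=' only, so values may contain '='.
            Some((key, value)) => (key.to_string(), value.to_string()),
            // A bare key with no '=' maps to an empty value.
            None => (pair.to_string(), String::new()),
        })
        .collect()
}
```

Splitting on the first `=` only is a deliberate choice here: keys such as `exo.sandbox.key` never contain `=`, while values (hashes, encoded data) might.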
89c07cd to
482777a
Compare
482777a to
9c9f420
Compare
9c9f420 to
db1aa77
Compare
0dc5d90 to
0f3977c
Compare
Adds cross-process sandbox resume to exoharness. When a conversation
acquires a sandbox, the harness first tries to reattach to a labelled
container left running or stopped by a previous exo process; if that
misses, it restores from the latest `SandboxSnapshotted` event for
that sandbox; otherwise it creates fresh.
- exoharness::sandbox: `ManagedSandboxBackend::try_resume` looks up
containers by the `exo.sandbox.key` label, validates spec-hash,
enforces a cross-process idle TTL, and reaps stale or expired
matches. `Drop` stops warm containers (`docker stop -t 0`)
instead of deleting them so the next process can find them.
- executor::harness_tool: `ensure_shell_sandbox` runs the 3-tier
fallback (try_resume → snapshot restore → create fresh).
- Adds `exoharness::first_matching_event` helper for the recurring
"latest event of kind K matching predicate P" pattern; three
call sites converted.
- Switches docker queries to `--format '{{json .}}'` + serde-decoded
`DockerPsItem` (Apple still uses its own JSON shape).
- New integration test `cross_process_send_resumes_…` (real exo
binary + wiremock) and per-tier docker tests in
`crates/cli/tests/lifecycle_resume.rs`.
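The fallback order in the list above (try_resume, then snapshot restore, then create fresh) can be sketched as a plain function. This is an illustration under assumptions, not the code in `executor::harness_tool`: the `SandboxSource` trait, the `Acquired` enum, `FixedSource`, and `ensure_sandbox_sketch` are hypothetical names, and the real lookups talk to a container backend and the event log rather than returning canned values.

```rust
/// How the sandbox for a conversation was obtained.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Acquired {
    Resumed { container_id: String },
    RestoredFromSnapshot { snapshot_id: String },
    CreatedFresh,
}

/// Hypothetical trait standing in for the backend and event-log lookups.
pub trait SandboxSource {
    /// Tier 1: id of a labelled container that is still on the host.
    fn try_resume(&self, sandbox_key: &str) -> Option<String>;
    /// Tier 2: id of the latest snapshot recorded for this sandbox.
    fn latest_snapshot(&self, sandbox_key: &str) -> Option<String>;
}

/// Walk the tiers in order and report which one produced the sandbox.
pub fn ensure_sandbox_sketch<S: SandboxSource>(source: &S, key: &str) -> Acquired {
    if let Some(container_id) = source.try_resume(key) {
        return Acquired::Resumed { container_id };
    }
    if let Some(snapshot_id) = source.latest_snapshot(key) {
        return Acquired::RestoredFromSnapshot { snapshot_id };
    }
    // Tier 3: nothing to resume and nothing to restore.
    Acquired::CreatedFresh
}

/// Canned source for exercising each tier.
pub struct FixedSource {
    pub container: Option<String>,
    pub snapshot: Option<String>,
}

impl SandboxSource for FixedSource {
    fn try_resume(&self, _sandbox_key: &str) -> Option<String> {
        self.container.clone()
    }
    fn latest_snapshot(&self, _sandbox_key: &str) -> Option<String> {
        self.snapshot.clone()
    }
}
```

Returning an enum that names the tier keeps the three outcomes distinguishable to callers and tests, which mirrors what the per-tier docker tests assert (same container id, reused sandbox_id with a new container, or a new sandbox_id).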
Exiting exo no longer `docker rm -f`'s warm containers; they survive
as `Exited` for cross-process resume and get cleaned up via the idle
TTL or the next provisioning attempt that no longer matches them.
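The idle-TTL cleanup mentioned above comes down to comparing the time a container stopped against the current time. A minimal sketch, assuming the stop time has already been parsed into a `SystemTime`: the names `is_idle_expired` and `DEFAULT_IDLE_TTL` are hypothetical, the 300-second value follows the `idle_seconds=300` default quoted in the PR description, and whether the real check treats the exact boundary as expired is an assumption made here.

```rust
use std::time::{Duration, SystemTime};

/// Assumed default, matching the 5-minute idle TTL quoted in the PR description.
pub const DEFAULT_IDLE_TTL: Duration = Duration::from_secs(300);

/// Return true when a container that stopped at `finished_at` has been idle
/// for strictly longer than `ttl` as of `now`.
pub fn is_idle_expired(finished_at: SystemTime, now: SystemTime, ttl: Duration) -> bool {
    match now.duration_since(finished_at) {
        Ok(idle) => idle > ttl,
        // `finished_at` is later than `now` (clock skew): treat as not expired.
        Err(_) => false,
    }
}
```

Passing `now` in as a parameter, rather than calling `SystemTime::now()` inside the function, keeps the check deterministic under test.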
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
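The "latest event of kind K matching predicate P" pattern named in the commit message can be sketched as a reverse scan with a decoding closure. This is an assumption-laden illustration, not the signature of `exoharness::first_matching_event`: the `Event` enum and its fields, `latest_matching`, and `latest_snapshot_for` are hypothetical, and the log is assumed to be ordered oldest to newest.

```rust
/// Hypothetical event shapes; only the variant names come from the PR text.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Event {
    SandboxCreated { sandbox_id: String },
    SandboxSnapshotted { sandbox_id: String, snapshot_id: String },
    Other,
}

/// Scan from newest to oldest and return the first value `decode` accepts.
pub fn latest_matching<'a, T, F>(events: &'a [Event], decode: F) -> Option<T>
where
    F: FnMut(&'a Event) -> Option<T>,
{
    events.iter().rev().find_map(decode)
}

/// One caller of the helper: the newest snapshot id for a given sandbox.
pub fn latest_snapshot_for<'a>(events: &'a [Event], sandbox: &str) -> Option<&'a str> {
    latest_matching(events, |event| match event {
        Event::SandboxSnapshotted { sandbox_id, snapshot_id }
            if sandbox_id.as_str() == sandbox =>
        {
            Some(snapshot_id.as_str())
        }
        _ => None,
    })
}
```

Folding the kind check and the predicate into one closure that returns `Option<T>` lets each call site decode exactly the fields it needs, which is one plausible reason three callers could share a single helper.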
0f3977c to
0c44587
Compare
Ready for review here: @ankrgyl
What this changes

Container lifecycle becomes resume-aware. Before this PR, every `exo` invocation created a fresh sandbox; after it, an agent continuing a conversation picks up the same sandbox it was using.

Lifecycle on every `run_in_sandbox`

Tiers (a) and (b) are unified inside `backend.try_resume`. Tier (c)(i) goes through the existing `start_sandbox` / `acquire_from_snapshot` path that #20 introduced. Tier (c)(ii) is the current `create_sandbox` path.

What changed

- `ManagedSandboxBackend::try_resume(req) -> Option<Handle>`: "find the sandbox for this `SandboxKey`, start it if stopped, attach if running, return `None` otherwise."
- CLI container backends: `try_resume` looks up containers via `docker ps -a --filter label=exo.sandbox.key=…` (and `container list --format json` for Apple), matches by spec-hash, reaps drifted entries, and enforces the cross-process idle TTL via `docker inspect .State.FinishedAt`.
- Backends with no persistent identity to resume: `try_resume` always returns `Ok(None)`.
- `Drop`: `Drop` for `CliContainerSandboxBackend` now does `docker stop -t 0` instead of `docker rm -f`. Containers survive process exit, labelled and ready for the next exo invocation to find.
- `run_in_sandbox`: when the sandbox is not active in this process, calls `backend.try_resume` and binds the handle. The old error `"sandbox is not active in this process"` now fires only when the cross-process lookup also misses.
- `ensure_shell_sandbox`: runs the 3-tier fallback; the snapshot tier looks up `SandboxSnapshotted` events for the recorded sandbox_id.
- Integration test: `chat send` invocations against one conversation produce exactly one docker container.

User-visible behavior change

- Before: on exit, warm containers were `docker rm -f`'d. `docker ps -a` was clean after exit.
- After: warm containers survive exit, and `docker ps -a` shows them. They get cleaned up the next time exo runs a sandbox that's been idle past the TTL (5 min default, the existing `idle_seconds=300` from `ensure_shell_sandbox`).

If a user previously relied on "exit exo, zero residue on the host," this is a behavior change. The trade-off is what enables resume; manual `docker rm` still works for anyone who wants the old behavior on demand.

Test plan

- `cargo test --workspace`: 51 unit tests pass.
- `EXO_TEST_SANDBOX_BACKEND=docker cargo test -- --ignored`: both integration tests pass (the existing one with an updated container-state assertion, and the new cross-process resume test).
- `chat send 'echo foo > /tmp/x.txt'`, then in a separate exo invocation `chat send 'cat /tmp/x.txt'`: the file persists and the same container is resumed.

🤖 Generated with Claude Code