
feat(whp): support no-surrogate mode via HYPERLIGHT_MAX_SURROGATES=0 #1578

Merged
danbugs merged 1 commit into hyperlight-dev:main from danbugs:feat/whp-no-surrogate on Jun 26, 2026

Conversation

@danbugs danbugs commented Jun 24, 2026

Contributor
  • When HYPERLIGHT_MAX_SURROGATES=0, skip surrogate process creation entirely and use VirtualAlloc + WHvMapGpaRange instead of CreateFileMappingA + surrogate + WHvMapGpaRange2
  • This is a single-VM-per-process mode (WHvMapGpaRange returns ERROR_VID_PARTITION_ALREADY_EXISTS when called from multiple partitions in the same process)
  • compute_surrogate_counts() now accepts 0 as a valid minimum, and surrogates_disabled() checks the env var at runtime
  • WhpVm::surrogate_process is now Option<SurrogateProcess>, with map_memory/unmap_memory branching at runtime
  • ExclusiveSharedMemory::new() uses VirtualAlloc (via new DirectAllocation RAII type) when surrogates are disabled, CreateFileMappingA otherwise
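
As a rough illustration of the runtime check described in the bullets above, here is a minimal Rust sketch. Only the variable name HYPERLIGHT_MAX_SURROGATES and the function name surrogates_disabled() come from the description; the helper parse_max_surrogates and the fallback value of 512 are assumptions made for this example, not the PR's code.

```rust
use std::env;

/// Environment variable named in the PR description.
const MAX_SURROGATES_VAR: &str = "HYPERLIGHT_MAX_SURROGATES";

/// Hypothetical fallback used when the variable is unset or unparsable.
const DEFAULT_MAX_SURROGATES: usize = 512;

/// Parse a raw value. `0` is accepted as valid and means "no surrogates".
/// (Hypothetical helper, introduced only for this sketch.)
fn parse_max_surrogates(raw: Option<&str>) -> usize {
    raw.and_then(|s| s.trim().parse::<usize>().ok())
        .unwrap_or(DEFAULT_MAX_SURROGATES)
}

/// Runtime check: surrogates are disabled only when the variable parses to 0.
fn surrogates_disabled() -> bool {
    parse_max_surrogates(env::var(MAX_SURROGATES_VAR).ok().as_deref()) == 0
}
```

Separating the parsing from the environment lookup keeps the zero-handling testable without mutating process-wide state.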

Copilot AI review requested due to automatic review settings June 24, 2026 20:13
@danbugs danbugs added the kind/enhancement and ready-for-review labels Jun 24, 2026

Copilot AI left a comment

Contributor

Pull request overview

Adds an optional Windows-only WHP mode (whp-no-surrogate) that bypasses the surrogate process for GPA mapping, intended for single-partition-per-process scenarios, and refactors shared-memory allocation to share validation/guard-page setup across allocation paths.

Changes:

  • Introduces a new whp-no-surrogate feature flag in hyperlight-host.
  • Adds a VirtualAlloc-backed shared memory allocation path (vs CreateFileMappingA) and maps GPAs via WHvMapGpaRange (vs dynamically-loaded WHvMapGpaRange2 through the surrogate).
  • Refactors Windows shared memory creation to reuse validated_total_size() and set_guard_pages() helpers.
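
A minimal sketch of what shared size validation and guard-page placement could look like. The helper names validated_total_size() and set_guard_pages() come from the summary above; the page size, the error type, the layout with one guard page at each end, and the guard_page_offsets function are assumptions for illustration, not the PR's implementation.

```rust
/// Assumed page size for this sketch.
const PAGE_SIZE: usize = 4096;

/// Hypothetical error type for the validation step.
#[derive(Debug, PartialEq, Eq)]
enum SizeError {
    Zero,
    NotPageAligned,
    Overflow,
}

/// Validate the usable size and return the total size including two guard
/// pages (one before and one after the usable region).
fn validated_total_size(usable: usize) -> Result<usize, SizeError> {
    if usable == 0 {
        return Err(SizeError::Zero);
    }
    if usable % PAGE_SIZE != 0 {
        return Err(SizeError::NotPageAligned);
    }
    usable.checked_add(2 * PAGE_SIZE).ok_or(SizeError::Overflow)
}

/// Byte offsets of the two guard pages within an allocation of `total` bytes.
/// Expects a value produced by `validated_total_size`, so `total` is always
/// at least three pages.
fn guard_page_offsets(total: usize) -> (usize, usize) {
    (0, total - PAGE_SIZE)
}
```

Because both allocation paths would go through the same two helpers, the guard-page layout stays identical whether the memory comes from VirtualAlloc or CreateFileMappingA.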

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/hyperlight_host/src/mem/shared_mem.rs | Adds DirectAlloc mapping mode + VirtualAlloc allocation path and refactors guard-page/size validation helpers. |
| src/hyperlight_host/src/hypervisor/virtual_machine/whp.rs | Adds feature-gated mapping path using WHvMapGpaRange and removes surrogate-process usage when enabled. |
| src/hyperlight_host/Cargo.toml | Declares the new whp-no-surrogate feature flag. |

Comment thread src/hyperlight_host/src/mem/shared_mem.rs Outdated
Comment thread src/hyperlight_host/src/mem/shared_mem.rs Outdated
Comment thread src/hyperlight_host/src/mem/shared_mem.rs Outdated
Comment thread src/hyperlight_host/src/hypervisor/virtual_machine/whp.rs Outdated
Comment thread src/hyperlight_host/src/mem/shared_mem.rs Outdated

@ludfjig ludfjig left a comment

Contributor

Have you considered making this a runtime option instead, for example on SandboxConfiguration? If possible I think I would prefer it

@danbugs danbugs force-pushed the feat/whp-no-surrogate branch from 35aaabf to 820c953 Compare June 24, 2026 22:24
@danbugs danbugs changed the title from "feat(whp): add whp-no-surrogate mode" to "feat(whp): support no-surrogate mode via HYPERLIGHT_MAX_SURROGATES=0" Jun 24, 2026

danbugs commented Jun 24, 2026

Contributor Author

> Have you considered making this a runtime option instead, for example on SandboxConfiguration? If possible I think I would prefer it

Had a chat w/ @simongdavies and modified this PR to integrate w/ HYPERLIGHT_MAX_SURROGATES. Now, if you set that to zero, it just doesn't spawn a surrogate process, so it is essentially a runtime config 👍

simongdavies previously approved these changes Jun 25, 2026

@simongdavies simongdavies left a comment

Member

LGTM, prefer this to having a feature.

@ludfjig ludfjig left a comment

Contributor

LGTM, but I think we definitely need some testing that exercises the no-surrogate-process code? Can we rerun existing tests setting it to 0? And should we error gracefully if multiple sandboxes are created in this mode?

Comment thread src/hyperlight_host/src/hypervisor/surrogate_process_manager.rs
Comment thread src/hyperlight_host/src/mem/shared_mem.rs Outdated

danbugs commented Jun 25, 2026

Contributor Author

> Can we rerun existing tests setting it to 0? And should we error gracefully if multiple sandboxes are created in this mode?

Added testing. Opted to not re-run the full CI to save time and also because some tests just don't make sense (e.g., full tests validate surrogate process machinery).

@danbugs danbugs force-pushed the feat/whp-no-surrogate branch 2 times, most recently from 427a949 to a991a2e Compare June 25, 2026 21:00

@syntactically syntactically left a comment

Member

Looks fine to me

Comment thread src/hyperlight_host/src/mem/shared_mem.rs

@ludfjig ludfjig left a comment

Contributor

I think a leak is possible. And I still don't think any tests exercise the new memory APIs. Can we at least run a subset of existing tests to make sure guest calls/host calls and snapshotting/persisting snapshots still work as expected?

We should also add this to the changelog; it's up to you whether to do it in this PR or a future PR, though.

Comment thread src/hyperlight_host/src/hypervisor/virtual_machine/whp.rs Outdated
@danbugs danbugs force-pushed the feat/whp-no-surrogate branch 2 times, most recently from beaf570 to 4559074 Compare June 25, 2026 22:39

danbugs commented Jun 25, 2026

Contributor Author

> And I still don't think any tests exercise the new memory APIs. Can we at least run a subset of existing tests to make sure guest calls/host calls and snapshotting/persisting snapshots still work as expected?

Expanded the CI no-surrogate step to run guest_malloc, guest_panic, and corrupt_output_size_prefix_rejected (integration tests covering guest calls + snapshot/restore), as well as float_roundtrip and callback_test.

> We should also add this to the changelog; it's up to you whether to do it in this PR or a future PR, though.

Will update the changelog in a follow-up before the release.

cc: @ludfjig

ludfjig commented Jun 25, 2026

Contributor

> > And I still don't think any tests exercise the new memory APIs. Can we at least run a subset of existing tests to make sure guest calls/host calls and snapshotting/persisting snapshots still work as expected?
>
> Expanded the CI no-surrogate step to run guest_malloc, guest_panic, and corrupt_output_size_prefix_rejected (integration tests covering guest calls + snapshot/restore), as well as float_roundtrip and callback_test.
>
> > We should also add this to the changelog; it's up to you whether to do it in this PR or a future PR, though.
>
> Will update the changelog in a follow-up before the release.
>
> cc: @ludfjig

Looks good, but I don't think snapshotting or saving/loading a snapshot to/from disk is covered by these added tests. A minor nit: renaming these tests will not fail CI; it will just report running 0 tests. Maybe we can grep for them in the output to make sure they ran and passed. I fear that we might rename the tests later and forget to update the CI.

@danbugs danbugs force-pushed the feat/whp-no-surrogate branch from 4559074 to 4a0c1cd Compare June 25, 2026 23:01
syntactically previously approved these changes Jun 25, 2026
Comment thread src/hyperlight_host/src/hypervisor/virtual_machine/whp.rs Outdated
Comment thread .github/workflows/dep_build_test.yml

danbugs commented Jun 25, 2026

Contributor Author

> Looks good, but I don't think snapshotting or saving/loading a snapshot to/from disk is covered by these added tests.

Current coverage:

  • Guest calls: guest_malloc (calls TestMalloc in guest)
  • Host calls: callback_test (guest calls back into a registered host function)
  • Snapshot/restore: guest_panic triggers a panic, then the sandbox auto-restores from snapshot to handle the next call. corrupt_output_size_prefix_rejected similarly exercises restore after a poisoned sandbox.
  • Snapshot save/load from disk: restore_from_loaded_snapshot (saves snapshot to disk, loads it, restores from it)
  • In-memory snapshot: snapshot_evolve_restore_handles_state_correctly (explicit snapshot/restore cycle)

> A minor nit: renaming these tests will not fail CI; it will just report running 0 tests. Maybe we can grep for them in the output to make sure they ran and passed. I fear that we might rename the tests later and forget to update the CI.

Added a rename guard for this.
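
The guard added in the PR is not shown in this thread. As a hedged sketch of the idea, the following Rust helper scans captured `cargo test` output and reports any expected test that did not print an `ok` line, so a renamed test (which would otherwise yield "running 0 tests") is flagged. The function names passed_test_name and missing_or_failed are hypothetical, and the line format assumed is the default libtest one (`test path::to::name ... ok`).

```rust
/// If `line` is a passing libtest result line, return the bare test name
/// (the last `::` segment of the test path).
fn passed_test_name(line: &str) -> Option<&str> {
    let rest = line.trim().strip_prefix("test ")?;
    let path = rest.strip_suffix("... ok")?.trim();
    path.rsplit("::").next()
}

/// Return every name in `expected` that has no passing result line in
/// `output`. An empty result means all expected tests ran and passed.
fn missing_or_failed<'a>(output: &str, expected: &[&'a str]) -> Vec<&'a str> {
    expected
        .iter()
        .copied()
        .filter(|name| {
            !output
                .lines()
                .any(|line| passed_test_name(line) == Some(*name))
        })
        .collect()
}
```

A CI step built on this idea would fail whenever the returned list is non-empty, which covers both a failing test and a test that silently stopped matching its filter.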

@danbugs danbugs force-pushed the feat/whp-no-surrogate branch from 91fcb14 to 79c8e94 Compare June 25, 2026 23:19

ludfjig commented Jun 25, 2026

Contributor

> > Looks good, but I don't think snapshotting or saving/loading a snapshot to/from disk is covered by these added tests.
>
> Current coverage:
>
>   • Guest calls: guest_malloc (calls TestMalloc in guest)
>   • Host calls: callback_test (guest calls back into a registered host function)
>   • Snapshot/restore: guest_panic triggers a panic, then the sandbox auto-restores from snapshot to handle the next call. corrupt_output_size_prefix_rejected similarly exercises restore after a poisoned sandbox.
>   • Snapshot save/load from disk: restore_from_loaded_snapshot (saves snapshot to disk, loads it, restores from it)
>   • In-memory snapshot: snapshot_evolve_restore_handles_state_correctly (explicit snapshot/restore cycle)
>
> > A minor nit: renaming these tests will not fail CI; it will just report running 0 tests. Maybe we can grep for them in the output to make sure they ran and passed. I fear that we might rename the tests later and forget to update the CI.
>
> Added a rename guard for this.

Tests look perfect thanks!

For the guard, why not have the guard itself set the static boolean when created (or fail if already created), have its drop unset it, and store it on the VM as a field? I think this would be simpler (and there would be no need for std::mem::forget).
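
A minimal sketch of the guard shape suggested here, assuming a process-wide AtomicBool. The names NO_SURROGATE_VM_ACTIVE, NoSurrogateGuard, and Vm are hypothetical stand-ins for illustration and are not the PR's code.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Process-wide flag: true while a no-surrogate VM exists (hypothetical name).
static NO_SURROGATE_VM_ACTIVE: AtomicBool = AtomicBool::new(false);

/// RAII guard: creating it claims the flag, dropping it releases the flag.
#[derive(Debug)]
struct NoSurrogateGuard(());

impl NoSurrogateGuard {
    /// Claim the single slot, or fail if another guard is still alive.
    fn acquire() -> Result<Self, &'static str> {
        NO_SURROGATE_VM_ACTIVE
            .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
            .map(|_| NoSurrogateGuard(()))
            .map_err(|_| "a no-surrogate VM already exists in this process")
    }
}

impl Drop for NoSurrogateGuard {
    fn drop(&mut self) {
        // Release the slot so a later VM can be created.
        NO_SURROGATE_VM_ACTIVE.store(false, Ordering::Release);
    }
}

/// Stand-in for the VM type: holding the guard as a field ties the flag's
/// lifetime to the VM's, so no std::mem::forget is needed.
struct Vm {
    _guard: NoSurrogateGuard,
}

impl Vm {
    fn new() -> Result<Self, &'static str> {
        Ok(Vm { _guard: NoSurrogateGuard::acquire()? })
    }
}
```

With this shape, a second `Vm::new()` fails while the first VM is alive and succeeds again once it has been dropped.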

When HYPERLIGHT_MAX_SURROGATES=0, bypass surrogate processes entirely:
- Use WHvMapGpaRange (host VA) instead of WHvMapGpaRange2 (surrogate)
- Reuse existing WindowsMapping::Anonymous allocation path
- RAII guard (stored as WhpVm field) enforces single-VM-per-process
- Add dedicated unit test and run existing integration/snapshot tests
  in CI with per-test verification against silent renames
- Add just test-no-surrogate recipe mirroring the CI step

Signed-off-by: danbugs <danilochiarlone@gmail.com>
@danbugs danbugs force-pushed the feat/whp-no-surrogate branch from 79c8e94 to 212f20a Compare June 25, 2026 23:48

syntactically commented Jun 26, 2026

Member

> For the guard, why not have the guard itself set the static boolean when created (or fail if already created), have its drop unset it, and store it on the VM as a field? I think this would be simpler (and there would be no need for std::mem::forget).

Much better idea, thanks.

Actually, with this change, the surrogate guard and the surrogate process have the exact same lifetime and exactly one of them is used. So maybe you could even make the SurrogateProcess an enum or a dyn Trait and push the difference in behaviour all the way down to that level, so whp.rs does not need to know or care whether there is a surrogate process.

To unify the WHvMapGpaRange vs WHvMapGpaRange2 distinction, you could either always use WHvMapGpaRange2 and just use the current process handle when you are not using a surrogate, or you could move the mapping call into the surrogate_process_manager.rs or similar.
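
One way to picture the enum variant of this suggestion is the sketch below. It is an illustration under stated assumptions, not code from the PR: MemoryBackend, MapCall, ProcessHandle, and Vm are hypothetical names, and the mapping method only records which WHP call would be issued instead of calling the Windows API, so the sketch stays portable.

```rust
/// Stand-in for a Windows process HANDLE (hypothetical, for portability).
type ProcessHandle = u64;

/// Records which WHP mapping call a backend would issue.
#[derive(Debug, PartialEq, Eq)]
enum MapCall {
    MapGpaRange2 { process: ProcessHandle, gpa: u64, len: u64 },
    MapGpaRange { gpa: u64, len: u64 },
}

/// Exactly one variant is in use for the lifetime of a VM.
enum MemoryBackend {
    /// Map through a surrogate process.
    Surrogate(ProcessHandle),
    /// Map host memory directly (single VM per process).
    Direct,
}

impl MemoryBackend {
    /// Choose the backend from the configured surrogate limit; the surrogate
    /// is only spawned when the limit is non-zero.
    fn new(max_surrogates: usize, spawn_surrogate: impl FnOnce() -> ProcessHandle) -> Self {
        if max_surrogates == 0 {
            MemoryBackend::Direct
        } else {
            MemoryBackend::Surrogate(spawn_surrogate())
        }
    }

    /// The only place that knows which mapping call applies.
    fn map_memory(&self, gpa: u64, len: u64) -> MapCall {
        match *self {
            MemoryBackend::Surrogate(process) => MapCall::MapGpaRange2 { process, gpa, len },
            MemoryBackend::Direct => MapCall::MapGpaRange { gpa, len },
        }
    }
}

/// The VM delegates without branching on the mode itself.
struct Vm {
    backend: MemoryBackend,
}

impl Vm {
    fn map(&self, gpa: u64, len: u64) -> MapCall {
        self.backend.map_memory(gpa, len)
    }
}
```

In this shape the VM code contains no mode check at all; the Direct variant could also own the single-VM guard, since the two have the same lifetime.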

@ludfjig ludfjig left a comment

Contributor

LGTM (nit: the last bullet in the PR description is stale)

@danbugs danbugs merged commit ed20727 into hyperlight-dev:main Jun 26, 2026
40 checks passed

Labels

  • kind/enhancement: For PRs adding features, improving functionality, docs, tests, etc.
  • ready-for-review: PR is ready for (re-)review

5 participants