Skip to content

Added capability to load extra blob data in sandbox #605

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 14, 2025

Conversation

danbugs
Copy link
Contributor

@danbugs danbugs commented Jun 12, 2025

This PR contains the following changes:
(1) Made a non-breaking change to the signature of the UninitializedSandbox::new(...) function. Now, instead of a GuestBinary, it takes a GuestEnvironment, which contains the usual binary + optional blob data.
(2) Created new user memory region to store the extra blob data and added some extra config options for it.
(3) Added info of the new memory region to the PEB.
(4) Added tests verifying added blob data.

Reviewing commit-by-commit might be best 👍

@danbugs danbugs added the kind/enhancement For PRs adding features, improving functionality, docs, tests, etc. label Jun 12, 2025
@danbugs danbugs force-pushed the load-extra-blob branch 2 times, most recently from f46ffef to e46c2a8 Compare June 12, 2025 17:26
Copy link
Contributor

@ludfjig ludfjig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR comment states that the this is a non-breaking change, but the rename GuestBinary -> GuestBlob is breaking, right? Also I do think the renaming is not intuitive for users who want to load a guest binary, when it's now called GuestBlob

Not strictly related to your PR, but I think we should consider not allowing passing blobs/binaries by "path", which would force the consumers to do their own reading of files, etc, and instead only take slices of u8.

I haven't fully thought about it, but maybe Uninitialized::new should take a trait instead?

@jprendes
Copy link
Contributor

IIRC, @simongdavies had an idea where the host could share host memory with the VM by adding pages to the VM's page table. Would something like that help in this case?

@danbugs
Copy link
Contributor Author

danbugs commented Jun 12, 2025

The PR comment states that the this is a non-breaking change, but the rename GuestBinary -> GuestBlob is breaking, right? Also I do think the renaming is not intuitive for users who want to load a guest binary, when it's now called GuestBlob

On the breaking-change point, you're right. Guess what I meant to say is that you don't necessarily have to pass a GuestEnvironment to create a new uninit sandbox because you can still create it off a binary alone.

Not strictly related to your PR, but I think we should consider not allowing passing blobs/binaries by "path", which would force the consumers to do their own reading of files, etc, and instead only take slices of u8.

If so, I can change GuestBlob to be just bytes and revert the re-name of the GuestBinary. I opted to re-name because they were the same and I felt like GuestBlob also encompassed a binary.

I haven't fully thought about it, but maybe Uninitialized::new should take a trait instead?

It does currently take a trait: env: impl Into<GuestEnvironment<'a>>,

@ludfjig
Copy link
Contributor

ludfjig commented Jun 12, 2025

I haven't fully thought about it, but maybe Uninitialized::new should take a trait instead?

It does currently take a trait: env: impl Into<GuestEnvironment<'a>>,

What do you think about letting Uninitialized::new take something like a &impl GuestEnvironment

pub trait GuestEnvironment {
    /// Get the guest binary
    fn guest_binary(&self) -> &[u8];

    /// Get optional memory blobs
    fn blobs(&self) -> Option<&[&[u8]]>;
}

Then we could implement it for exisiting GuestBinary, and also a new GuestBinaryWithBlob, etc, etc.

@danbugs
Copy link
Contributor Author

danbugs commented Jun 12, 2025

I haven't fully thought about it, but maybe Uninitialized::new should take a trait instead?

It does currently take a trait: env: impl Into<GuestEnvironment<'a>>,

What do you think about letting Uninitialized::new take something like a &impl GuestEnvironment

pub trait GuestEnvironment {
    /// Get the guest binary as a `GuestBinary`.
    fn guest_binary(&self) -> GuestBinary;

    /// Get optional memory blobs
    fn blob(&self) -> Option<&[&[u8]]>;
}

Then we could implement it for exisiting GuestBinary, and also a new GuestBinaryWithBlob, etc, etc.

I think I like impl Into<GuestEnvironment<'a>> a little better so we don't have to write empty implementations (e.g., fn blob for GuestBinary) and grow the trait and its deps if we want to support extra items in a guest environment, but I'm not strongly opinionated here. Open to change if you think this has other pros I might be missing 🙂

@ludfjig
Copy link
Contributor

ludfjig commented Jun 12, 2025

I haven't fully thought about it, but maybe Uninitialized::new should take a trait instead?

It does currently take a trait: env: impl Into<GuestEnvironment<'a>>,

What do you think about letting Uninitialized::new take something like a &impl GuestEnvironment

pub trait GuestEnvironment {
    /// Get the guest binary as a `GuestBinary`.
    fn guest_binary(&self) -> GuestBinary;

    /// Get optional memory blobs
    fn blob(&self) -> Option<&[&[u8]]>;
}

Then we could implement it for exisiting GuestBinary, and also a new GuestBinaryWithBlob, etc, etc.

I think I like impl Into<GuestEnvironment<'a>> a little better so we don't have to write empty implementations (e.g., fn blob for GuestBinary) and grow the trait and its deps if we want to support extra items in a guest environment, but I'm not strongly opinionated here. Open to change if you think this has other pros I might be missing 🙂

@simongdavies @jprendes @dblnz @andreiltd @syntactically, any opinions on the API of this PR?

Copy link
Contributor

@simongdavies simongdavies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good just a few questions

@simongdavies
Copy link
Contributor

IIRC, @simongdavies had an idea where the host could share host memory with the VM by adding pages to the VM's page table. Would something like that help in this case?

I don't recall what that might have been, but I think if @danbugs takes up my feedback about allowing the data to have RWX flags associated with it then if its R we could do an optimisation where its only allocated in the host once, that would require us to change the way we bounds check though so I think out of scope here.

We also talked about allowing a user to pass an vector of (address,len) into a sandbox for mapping, that may be a better way to do something like this, and it may fit better in a configuration than as a argument to the new function

@simongdavies
Copy link
Contributor

Not strictly related to your PR, but I think we should consider not allowing passing blobs/binaries by "path", which would force the consumers to do their own reading of files, etc, and instead only take slices of u8.

I think this all stemmed from us having in-proc and more specifically loadlib support in Windows, I think you are correct, I don't think we need it any longer ,not sure if now is the right time though...

@simongdavies
Copy link
Contributor

simongdavies commented Jun 12, 2025

any opinions on the API of this PR?

I quite like it as it is but not really that fussed either way, eventually we should bury all this behind a builder, that might be the right time to get rid of the paths

@danbugs danbugs force-pushed the load-extra-blob branch 2 times, most recently from 7371ebe to 96749f1 Compare June 12, 2025 23:06
@danbugs
Copy link
Contributor Author

danbugs commented Jun 12, 2025

I don't recall what that might have been, but I think if @danbugs takes up my feedback about allowing the data to have RWX flags associated with it then if its R we could do an optimisation where its only allocated in the host once, that would require us to change the way we bounds check though so I think out of scope here.

Added configurable flags to the new mem region 👍

@danbugs
Copy link
Contributor Author

danbugs commented Jun 12, 2025

I think this all stemmed from us having in-proc and more specifically loadlib support in Windows, I think you are correct, I don't think we need it any longer ,not sure if now is the right time though...

#614 ~ made an issue for this, @ludfjig / @simongdavies

@danbugs danbugs force-pushed the load-extra-blob branch 3 times, most recently from eace8a2 to fcddae7 Compare June 13, 2025 22:19
ludfjig
ludfjig previously approved these changes Jun 13, 2025
danbugs added 4 commits June 13, 2025 23:58
The GuestEnvironment struct contains two blobs of data. One identifiable as a guest binary,
and one undifferentiated guest blob. This GuestEnvironment is now used to create a new
sandbox in place of just a guest binary. There are TryFrom impls to be able to convert from
a guest binary to a GuestEnvironment, so this isn't a breaking change.

Signed-off-by: danbugs <[email protected]>
+ if guest blob is provided, we now write it to shared mem when creating a sandbox

Signed-off-by: danbugs <[email protected]>
W/ this, now the guest can access that memory region.

Signed-off-by: danbugs <[email protected]>
… data

+ modified guest and guest_bin libs for it too

Signed-off-by: danbugs <[email protected]>
@danbugs danbugs merged commit 4633add into hyperlight-dev:main Jun 14, 2025
32 checks passed
@danbugs danbugs deleted the load-extra-blob branch June 23, 2025 16:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/enhancement For PRs adding features, improving functionality, docs, tests, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants