Skip to content

Allow rewriting and reading of user-data #713

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

qbradley
Copy link
Contributor

In addition to initializing user data at guest creation time, I would also like to be able to update the user data from the host without resetting my call context, and if the area is writable from the guest, I would like to be able to read from it as well.

Changes

  • Add rewrite_init_data and read_init_data methods to MemMgrWrapper and MultiUseGuestCallContext
  • Expose hl_get_user_memory_ptr and hl_get_user_memory_size for C guests to access user data

@qbradley qbradley force-pushed the rewrite_init_data branch from 91c8f1e to 335e9c9 Compare July 15, 2025 00:28
@syntactically
Copy link
Member

Is the performance motivation for this basically due to the facts that (a) function parameters always get copied more than they ought to right now and/or (b) the lifecycle of the function parameters doesn't match the lifecycle of the data that you need, since currently it is tied directly to the call stack? If so, I would like to take a little bit of a look into whether the same need can be served by removing some of the copies (we have some preliminary work in #444) and changing the call ABI to be more flexible about how function parameters/returns are allocated (roughly: the guest can allocate buffers for these on its own, and manage their lifecycle itself), which is also a requirement for supporting native async in the host/guest ABI. It would be nice to eventually replace the entire special-case "init data" at sandbox initialization with a more generic mix of features (memory mapping for zerocopy readonly data, this kind of thing for longer-lived read-write data where you can hand off ownership between the host and guest, etc).

The whole call context abstraction is hopefully going away in #697, so it also might be worth waiting to move on this until that is merged?

@qbradley
Copy link
Contributor Author

I do not know with certainty that extra copies is the only cause of the overhead, but per byte cost of parameters is currently high, especially on /dev/mshv. In my microbencharks, I see a guest call with 16 bytes of parameters taking around 100 microseconds, while with 32KB of parameters, it takes over 200 microseconds. However by copying the bulk data using init-data using above, 32KB takes around 112 microseconds.

With the above, I still need an extra copy because on the rust side I prepare a buffer, then copy it into the user-memory area using rewrite_init_data(). In the guest side, I access the pointer to the user-memory to interpret the data. Ideally on the host side I could also prepare the input buffer directly to user-memory rather than using an intermediate buffer.

I could imagine a shared-memory API that allows me to map memory into the host and the guest and then I would not need this, but meanwhile this change gets me a large step in the direction of high throughput input/output through my guest.

@syntactically
Copy link
Member

syntactically commented Jul 15, 2025

I do not know with certainty that extra copies is the only cause of the overhead, but per byte cost of parameters is currently high, especially on /dev/mshv. In my microbencharks, I see a guest call with 16 bytes of parameters taking around 100 microseconds, while with 32KB of parameters, it takes over 200 microseconds. However by copying the bulk data using init-data using above, 32KB takes around 112 microseconds.

That makes sense. It would be interesting to see how that shakes out with some of the changes to parameters that we are working on right now. In addition to the PR that I linked above, I am looking a little bit at making parameter buffer allocation happen in the guest with a more sensible ABI as part of something else, and I think a general push to improve parameter passing to the point of being useful is underway.

With the above, I still need an extra copy because on the rust side I prepare a buffer, then copy it into the user-memory area using rewrite_init_data(). In the guest side, I access the pointer to the user-memory to interpret the data. Ideally on the host side I could also prepare the input buffer directly to user-memory rather than using an intermediate buffer.

We would probably need to be a little careful with the API design on this before doing anything like that in the general case, because of the many footguns around toctou, etc. which come from exposing an API that reads incrementally from a region which the guest is still able to mutate. So I would like to look at achieving single-copy first, and then look into ways to add various forms of zerocopy that are a bit safer than just "here's a pointer to a buffer that the guest is mutating, parse it as you will".

@qbradley
Copy link
Contributor Author

For our use case, we do not modify the "shared" memory while the guest is running. The host writes to the shared memory, then makes a call into the guest who examines the memory and then updates it and returns to the host. It is my understanding (please correct me if I am wrong) that you no longer use a separate thread to run the guest. The host then examines the shared memory. For our specific use case, I think this mitigates a lot of the potential problems. I can understand how a general API has to handle everything someone might try to do. Another assumption I have is that a hypercall to the guest should be equivalent to a full fence, like a syscall is, but please also correct me if I am wrong on that one.

@syntactically
Copy link
Member

For our use case, we do not modify the "shared" memory while the guest is running. The host writes to the shared memory, then makes a call into the guest who examines the memory and then updates it and returns to the host. It is my understanding (please correct me if I am wrong) that you no longer use a separate thread to run the guest. The host then examines the shared memory. For our specific use case, I think this mitigates a lot of the potential problems.

In that case, I agree that zerocopy is probably safer, but I just want to be cautious with adding API to make sure we don't do something that encourages unsafe usage in general, beyond this one application. Out of curiosity, is your data structured in any interesting ways processed by the host side, or is it just bulk data?

Another assumption I have is that a hypercall to the guest should be equivalent to a full fence, like a syscall is, but please also correct me if I am wrong on that one.

I think we are making the same assumption right now, and it is at least generally true.

@qbradley
Copy link
Contributor Author

It is a moderately complex structure that must be parsed. It is a sequence of variable length structures defined by initial tag byte.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants