-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: add option to force guests to use swiotlb in dedicated memory region for all I/O #5108
Draft
roypat
wants to merge
32
commits into
firecracker-microvm:feature/secret-hiding
Choose a base branch
from
roypat:secret-freedom-v4
base: feature/secret-hiding
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
WIP: add option to force guests to use swiotlb in dedicated memory region for all I/O #5108
roypat
wants to merge
32
commits into
firecracker-microvm:feature/secret-hiding
from
roypat:secret-freedom-v4
+1,530
−797
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
8fe8462
to
75f4ee1
Compare
5573725
to
ad06919
Compare
9462514
to
39cce06
Compare
The explicit calls to GuestMemoryMmap::create ended up being touched in like 5 of the following commits, so let's reduce the churn by putting it into a function. Signed-off-by: Patrick Roy <[email protected]>
39cce06
to
2168468
Compare
Have all of them operate on some sort of list of (GuestAddress, usize) tuples, for consistency. Signed-off-by: Patrick Roy <[email protected]>
e11ee8e
to
7635ee8
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## feature/secret-hiding #5108 +/- ##
=========================================================
- Coverage 83.06% 82.97% -0.09%
=========================================================
Files 250 250
Lines 26888 27121 +233
=========================================================
+ Hits 22334 22504 +170
- Misses 4554 4617 +63
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Avoid some .map_err() calls using #[from] and the auto-From::from calling of the question mark operator. Signed-off-by: Patrick Roy <[email protected]>
When taking a diff snapshot, Firecracker resets the dirty bitmap of each guest memory region if the region itself was successfully dumped to the snapshot memory file. This is wrong, because a later region might fail, and then some regions will have their dirty bitmap reset, yet no snapshot was successfully taken, meaning a failure during diff snapshot taking will forever corrupt the dirty bitmap. Fix this by only reset the dirty bitmap of all regions after (and if!) all regions were successfully dumped to the file. Signed-off-by: Patrick Roy <[email protected]>
9aa4b62
to
9febf6c
Compare
Otherwise it's impossible to figure out what precisely went wrong during queue restoration. Signed-off-by: Patrick Roy <[email protected]>
This allows having regions that start somewhere other than guest physical address 0. Useful in case not all regions are allocated at once, and each batch needs to be placed after all already allocated regions. Since the resulting code has some intricacies (and I actually got it wrong the first time around), write a kani harness to validate that we're computing the regions correctly. Signed-off-by: Patrick Roy <[email protected]>
This way the non-snapshot path doesnt have to deal with passing in `None`. Signed-off-by: Patrick Roy <[email protected]>
Only enabling it when a balloon device is present doesn't really gain us anything, because the only time this feature is checked in the kernel is as part of madvise handling - but without balloon devices we wont ever call madvise, so it makes no difference whether we enabled the feature or not (it is unconditionally available since linux 4.10). Signed-off-by: Patrick Roy <[email protected]>
Being able to use the new GuestRegionCollection will make things a lot easier. Signed-off-by: Patrick Roy <[email protected]>
Use a special KvmRegion type, which allows us to manage kvm_userspace_memory_region2 inside of GuestRegionCollection. This means we can stop using .zip(0u32..) to determine kvm slot indices, which doesnt work anymore when multiple GuestMemoryMmaps are around, which have no order imposed on them wrt which one is initialized first. Signed-off-by: Patrick Roy <[email protected]>
Add a mem_config parameter to the PUT/PATCH /machine-config endpoints, which allows configuration of the size of a swiotlb region to use. Signed-off-by: Patrick Roy <[email protected]>
936dcad
to
0a21c54
Compare
Add a special `GuestMemoryMmap` to `struct Vm` that has the semantics that if its not empty, it represents the swiotlb area in the guest. Have all virtio devices only get access to swiotlb regions if they exist, only falling back onto having access to all of guest memory otherwise. This means that in case of swiotlb regions being set up, virtio devices cannot access generic memory, so if the guest decides to (incorrectly) place I/O buffers outside of swiotlb we'll get a somewhat meaningful error of "invalid guest address" (or similar), instead of random things failing further down the line in case generic memory isn't accessible by firecracker/the host kernel for some reason (e.g. if its guest_memfd backed in the future). When a VM with swiotlb regions is being snapshotted, the regions are simply concatenated to the end of the snapshot memory file. This changes the snapshot memory file format in the sense that regions in the file are not necessarily sorted by guest address anymore. Signed-off-by: Patrick Roy <[email protected]>
If swiotlb is requested through the API, this function will allocate a single contiguous region intended for swiotlb use. It needs to be continguous because we must be able to describe it using a single memory range in FDT, as devices cannot be assigned to reserved memory consisting of multiple regions (e.g. FDT assignment is to memory regions, not to #reserved-memory nodes). While we're at it, always use anon memory for the "normal" part of guest memory if swiotlb is enabled, as if swiotlb is enabled we know that vhost devices will never need to access that part of memory. Signed-off-by: Patrick Roy <[email protected]>
Allocate the swiotlb region if requested via the /machine-config API endpoint, and register it to the VM/KVM. Signed-off-by: Patrick Roy <[email protected]>
describe the swiotlb region as a restricted-mem node in the flattened device tree, and force all virtio devices to use it for their vrings and buffers by setting their `memory-region` property. Signed-off-by: Patrick Roy <[email protected]>
Have all virtio devices off VIRTIO_F_PLATFORM_ACCESS, which communicates to the guest that DMA APIs should be used for allocating vrings and virtio buffers. Since we have created a dedicated swiotlb region in the geust and assigned all virtio devices to this, it means that all such allocations will go through the swiotlb region. Signed-off-by: Patrick Roy <[email protected]>
0a21c54
to
3e042c6
Compare
Support restoring VMs with swiotlb regions. For this, untangle the uffd handshake from the actual restoration of memory regions, as we first need to restore and register to the Vm _all_ memory regions, before we can then send a single handshake containing both normal and swiotlb regions to the Uffd handler. While we're here, significantly simplify the jungle of error types. Signed-off-by: Patrick Roy <[email protected]>
This will make it easier to validate in an integration test that virt queues are actually placed in the swiotlb region, as a sort of sanity check that on Firecracker's side the swiotlb regions are actually passed to the virtio devices, and that the guest is honoring the request to place virtio stuff into the swiotlb regions. Signed-off-by: Patrick Roy <[email protected]>
This is needed for the guest to correctly parse the restricted-memory nodes that firecracker now puts into FDT. Signed-off-by: Patrick Roy <[email protected]>
these contain 6.1 guest kernels that have CONFIG_DMA_RESTRICTED_MEM enabled and can thus parse swiotlb region descriptions from FDT. Signed-off-by: Patrick Roy <[email protected]>
If the CI artifacts dont contain old firecracker releases, still succeed at setting downloading them. Signed-off-by: Patrick Roy <[email protected]>
On aarch64, additionally parametrize our throughput performance tests (network, block and vsock) by swiotlb size, to get a feeling for the performance impact bounce buffering will have. Try out different sizes of swiotlb region to see whether we can identify a cut off. Signed-off-by: Patrick Roy <[email protected]>
Add an integration test on ARM that configures an in-guest swiotlb region to which all virtio devices should be bound. Asserts the guest sees the swiotlb region, and also that block and net I/O work (indirectly through loading of rootfs during boot and by running commands via SSH). Also asserts that all virtio queue components have been placed into swiotlb, by parsing firecracker log messages. Also add a test for snapshotting VMs with swiotlb regions, which just checks the network device after restoration to sanity check that the swiotlb region was correctly restored. Signed-off-by: Patrick Roy <[email protected]>
3e042c6
to
a4e86a0
Compare
10 tasks
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Changes
Add a new API field to /machine-config,
initial_swiotlb_size
, which causes Firecracker to carve out a region of guest memory of the specified size, mark is as dma-restricted in FDT, and force all virtio devices to use this memory for their vrings and buffers.Reason
With secret hiding, I/O needs to be handled by swiotlb, as we cannot do I/O into guest memory that is direct map removed
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md
.PR Checklist
tools/devtool checkstyle
to verify that the PR passes theautomated style checks.
how they are solving the problem in a clear and encompassing way.
in the PR.
CHANGELOG.md
.Runbook for Firecracker API changes.
integration tests.
TODO
.rust-vmm
.