virtio_balloon: add support for MMIO devices #2081
This function will be reused when adding support for configuration change handlers.
It is now possible for a virtio MMIO device driver to register a handler function that is invoked when the device configuration changes.
This preprocessor definition includes all the features supported by the driver. No functional change.
The virtio memory balloon device in AWS Firecracker does not support more than 256 pages in a single inflate descriptor. 256 pages correspond to 1 MB of memory; therefore, change the balloon allocation unit size to 1 MB, in preparation for adding support for Firecracker balloon devices.
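The arithmetic behind the 1 MB allocation unit can be checked with a short sketch. It assumes the 4 KiB page unit used by the virtio balloon interface; the function names are hypothetical and are not taken from the Nanos driver.

```python
# Sketch of the sizing arithmetic only; not the driver's actual code.
PAGE_SIZE = 4096  # bytes; assumed 4 KiB balloon page unit


def descriptor_bytes(pages_per_descriptor: int, page_size: int = PAGE_SIZE) -> int:
    """Bytes covered by one inflate descriptor holding the given number of pages."""
    return pages_per_descriptor * page_size


def descriptors_needed(target_mib: int, pages_per_descriptor: int = 256) -> int:
    """Descriptors required to inflate the balloon to target_mib (rounded up)."""
    unit = descriptor_bytes(pages_per_descriptor)
    target_bytes = target_mib * 1024 * 1024
    return (target_bytes + unit - 1) // unit  # integer ceiling division


if __name__ == "__main__":
    print(descriptor_bytes(256))    # 256 pages of 4 KiB each
    print(descriptors_needed(300))  # descriptors for a 300 MiB balloon
```

With 256 pages per descriptor, each descriptor covers exactly 1 MiB, so a balloon target expressed in MiB maps one-to-one onto descriptors.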
This allows the virtio memory balloon driver to work with AWS Firecracker, which instantiates MMIO devices.
Thank you for working on this! How did you test it? I can't seem to get it to do much (though it doesn't error anymore). I don't expect the behaviour to be the same as Linux's, but from previous experience with the ballooning device, it behaved like this:
If there's nothing preventing freeing the pages inside the microvm, then that's all there is to it. If it can't free all the requested pages, then the kernel starts logging balloon-related information. I believe it applies pressure for these pages to be freed and future allocations by the kernel are severely affected (read: impossible and everything slows down to a crawl).
Testing this branch locally, I'm unable to get any memory freed from the host's perspective. Even inflating the balloon to the total memory allowed in the firecracker doesn't appear to do much. I'm running a Deno app and I suspect a lot of memory should be free-able. It's a little hard to tell because I'm unable to gather RSS from the host:
let mem = Deno.memoryUsage();
console.log(`rss: ${mem.rss}, heap total: ${mem.heapTotal}, heap used: ${mem.heapUsed}, external: ${mem.external}`);
I'm also using the
These numbers are with the balloon inflated to 512MB, which appears to have no effect. Inflating it to other numbers also didn't appear to change anything. Firecracker process from the host side:
Somehow the memory inside the VM is reported as more than outside, which feels odd to me, but I suspect there's some weirdness regarding how memory is reported from inside the guest.
As far as I understand,
in a Firecracker VM with 512 MB of RAM and the balloon initially empty, at the beginning I see figures such as below:
The value for used memory above is what is needed to run the application process.
The amount of used memory is now much higher, because most of it is taken by the balloon (and therefore cannot be used by the guest). The kernel tries to keep a "safe" amount of free memory (64 MB in this setup) so as not to risk running out of memory for its internal operations; that's why the balloon is not inflated to take all the available RAM.
which means that the balloon has been deflated and the memory it was holding is now available again for the guest to use. I don't know what Firecracker does with the memory in the balloon (e.g. whether it releases the memory to the host OS, or does something else), so I don't know if we can gather any relevant information from the Firecracker process memory stats.
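The free-memory reserve described in this comment can be pictured with a simplified model. This is only a sketch under stated assumptions: the function name, the 1 MB rounding, and the example figures are hypothetical, and the real Nanos logic may differ.

```python
# Simplified model: the guest grants the balloon only what leaves a reserve free.
# The names and the rounding rule are illustrative assumptions, not Nanos code.
def achievable_balloon_mib(requested_mib: int, free_mib: int,
                           reserve_mib: int = 64, unit_mib: int = 1) -> int:
    """Balloon size the guest can reach while keeping reserve_mib free."""
    available = max(free_mib - reserve_mib, 0)      # memory the guest can give up
    granted = min(requested_mib, available)         # never exceed the request
    return (granted // unit_mib) * unit_mib         # whole allocation units only


if __name__ == "__main__":
    # Hypothetical guest with 450 MB free, asked to inflate to 512 MB.
    print(achievable_balloon_mib(512, 450))
```

Under this model a request larger than the reclaimable memory is only partially satisfied, which matches the observation that the balloon does not grow to take all the available RAM.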
Ah yes, sorry, I was just including the Deno output as a data point, but wasn't exactly relying on it. I found it interesting it couldn't report on RSS at all. Is that a limitation of nanos? Might just be an incompatible implementation in Deno. I'll have to test again on Linux, but I don't recall the balloon showing as memory usage. My own memory could be wrong though! It would make sense that it'd show as usage.
The main purpose, in my opinion, of the balloon is that it would reclaim memory on the host. Firecracker does not reclaim allocated memory. If the guest bursts to 300 / 512MB and then goes back down to 100MB, then the host will show the process as using 300MB RSS. Inflating the balloon reclaims the now-unused memory from the host and lowers the RSS of the firecracker process. I started my firecracker with
Then I inflated the balloon to 300 MiB and saw:
Then I deflated it back to 0 and saw:
The host's RSS for firecracker did not move (or it did by maybe 1-3MB) through these various operations. I short-circuited my program to allocate 200MB of random bytes of memory (allocating zeroes did nothing for some reason, maybe as an optimization in
Looking from the host, it did use ~238MB of RSS. I don't know why that figure would be smaller than what's reported inside the guest, but I can ignore that for now :) After 30 seconds, I deallocated the
So far so good! As expected, by my own mental model of Firecracker, the RSS for the process on the host remained stable at 238MB. I then inflated the balloon to 400MiB and saw:
Which would fit what you've said in your last comment. The host has reclaimed the memory and is sitting at around 57MB. Again, that figure is a bit surprising to me because it's lower than what the guest reports.
Now, given all of that, it seems like memory ballooning does work as expected! My bad.
This is leading me to think that the base memory used by my own app is high and unreclaimable by the balloon device (and I assume: the kernel). It seems like running the same Deno app on the host uses roughly 128MB (RSS). So maybe the 90MB base I got in my program + 128MB sounds about right. I was hoping my own program would be closer to 128MB and have practically zero overhead.
My program is 103MB and
Edit: Running my app on my Linux host directly uses only about 128MB of RSS memory as well. In line with Deno. At this point I'm fairly sure it's an interaction with the
My app image built with
Do you have any idea why RSS reported from the host for the firecracker process could be lower than the memory reported inside the guest? In our own experience, that's never the case.
Deno retrieves the RSS from the /proc/self/statm file, which doesn't exist in Nanos, that's why the reported RSS value is 0.
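For reference, this is roughly how an RSS figure is derived from a statm file on Linux, where the second field is the resident set size in pages. The sketch is not Deno's implementation; the function names are hypothetical, and returning 0 when the file is missing is an assumption chosen to mirror the value reported under Nanos.

```python
import os


def rss_bytes_from_statm(statm_text: str, page_size=None) -> int:
    """Convert the contents of a statm file into an RSS figure in bytes."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    fields = statm_text.split()
    resident_pages = int(fields[1])  # field 2: resident set size, in pages
    return resident_pages * page_size


def read_self_rss(path: str = "/proc/self/statm") -> int:
    """Read the RSS of the current process; 0 if the file does not exist."""
    try:
        with open(path) as f:
            return rss_bytes_from_statm(f.read())
    except FileNotFoundError:
        return 0  # assumption: fall back to 0, as observed in the guest


if __name__ == "__main__":
    print(read_self_rss())
```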
Usually, when a program allocates a large amount of memory, an mmap() syscall is invoked under the hood. This allocates the requested amount of virtual address space but does not map it to physical RAM right away; the actual allocation (which shows up in the used memory) happens only when the program accesses the memory (in your case, when the memory is filled with random values). Allocating zeros does not require accessing the memory (because anonymous mmap()ed memory is always initialized with zeros), which is why it doesn't show up in the used memory.
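The lazy-allocation behaviour described above can be observed with a small experiment. This is a sketch, assuming a Linux-like system where /proc/self/statm is available for the RSS reading (elsewhere the helper returns None); the function names are hypothetical.

```python
import mmap


def resident_bytes():
    """RSS of this process in bytes, or None where /proc/self/statm is unavailable."""
    try:
        with open("/proc/self/statm") as f:
            return int(f.read().split()[1]) * mmap.PAGESIZE
    except (OSError, IndexError, ValueError):
        return None


def touch_pages(buf, page_size: int = mmap.PAGESIZE) -> int:
    """Write one byte into every page so that each page must be backed by RAM."""
    touched = 0
    for offset in range(0, len(buf), page_size):
        buf[offset] = 1
        touched += 1
    return touched


def demo(size: int = 64 * 1024 * 1024):
    """Map `size` bytes anonymously and report RSS before and after touching them."""
    buf = mmap.mmap(-1, size)   # anonymous mapping: address space only, zero-filled
    before = resident_bytes()
    touch_pages(buf)            # first write to each page triggers real allocation
    after = resident_bytes()
    buf.close()
    return before, after


if __name__ == "__main__":
    print(demo())
```

On a system with demand paging, the second figure is expected to be larger than the first by roughly the mapping size, while creating the mapping alone should leave RSS almost unchanged.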
Nanos internally allocates guest physical memory for many different purposes, but not all of that memory may be used immediately, and only the memory that has actually been used (i.e. read from or written to) shows up in Firecracker's RSS. For example, Nanos has internal heaps that use different caches for different allocation sizes; if e.g. one 32-byte allocation is made inside the kernel, the relevant cache allocates a 2-MB region of physical memory, but only a fraction of that memory is going to be used immediately: in this scenario, Nanos reports the entire 2 MB as in-use memory (because it has been allocated in the guest physical memory and is therefore unavailable for other uses), while only the fraction that's actually used (which could be as low as 4 KB) shows up in Firecracker's RSS. Another example is the set of buffers pre-allocated by the network interface driver for reception of network packets: all the memory occupied by the buffers is considered in-use memory by Nanos, but only the buffers that have actually been used for receiving incoming network packets show up in the RSS.
I'm not sure I fully understood the issue you mentioned when comparing the RSS of your app when run directly on the host with the RSS of Firecracker when running your app under Nanos, but it seems to me you are considering the used memory reported by the guest (e.g. 90 MB) as separate from Firecracker's RSS, and you are adding the two figures. In reality, the two figures refer to the same memory (if we ignore the additional memory needed by the VMM itself to create and run the guest VM), and their differences can be explained with what I wrote above. (That's when the memory balloon is empty; if you inflate the balloon, its memory is accounted for in the guest used memory but not in Firecracker's RSS, of course.)
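The difference between the two figures can be illustrated with a toy accounting model. It is only a sketch: the class and its methods are hypothetical, and it assumes 4 KB pages and 2 MB cache regions as in the example above.

```python
PAGE = 4 * 1024            # assumed host page size
REGION = 2 * 1024 * 1024   # region size taken by a cache, as in the example


class GuestMemoryModel:
    """Toy model: the guest counts reserved regions, the host counts touched pages."""

    def __init__(self):
        self.reserved = 0
        self.touched_pages = set()

    def reserve_region(self, size: int = REGION) -> None:
        """A kernel cache takes a whole region out of guest physical memory."""
        self.reserved += size

    def touch(self, addr: int, length: int) -> None:
        """Reading or writing a byte range makes the host back those pages."""
        first = addr // PAGE
        last = (addr + length - 1) // PAGE
        self.touched_pages.update(range(first, last + 1))

    @property
    def guest_used(self) -> int:
        return self.reserved

    @property
    def host_rss(self) -> int:
        return len(self.touched_pages) * PAGE


if __name__ == "__main__":
    m = GuestMemoryModel()
    m.reserve_region()   # one 32-byte allocation triggers a 2 MB region
    m.touch(0, 32)       # only the first page is actually written
    print(m.guest_used, m.host_rss)
```

In this model a single small allocation makes the guest report 2 MB in use while the host-side figure grows by one page, which is the gap described in the comment.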
Thanks for the explanations. This is insightful.
I'm essentially trying to measure the overhead of
I haven't yet had a chance to test this using Linux, but I suspect there's overhead there as well. I just don't expect it to be that much. By using the balloon my hope was to reclaim as much as possible. It seems like there's more unreclaimable memory when using Deno vs. just allocating 200MB of random bytes on the heap in a Rust program. I'm saying this because of my test comparing the RSS of firecracker before and after allocating and before and after inflating the balloon. I would expect to be able to reclaim a lot more memory on the host for an app that uses ~120MB of RSS. From your explanation, it sounds like there's some form of allocation that's not being deallocated when the balloon expands. The reason I am specifically testing for low memory footprint is that I want to fit as many of these as possible on a host. The difference between 128MB and 200MB+ of memory allocated for a firecracker process is big, almost halving the number that can fit on a host.
My app should only have 1 TCP listener and I'm running Deno with
I did notice that making a single request will increase the memory usage by a little bit. Is there a way to inspect what "kinds" of allocations (and how much)
My confusion comes from the fact that a test program's memory, allocating 200MB of random bytes, can be fully reclaimed when using the balloon device. But Deno's or V8's memory can't be reclaimed by the host in the same way. So I imagine it has something to do with the "kinds" of allocations it is doing. On Linux, I suspect the balloon creates memory pressure and the kernel drops buffers and caches more aggressively. I'm wondering if
How do I measure that? The RSS stat is not available from inside. As far as I can tell, there's roughly 2x overhead in memory by running my app in a Firecracker with the
You can get memory-related info from the guest via the virtio-balloon statistics (see https://github.com/firecracker-microvm/firecracker/blob/main/docs/ballooning.md#virtio-balloon-statistics).
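As a rough illustration of how the balloon can be driven and queried from the host, the sketch below builds requests for Firecracker's API over its Unix socket. The `PATCH /balloon` and `GET /balloon/statistics` endpoints follow the linked ballooning documentation; the helper names are hypothetical, the socket path is whatever was passed to Firecracker at startup, and statistics are only returned if a polling interval was configured when the balloon was created.

```python
import json
import socket


def build_request(method: str, path: str, body=None) -> bytes:
    """Build a minimal HTTP/1.1 request, with a JSON body when one is given."""
    payload = b"" if body is None else json.dumps(body).encode()
    head = [f"{method} {path} HTTP/1.1", "Host: localhost", "Accept: application/json"]
    if payload:
        head += ["Content-Type: application/json", f"Content-Length: {len(payload)}"]
    return ("\r\n".join(head) + "\r\n\r\n").encode() + payload


def inflate_request(amount_mib: int) -> bytes:
    """Request that sets the balloon target size in MiB."""
    return build_request("PATCH", "/balloon", {"amount_mib": amount_mib})


def statistics_request() -> bytes:
    """Request that fetches the latest balloon statistics."""
    return build_request("GET", "/balloon/statistics")


def send_request(socket_path: str, request: bytes, timeout: float = 2.0) -> bytes:
    """Send a request over the API socket and return the raw response bytes."""
    chunks = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        s.connect(socket_path)
        s.sendall(request)
        try:
            while True:
                chunk = s.recv(4096)
                if not chunk:
                    break
                chunks.append(chunk)
        except socket.timeout:
            pass  # the server may keep the connection open; stop after the timeout
    return b"".join(chunks)
```

A typical session would call `send_request(path, inflate_request(300))` and then `send_request(path, statistics_request())`, where `path` is the API socket of the running microVM. The response is returned unparsed, so the caller still has to split the HTTP headers from the JSON body.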
I was referring to the RSS of the firecracker process, not the RSS of the deno process inside the guest. So it seems you are seeing ~100MB extra RSS in firecracker compared to running your deno app directly on the host; and it seems you are unable to reclaim that memory via the virtio-balloon device. This sounds like something that could be improved.
That does seem reasonable, yes. I'll try and reproduce the memory bloat with my simple app. The one that does appear to be problematic is a
Then, if you're using
Edit: I couldn't reproduce high memory usage with the simple JS app, which leads me to think there's something about how memory is allocated with the more complex app.
I've finally spent the time to test this with the Linux kernel to compare. Running the same executable binary on Linux 5.15 gives me similar memory usage once the app boots (a little over 200MB), but using the balloon will bring down the host's firecracker RSS to the expected levels. When I inflate the balloon to 400MB (more than I should for this app) out of the 512MB available to the VM, I see this:
The host's
Then I relieved the pressure by deflating the balloon to 0 MB and it stopped logging and I saw the
I tested using
I can provide the rootfs (from which the init can be extracted), the Linux kernel used, and the firecracker config if that helps reproduce the issue. My current theories are:
I suspect that if nanos behaved closer to Linux's implementation for the balloon device, the total overhead of the VM + kernel would be lower than using Linux.
With #2085, Nanos is more effective in allowing the hypervisor to release host memory. @jeromegn now if I run your executable with Firecracker and Nanos and I inflate the balloon, Firecracker's RSS first decreases by a couple of MB (which correspond to non-file-backed memory that has first been allocated and then deallocated by the executable and/or the kernel); then, if I inflate more, the RSS decreases by another ~40 MB, which is the result of caches being dropped (you can see the difference in the "disk_caches" value returned when you query the balloon statistics).