Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: add option to force guests to use swiotlb in dedicated memory region for all I/O #5108

Draft
wants to merge 32 commits into
base: feature/secret-hiding
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
8abcefe
refactor(test): add helper function for creating mem in vhost tests
roypat Mar 27, 2025
7829cb4
refactor: Have all memory creation method take similar arguments
roypat Jan 31, 2025
0798ea2
refactor: introduce struct VmCommon
roypat Mar 21, 2025
b9ee939
refactor: move check for not exceeding memslot limit into `struct Vm`
roypat Mar 21, 2025
5380076
refactor: store GuestMemoryMmap inside `struct Vm`
roypat Mar 21, 2025
7084005
refactor: load kernel/initrd after create_vmm_and_vcpus()
roypat Mar 21, 2025
b117f32
refactor: build GuestMemoryMmap objects incrementally
roypat Mar 21, 2025
5e0cf5c
refactor: move GuestMemoryState into VmState
roypat Mar 25, 2025
2e2d421
refactor: move dirty bitmap functions into struct Vm
roypat Mar 26, 2025
639b7ed
refactor: hoist post-snapshot vring re-dirtying
roypat Mar 26, 2025
b316efb
refactor: move snapshot_memory_to_file into struct Vm
roypat Mar 26, 2025
78ffa39
refactor: #[from]-ify CreateSnapshotError
roypat Mar 28, 2025
fc65a89
fix: only reset dirty bitmap if all regions were dumped successfully
roypat Mar 28, 2025
6ceec47
fix: print inner error in case of PersistError::QueueConstruction
roypat Mar 28, 2025
96e4f99
add offset argument to `arch_regions`
roypat Jan 29, 2025
8d319a7
refactor: drop uffd parameter from create_vmm_and_vcpus
roypat Mar 25, 2025
d07e566
uffd: unconditionally enable UFFD_FEATURE_EVENT_REMOVE
roypat Mar 25, 2025
55f7efb
update to in-dev version of vm-memory
roypat Mar 27, 2025
bd2bb6d
Keep track of kvm memslot index inside GuestMemoryMmap
roypat Mar 27, 2025
9c4bdc1
add mem_config parameter to machine-config endpoint (aarch64)
roypat Mar 20, 2025
0b345b3
add concept of "swiotlb regions" to `struct Vm`
roypat Mar 21, 2025
6c2ae13
add function to VmResources for allocating I/O memory
roypat Mar 21, 2025
6356e24
allocate and register swiotlb mem if initial_swiotlb_size != 0
roypat Mar 24, 2025
2a7a715
arm: describe swiotlb region in FDT
roypat Mar 24, 2025
3196935
have virtio devices offer VIRTIO_F_ACCESS_PLATFORM if swiotlb
roypat Mar 26, 2025
5b3708f
Snapshot restore for swiotlb regions
roypat Mar 26, 2025
c2680b3
add logging about virtq and swiotlb region placement
roypat Mar 28, 2025
e89e1e7
ci: enable CONFIG_DMA_RESTRICTED_POOL on 6.1 aarch64 guest kernels
roypat Mar 24, 2025
f02714c
ci: point towards special secret hiding ci artifacts
roypat Mar 24, 2025
eda7979
ci: dont fail downloading artifacts if no firecracker binaries exist
roypat Mar 24, 2025
f69bfe4
test: run throughput perf tests with swiotlb enabled
roypat Mar 24, 2025
a4e86a0
test: ensure swiotlb works
roypat Mar 26, 2025
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,9 @@ and this project adheres to
the `SendCtrlAltDel` command not working for ACPI-enabled guest kernels, by
dropping the i8042.nopnp argument from the default kernel command line
Firecracker constructs.
- #[???](???): Fix failure during diff snapshot creation corrupting internal
state about which pages were dirtied since the last snapshot, meaning no
further diff snapshots could ever be taken, even if the error is transient.

## [1.11.0]

Expand Down
6 changes: 2 additions & 4 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 4 additions & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -32,3 +32,7 @@ strip = "none"

[profile.bench]
strip = "debuginfo"

[patch.crates-io]
linux-loader = { git = "https://github.com/roypat/linux-loader", branch = "vm-memory-update" }
vm-memory = { git = "https://github.com/roypat/vm-memory", branch = "generic-region-container" }
1 change: 1 addition & 0 deletions resources/guest_configs/ci.config
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ CONFIG_SQUASHFS_ZSTD=y
# aarch64 only TBD split into a separate file
CONFIG_DEVMEM=y
# CONFIG_ARM64_ERRATUM_3194386 is not set
CONFIG_DMA_RESTRICTED_POOL=y
# Needed for CTRL+ALT+DEL support
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -119,6 +119,7 @@ mod tests {
let expected_config = MachineConfigUpdate {
vcpu_count: Some(8),
mem_size_mib: Some(1024),
mem_config: Some(Default::default()),
smt: Some(false),
cpu_template: None,
track_dirty_pages: Some(false),
Expand All @@ -140,6 +141,7 @@ mod tests {
let expected_config = MachineConfigUpdate {
vcpu_count: Some(8),
mem_size_mib: Some(1024),
mem_config: Some(Default::default()),
smt: Some(false),
cpu_template: Some(StaticCpuTemplate::None),
track_dirty_pages: Some(false),
Expand All @@ -161,6 +163,7 @@ mod tests {
let expected_config = MachineConfigUpdate {
vcpu_count: Some(8),
mem_size_mib: Some(1024),
mem_config: Some(Default::default()),
smt: Some(false),
cpu_template: None,
track_dirty_pages: Some(true),
Expand All @@ -186,6 +189,7 @@ mod tests {
let expected_config = MachineConfigUpdate {
vcpu_count: Some(8),
mem_size_mib: Some(1024),
mem_config: Some(Default::default()),
smt: Some(false),
cpu_template: Some(StaticCpuTemplate::T2),
track_dirty_pages: Some(true),
Expand Down Expand Up @@ -213,6 +217,7 @@ mod tests {
let expected_config = MachineConfigUpdate {
vcpu_count: Some(8),
mem_size_mib: Some(1024),
mem_config: Some(Default::default()),
smt: Some(true),
cpu_template: None,
track_dirty_pages: Some(true),
Expand Down
3 changes: 1 addition & 2 deletions src/vmm/benches/memory_access.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,6 @@
// SPDX-License-Identifier: Apache-2.0

use criterion::{BatchSize, Criterion, criterion_group, criterion_main};
use vm_memory::GuestMemory;
use vmm::resources::VmResources;
use vmm::vmm_config::machine_config::{HugePageConfig, MachineConfig};

Expand All @@ -14,7 +13,7 @@ fn bench_single_page_fault(c: &mut Criterion, configuration: VmResources) {
// Get a pointer to the first memory region (cannot do `.get_slice(GuestAddress(0),
// 1)`, because on ARM64 guest memory does not start at physical
// address 0).
let ptr = memory.iter().next().unwrap().as_ptr();
let ptr = memory.first().unwrap().as_ptr();

// fine to return both here, because ptr is not a reference into `memory` (e.g. no
// self-referential structs are happening here)
Expand Down
17 changes: 7 additions & 10 deletions src/vmm/src/acpi/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -188,7 +188,9 @@ mod tests {
use crate::acpi::{AcpiError, AcpiTableWriter};
use crate::arch::x86_64::layout::{SYSTEM_MEM_SIZE, SYSTEM_MEM_START};
use crate::builder::tests::default_vmm;
use crate::test_utils::arch_mem;
use crate::device_manager::resources::ResourceAllocator;
use crate::utils::u64_to_usize;
use crate::vstate::vm::tests::setup_vm_with_memory;

struct MockSdt(Vec<u8>);

Expand All @@ -215,7 +217,7 @@ mod tests {
// A mocke Vmm object with 128MBs of memory
let mut vmm = default_vmm();
let mut writer = AcpiTableWriter {
mem: &vmm.guest_memory,
mem: vmm.vm.guest_memory(),
resource_allocator: &mut vmm.resource_allocator,
};

Expand Down Expand Up @@ -263,15 +265,10 @@ mod tests {
// change in the future.
#[test]
fn test_write_acpi_table_small_memory() {
let mut vmm = default_vmm();
vmm.guest_memory = arch_mem(
(SYSTEM_MEM_START + SYSTEM_MEM_SIZE - 4096)
.try_into()
.unwrap(),
);
let (_, vm) = setup_vm_with_memory(u64_to_usize(SYSTEM_MEM_START + SYSTEM_MEM_SIZE - 4096));
let mut writer = AcpiTableWriter {
mem: &vmm.guest_memory,
resource_allocator: &mut vmm.resource_allocator,
mem: vm.guest_memory(),
resource_allocator: &mut ResourceAllocator::new().unwrap(),
};

let mut sdt = MockSdt(vec![0; usize::try_from(SYSTEM_MEM_SIZE).unwrap()]);
Expand Down
47 changes: 43 additions & 4 deletions src/vmm/src/arch/aarch64/fdt.rs
Original file line number Diff line number Diff line change
Expand Up @@ -10,20 +10,23 @@
use std::fmt::Debug;

use vm_fdt::{Error as VmFdtError, FdtWriter, FdtWriterNode};
use vm_memory::GuestMemoryError;
use vm_memory::{GuestMemoryError, GuestMemoryRegion};

use super::super::DeviceType;
use super::cache_info::{CacheEntry, read_cache_config};
use super::gic::GICDevice;
use crate::device_manager::mmio::MMIODeviceInfo;
use crate::devices::acpi::vmgenid::{VMGENID_MEM_SIZE, VmGenId};
use crate::initrd::InitrdConfig;
use crate::vstate::memory::{Address, GuestMemory, GuestMemoryMmap};
use crate::vstate::memory::{Address, GuestMemory, GuestMemoryMmap, KvmRegion};

// This is a value for uniquely identifying the FDT node declaring the interrupt controller.
const GIC_PHANDLE: u32 = 1;
// This is a value for uniquely identifying the FDT node containing the clock definition.
const CLOCK_PHANDLE: u32 = 2;
/// Unique identifier for the /reserved-memory node describing the memory region the guest
/// is allowed to use for swiotlb
const IO_MEM_PHANDLE: u32 = 3;
// You may be wondering why this big value?
// This phandle is used to uniquely identify the FDT nodes containing cache information. Each cpu
// can have a variable number of caches, some of these caches may be shared with other cpus.
Expand Down Expand Up @@ -56,8 +59,10 @@
}

/// Creates the flattened device tree for this aarch64 microVM.
#[allow(clippy::too_many_arguments)]
pub fn create_fdt(
guest_mem: &GuestMemoryMmap,
swiotlb_region: Option<&KvmRegion>,
vcpu_mpidr: Vec<u64>,
cmdline: CString,
device_info: &HashMap<(DeviceType, String), MMIODeviceInfo>,
Expand All @@ -83,7 +88,7 @@
// containing description of the interrupt controller for this VM.
fdt_writer.property_u32("interrupt-parent", GIC_PHANDLE)?;
create_cpu_nodes(&mut fdt_writer, &vcpu_mpidr)?;
create_memory_node(&mut fdt_writer, guest_mem)?;
create_memory_node(&mut fdt_writer, guest_mem, swiotlb_region)?;
create_chosen_node(&mut fdt_writer, cmdline, initrd)?;
create_gic_node(&mut fdt_writer, gic_device)?;
create_timer_node(&mut fdt_writer)?;
Expand Down Expand Up @@ -208,7 +213,11 @@
Ok(())
}

fn create_memory_node(fdt: &mut FdtWriter, guest_mem: &GuestMemoryMmap) -> Result<(), FdtError> {
fn create_memory_node(
fdt: &mut FdtWriter,
guest_mem: &GuestMemoryMmap,
swiotlb_region: Option<&KvmRegion>,
) -> Result<(), FdtError> {
// See https://github.com/torvalds/linux/blob/master/Documentation/devicetree/booting-without-of.txt#L960
// for an explanation of this.

Expand All @@ -220,9 +229,13 @@
// The reason we do this is that Linux does not allow remapping system memory. However, without
// remap, kernel drivers cannot get virtual addresses to read data from device memory. Leaving
// this memory region out allows Linux kernel modules to remap and thus read this region.
//
// The generic memory description must include the range that we later declare as a reserved
// carve out.
let mem_size = guest_mem.last_addr().raw_value()
- super::layout::DRAM_MEM_START
- super::layout::SYSTEM_MEM_SIZE
+ swiotlb_region.map(|r| r.len()).unwrap_or(0)
+ 1;
let mem_reg_prop = &[
super::layout::DRAM_MEM_START + super::layout::SYSTEM_MEM_SIZE,
Expand All @@ -233,6 +246,25 @@
fdt.property_array_u64("reg", mem_reg_prop)?;
fdt.end_node(mem)?;

if let Some(region) = swiotlb_region {
let rmem = fdt.begin_node("reserved-memory")?;
fdt.property_u32("#address-cells", ADDRESS_CELLS)?;
fdt.property_u32("#size-cells", SIZE_CELLS)?;
fdt.property_null("ranges")?;
let dma = fdt.begin_node("bouncy_boi")?;
fdt.property_string("compatible", "restricted-dma-pool")?;
fdt.property_phandle(IO_MEM_PHANDLE)?;
fdt.property_array_u64("reg", &[region.start_addr().0, region.len()])?;
fdt.end_node(dma)?;
fdt.end_node(rmem)?;

Check warning on line 259 in src/vmm/src/arch/aarch64/fdt.rs

View check run for this annotation

Codecov / codecov/patch

src/vmm/src/arch/aarch64/fdt.rs#L250-L259

Added lines #L250 - L259 were not covered by tests

log::info!(
"Placed swiotlb region at [{}, {})",
region.start_addr().0,
region.start_addr().0 + region.len()

Check warning on line 264 in src/vmm/src/arch/aarch64/fdt.rs

View check run for this annotation

Codecov / codecov/patch

src/vmm/src/arch/aarch64/fdt.rs#L261-L264

Added lines #L261 - L264 were not covered by tests
);
}

Ok(())
}

Expand Down Expand Up @@ -358,6 +390,9 @@

fdt.property_string("compatible", "virtio,mmio")?;
fdt.property_array_u64("reg", &[dev_info.addr, dev_info.len])?;
// Force all virtio device to place their vrings and buffer into the dedicated I/O memory region
// that is backed by traditional memory (e.g. not secret-free).
fdt.property_u32("memory-region", IO_MEM_PHANDLE)?;
fdt.property_array_u32(
"interrupts",
&[
Expand Down Expand Up @@ -499,6 +534,7 @@
let gic = create_gic(&vm, 1, None).unwrap();
create_fdt(
&mem,
None,
vec![0],
CString::new("console=tty0").unwrap(),
&dev_info,
Expand All @@ -519,6 +555,7 @@
let gic = create_gic(&vm, 1, None).unwrap();
create_fdt(
&mem,
None,
vec![0],
CString::new("console=tty0").unwrap(),
&HashMap::<(DeviceType, std::string::String), MMIODeviceInfo>::new(),
Expand All @@ -544,6 +581,7 @@

let current_dtb_bytes = create_fdt(
&mem,
None,
vec![0],
CString::new("console=tty0").unwrap(),
&HashMap::<(DeviceType, std::string::String), MMIODeviceInfo>::new(),
Expand Down Expand Up @@ -606,6 +644,7 @@

let current_dtb_bytes = create_fdt(
&mem,
None,
vec![0],
CString::new("console=tty0").unwrap(),
&HashMap::<(DeviceType, std::string::String), MMIODeviceInfo>::new(),
Expand Down
4 changes: 0 additions & 4 deletions src/vmm/src/arch/aarch64/kvm.rs
Original file line number Diff line number Diff line change
Expand Up @@ -22,8 +22,6 @@ pub struct OptionalCapabilities {
pub struct Kvm {
/// KVM fd.
pub fd: KvmFd,
/// Maximum number of memory slots allowed by KVM.
pub max_memslots: usize,
/// Additional capabilities that were specified in cpu template.
pub kvm_cap_modifiers: Vec<KvmCapability>,
}
Expand All @@ -42,12 +40,10 @@ impl Kvm {
/// Initialize [`Kvm`] type for Aarch64 architecture
pub fn init_arch(
fd: KvmFd,
max_memslots: usize,
kvm_cap_modifiers: Vec<KvmCapability>,
) -> Result<Self, KvmArchError> {
Ok(Self {
fd,
max_memslots,
kvm_cap_modifiers,
})
}
Expand Down
39 changes: 30 additions & 9 deletions src/vmm/src/arch/aarch64/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -58,9 +58,20 @@

/// Returns a Vec of the valid memory addresses for aarch64.
/// See [`layout`](layout) module for a drawing of the specific memory model for this platform.
pub fn arch_memory_regions(size: usize) -> Vec<(GuestAddress, usize)> {
let dram_size = min(size, layout::DRAM_MEM_MAX_SIZE);
vec![(GuestAddress(layout::DRAM_MEM_START), dram_size)]
pub fn arch_memory_regions(offset: usize, size: usize) -> Vec<(GuestAddress, usize)> {
let dram_size = min(size, layout::DRAM_MEM_MAX_SIZE - offset);
vec![(
GuestAddress(layout::DRAM_MEM_START + offset as u64),
dram_size,
)]
}

/// How many bytes of physical guest memory are addressible before the final gap in
/// the address space on this architecture.
///
/// There are no architectural gaps in the physical address space on aarch64, so this is 0
pub fn bytes_before_last_gap() -> usize {
0

Check warning on line 74 in src/vmm/src/arch/aarch64/mod.rs

View check run for this annotation

Codecov / codecov/patch

src/vmm/src/arch/aarch64/mod.rs#L73-L74

Added lines #L73 - L74 were not covered by tests
}

/// Configures the system for booting Linux.
Expand Down Expand Up @@ -89,7 +100,7 @@
// Configure vCPUs with normalizing and setting the generated CPU configuration.
for vcpu in vcpus.iter_mut() {
vcpu.kvm_vcpu.configure(
&vmm.guest_memory,
vmm.vm.guest_memory(),
entry_point,
&vcpu_config,
&optional_capabilities,
Expand All @@ -103,8 +114,16 @@
.as_cstring()
.expect("Cannot create cstring from cmdline string");

let swiotlb_region = match vmm.vm.swiotlb_regions().num_regions() {
0 | 1 => vmm.vm.swiotlb_regions().iter().next(),
_ => panic!(
"Firecracker tried to configure more than one swiotlb region. This is a logic bug."
),

Check warning on line 121 in src/vmm/src/arch/aarch64/mod.rs

View check run for this annotation

Codecov / codecov/patch

src/vmm/src/arch/aarch64/mod.rs#L119-L121

Added lines #L119 - L121 were not covered by tests
};

let fdt = fdt::create_fdt(
&vmm.guest_memory,
vmm.vm.guest_memory(),
swiotlb_region,
vcpu_mpidr,
cmdline,
vmm.mmio_device_manager.get_device_info(),
Expand All @@ -113,8 +132,10 @@
initrd,
)?;

let fdt_address = GuestAddress(get_fdt_addr(&vmm.guest_memory));
vmm.guest_memory.write_slice(fdt.as_slice(), fdt_address)?;
let fdt_address = GuestAddress(get_fdt_addr(vmm.vm.guest_memory()));
vmm.vm
.guest_memory()
.write_slice(fdt.as_slice(), fdt_address)?;

Ok(())
}
Expand Down Expand Up @@ -188,15 +209,15 @@

#[test]
fn test_regions_lt_1024gb() {
let regions = arch_memory_regions(1usize << 29);
let regions = arch_memory_regions(0, 1usize << 29);
assert_eq!(1, regions.len());
assert_eq!(GuestAddress(super::layout::DRAM_MEM_START), regions[0].0);
assert_eq!(1usize << 29, regions[0].1);
}

#[test]
fn test_regions_gt_1024gb() {
let regions = arch_memory_regions(1usize << 41);
let regions = arch_memory_regions(0, 1usize << 41);
assert_eq!(1, regions.len());
assert_eq!(GuestAddress(super::layout::DRAM_MEM_START), regions[0].0);
assert_eq!(super::layout::DRAM_MEM_MAX_SIZE, regions[0].1);
Expand Down
Loading
Loading