[uservm] E: Linux KVM guest sampling and perf host correlation #17
Open
esaurez wants to merge 23 commits into
514b3db to 1dd5d0e
9f1eab5 to 512deb5
Extend the mmap kernel call to accept an npages parameter that specifies how many contiguous pages to map in a single invocation. This replaces the previous page-at-a-time approach with a batched allocation strategy that reduces kernel transitions from user space.

Kernel changes:
- Extend kcall dispatcher to pass arg3 (npages) to the mmap handler.
- Add input validation in the kcall entry point: reject zero npages, cap to mmap region capacity, and guard against address overflow.
- Rewrite ProcessManager::mmap() with a batched loop that uses try_reserve_exact to guarantee Vec capacity matches the intended page count (alloc_upages derives its count from uframes.capacity()).
- Implement a fallback chain: try batch size first, fall back to single-page allocation, then return OutOfMemory.
- Add rollback_mmap() helper to unmap partially mapped pages on failure.

User-space changes:
- Update the mmap kcall wrapper to use kcall4! and accept npages, with u32 range validation.
- Replace the page-by-page loop in sysalloc::heap::map_range() with a single batch mmap call, delegating rollback to the kernel.
- Update syscall::safe::mem::segment::map_range() to pass npages=1 (batch conversion deferred via TODO).
- Update all existing callers (stress tests, testd) to pass npages=1.

Testing:
- Add test_mmap_multi_page() in testd that maps 4 pages in one call, verifies zero-initialization, writes and reads back per-page markers, and unmaps all pages.
[kernel] Add npages parameter to mmap kcall
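The fallback chain and rollback described in this commit can be pictured with a toy allocator. This is a minimal sketch under stated assumptions, not the kernel code: `ToyAllocator`, `MapError`, and `map_pages` are hypothetical names, and frames are plain integers rather than real page frames.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum MapError {
    InvalidArgument,
    OutOfMemory,
}

/// Toy frame allocator (an assumption for illustration): it refuses
/// batches larger than `max_batch` or larger than the free frame count.
pub struct ToyAllocator {
    pub free: usize,
    pub max_batch: usize,
    next: usize,
}

impl ToyAllocator {
    pub fn new(free: usize, max_batch: usize) -> Self {
        Self { free, max_batch, next: 0 }
    }

    fn alloc(&mut self, n: usize) -> Option<Vec<usize>> {
        if n == 0 || n > self.max_batch || n > self.free {
            return None;
        }
        let frames = (self.next..self.next + n).collect();
        self.next += n;
        self.free -= n;
        Some(frames)
    }

    fn release(&mut self, frames: &[usize]) {
        self.free += frames.len();
    }
}

/// Map `npages` pages: validate input, try the whole remaining batch,
/// fall back to a single page, and roll back everything on failure.
pub fn map_pages(
    alloc: &mut ToyAllocator,
    npages: usize,
    capacity: usize,
) -> Result<Vec<usize>, MapError> {
    if npages == 0 {
        return Err(MapError::InvalidArgument);
    }
    // Cap the request to the region capacity.
    let npages = npages.min(capacity);
    let mut mapped: Vec<usize> = Vec::new();
    mapped
        .try_reserve_exact(npages)
        .map_err(|_| MapError::OutOfMemory)?;
    while mapped.len() < npages {
        let remaining = npages - mapped.len();
        // Fallback chain: batch first, then one page at a time.
        match alloc.alloc(remaining).or_else(|| alloc.alloc(1)) {
            Some(frames) => mapped.extend(frames),
            None => {
                // Rollback: give back every page mapped so far.
                alloc.release(&mapped);
                return Err(MapError::OutOfMemory);
            }
        }
    }
    Ok(mapped)
}
```

With a successful call the caller gets all pages or none, so user space does not need its own cleanup loop.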
Add a periodic guest stack sampling profiler that runs from the host side and produces folded stack files compatible with flamegraph tools.

The profiler works by:
1. Starting a 1kHz timer that calls WHvCancelRunVirtualProcessor
2. On each Interrupted exit, reading guest EIP/EBP/CR3 registers
3. Walking the frame-pointer chain through host-mapped guest memory
4. Resolving addresses against ELF symbol tables (.symtab or .dynsym)
5. Writing folded stacks to a file for flamegraph generation

Key components:
- guest_profiler/gva.rs: Guest VA to GPA translation (2-level page walk)
- guest_profiler/samples.rs: Stack sampling and frame-pointer walking
- guest_profiler/symbols.rs: ELF32 symbol table parser with .symtab/.dynsym
- vmm/microvm/whp/mod.rs: Timer integration and sampling on Interrupted exits
- vcpu/mod.rs: get_profile_regs() for reading EIP/EBP/CR3

Also adds a release-profiling Cargo profile for building with symbols.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
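Steps 3 to 5 of the profiler (frame-pointer walk, symbol resolution, folded output) can be sketched as follows. This is an illustrative sketch, not the code in guest_profiler: it assumes a flat byte slice standing in for guest memory, the conventional 32-bit x86 frame layout (saved EBP at `[ebp]`, return address at `[ebp+4]`), and a pre-sorted symbol table. The function names `walk_frames`, `resolve`, and `fold` are hypothetical.

```rust
use std::collections::BTreeMap;

/// Read a little-endian u32, or None if the address is out of bounds.
fn read_u32(mem: &[u8], addr: u32) -> Option<u32> {
    let start = addr as usize;
    let bytes = mem.get(start..start.checked_add(4)?)?;
    let mut buf = [0u8; 4];
    buf.copy_from_slice(bytes);
    Some(u32::from_le_bytes(buf))
}

/// Walk the frame-pointer chain, returning program counters leaf first.
pub fn walk_frames(mem: &[u8], eip: u32, ebp: u32, max_depth: usize) -> Vec<u32> {
    let mut pcs = vec![eip];
    let mut fp = ebp;
    while pcs.len() < max_depth && fp != 0 {
        let saved_fp = read_u32(mem, fp);
        let ret = fp.checked_add(4).and_then(|a| read_u32(mem, a));
        let (Some(saved_fp), Some(ret)) = (saved_fp, ret) else { break };
        if ret == 0 {
            break;
        }
        pcs.push(ret);
        // The stack grows down, so a caller's frame must sit at a
        // higher address; anything else is a corrupt or finished chain.
        if saved_fp <= fp {
            break;
        }
        fp = saved_fp;
    }
    pcs
}

/// Resolve a PC against (start_address, name) pairs sorted by address.
pub fn resolve<'a>(symbols: &'a [(u32, &'a str)], pc: u32) -> &'a str {
    let idx = symbols.partition_point(|(start, _)| *start <= pc);
    if idx == 0 { "[unknown]" } else { symbols[idx - 1].1 }
}

/// Fold leaf-first stacks into "root;...;leaf count" lines.
pub fn fold(stacks: &[Vec<&str>]) -> String {
    let mut counts: BTreeMap<String, u64> = BTreeMap::new();
    for stack in stacks {
        let line = stack.iter().rev().copied().collect::<Vec<_>>().join(";");
        *counts.entry(line).or_insert(0) += 1;
    }
    counts.iter().map(|(k, v)| format!("{k} {v}\n")).collect()
}
```

The depth limit and the monotonic frame-pointer check keep a corrupted chain from looping forever, which matters when samples are taken at arbitrary points in guest execution.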
[uservm] E: Host kernel stack correlation via ETW (Windows)
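For readers unfamiliar with the 2-level page walk that the guest profiler uses to turn a guest virtual address into a guest physical address, here is a minimal sketch. It assumes classic 32-bit x86 paging without PAE (10-bit directory index, 10-bit table index, 12-bit offset) and a flat byte slice as guest physical memory. `gva_to_gpa` is a hypothetical name, and 4 MiB pages are deliberately left unhandled.

```rust
/// Present bit in a page-directory or page-table entry.
const PRESENT: u32 = 1 << 0;
/// Page-size bit in a page-directory entry (4 MiB page).
const LARGE_PAGE: u32 = 1 << 7;
/// Mask selecting the 4 KiB-aligned frame address of an entry.
const FRAME_MASK: u32 = 0xFFFF_F000;

/// Read a little-endian u32 from guest physical memory.
fn read_u32(mem: &[u8], gpa: u32) -> Option<u32> {
    let start = gpa as usize;
    let bytes = mem.get(start..start.checked_add(4)?)?;
    let mut buf = [0u8; 4];
    buf.copy_from_slice(bytes);
    Some(u32::from_le_bytes(buf))
}

/// Translate a guest virtual address using the page directory at CR3.
pub fn gva_to_gpa(mem: &[u8], cr3: u32, gva: u32) -> Option<u32> {
    // Level 1: page directory, indexed by bits 31..22.
    let pd_base = cr3 & FRAME_MASK;
    let pde = read_u32(mem, pd_base + (gva >> 22) * 4)?;
    if pde & PRESENT == 0 {
        return None;
    }
    if pde & LARGE_PAGE != 0 {
        // 4 MiB pages are not handled in this sketch.
        return None;
    }
    // Level 2: page table, indexed by bits 21..12.
    let pt_base = pde & FRAME_MASK;
    let pte = read_u32(mem, pt_base + ((gva >> 12) & 0x3FF) * 4)?;
    if pte & PRESENT == 0 {
        return None;
    }
    // Frame address plus the 12-bit page offset.
    Some((pte & FRAME_MASK) | (gva & 0xFFF))
}
```

Returning `None` on any missing mapping lets the sampler drop a frame instead of reading through an invalid address.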
Bumps [Mozilla-Actions/sccache-action](https://github.com/mozilla-actions/sccache-action) from 0.0.9 to 0.0.10.
- [Release notes](https://github.com/mozilla-actions/sccache-action/releases)
- [Commits](Mozilla-Actions/sccache-action@v0.0.9...v0.0.10)

---
updated-dependencies:
- dependency-name: Mozilla-Actions/sccache-action
  dependency-version: 0.0.10
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
…Mozilla-Actions-sccache-action-0.0.10 Bump Mozilla-Actions/sccache-action from 0.0.9 to 0.0.10
Add host-mount.md describing the snapshot-based mounting feature that allows users to mount a host directory into a Nanvix guest VM via a new `-mount <host-dir>` CLI flag for nanvixd.

The document covers:
- Problem statement and user interface (CLI usage)
- Goals and non-goals (e.g., not replacing linuxd)
- High-level snapshot-based design: host dir is packaged into a FAT32 image at launch, mapped via MMIO, and extracted back at shutdown
- Multi-image binary container format (MIMG) that supports packing multiple sub-images (ROOTFS + MOUNTFS) into a single MMIO region
- Size constraints, edge cases, and backward compatibility strategy
- Known limitations (copy overhead, 16 MiB ceiling, no live sync)
Introduce the `-mount <host-dir>` option for standalone mode, enabling bidirectional file sharing between the host and the guest VM. A host directory is packaged into a FAT32 image, mapped into guest memory at /mnt via a new multi-image container format, and copied back to the host on VM shutdown.

Multi-image container format (multiimage crate):
- Define a binary format with a HEAD page (header + entry table) followed by page-aligned sub-images, each identified by an 8-byte MMIO tag.
- Support both building (std) and parsing (no_std) paths so the same format works on host and guest.
- Provide compute_multiimage_layout() for zero-copy VMM integration (no intermediate concatenated file on disk).

Guest-side changes (nvx crate):
- Detect multi-image containers in the RAMFS MMIO region during VFS initialization.
- Mount ROOTFS at "/" and MOUNTFS at "/mnt" when sub-images are present; fall back to the legacy single-image path otherwise.

VMM backend changes:
- Add mount module (src/uservm/src/vmm/microvm/mount.rs) with helpers to build FAT images from host directories, compute unified layouts, and extract modified files after shutdown (copyback).
- Add MultiRamFs to the ramfs module for loading multi-image containers with per-file zero-copy mappings into guest memory.
- KVM: add remap_files_at() and attach_backing_files() to VirtualMemory.
- WHP: refactor placeholder splitting to support multiple file-backed views (MultiFileRemap), including gap re-commit and proper cleanup in Drop.

mkramfs library extensions:
- Export compute_image_size(), generate_image(), dir_size(), and copy_dir_recursive() as public APIs.
- Change mkfatfs() to return Result instead of panicking.

CLI and plumbing:
- Add -mount option to nanvixd argument parser with directory validation.
- Thread mount_directory through UserVmArgs, MicroVmArgs, StandaloneConfig, TerminalConfig, and all sandbox/benchmark call sites.
- Add ROOTFS_MMIO_TAG and MOUNTFS_MMIO_TAG to config::region_tags.
- Add MmioTag::as_bytes() accessor to mmio-tag crate.

Tests and benchmarks:
- Add mount-test guest binary: verifies read, nested read, create, and modify-then-re-read on /mnt.
- Add mount-bench-nostd guest binary: measures sequential 4 KiB read, write, and file creation latency on /mnt.
- Add test-standalone.toml entry for mount-test.
- Build system: register new crates, add to standalone guest binary list, and seed test/bench data directories in the Makefile.
[uservm] Add host directory mounting on guest
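The container layout described in the mount commit (a HEAD page followed by page-aligned sub-images, each with an 8-byte tag) can be illustrated with a small offset calculation. This is only a sketch: the 4 KiB page size, the `Entry` fields, the tag byte values, and the `sketch_layout` function are assumptions for illustration and are not the signature of `compute_multiimage_layout()` or the on-disk MIMG encoding.

```rust
/// Page size assumed by this sketch.
pub const PAGE_SIZE: u64 = 4096;

/// One entry of the hypothetical entry table.
#[derive(Debug, PartialEq)]
pub struct Entry {
    /// 8-byte tag identifying the sub-image.
    pub tag: [u8; 8],
    /// Byte offset of the sub-image from the start of the container.
    pub offset: u64,
    /// Unpadded size of the sub-image in bytes.
    pub size: u64,
}

/// Round `x` up to the next multiple of the page size.
fn align_up(x: u64) -> u64 {
    (x + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE
}

/// Compute page-aligned offsets for each sub-image and the total
/// container size, without concatenating any data.
pub fn sketch_layout(images: &[([u8; 8], u64)]) -> (Vec<Entry>, u64) {
    // The HEAD page (header + entry table) occupies the first page.
    let mut offset = PAGE_SIZE;
    let mut entries = Vec::with_capacity(images.len());
    for (tag, size) in images {
        entries.push(Entry { tag: *tag, offset, size: *size });
        // Each sub-image starts on a page boundary.
        offset += align_up(*size);
    }
    (entries, offset)
}
```

Because only offsets are computed, a VMM can map each backing file directly at its offset, which is the zero-copy property the commit message calls out.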
- perf_linux.rs: PerfSession with system-wide perf (-a) for kernel stacks, per-PID fallback. Waits for perf_event fds, detects early perf exit. chmod 644 on perf.data after stop for script accessibility.
- Re-export PerfSession as HostKernelSession (matches Windows type alias)
- KVM sampling: SIGUSR2 timer, Interrupted continues when profiler active
- full-flamegraph.sh: perf script -F, grep nanvixd|uservm filtering
- Safe register reads, freq_hz clamped 1-10kHz

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
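The system-wide versus per-PID choice and the frequency clamp can be sketched as an argument builder for `perf record`. This is a hedged illustration, not the contents of perf_linux.rs: the function names are hypothetical, reading "1-10kHz" as 1 Hz to 10 kHz is an assumption, and beyond `-a` the exact flag set used by the PR is not stated in the commit message (`-g`, `-F`, `-o`, and `-p` are standard `perf record` options chosen here for illustration).

```rust
/// Lower bound of the sampling frequency (assumed to be 1 Hz).
pub const MIN_FREQ_HZ: u32 = 1;
/// Upper bound of the sampling frequency (10 kHz).
pub const MAX_FREQ_HZ: u32 = 10_000;

/// Clamp a requested sampling frequency into the supported range.
pub fn clamp_freq(freq_hz: u32) -> u32 {
    freq_hz.clamp(MIN_FREQ_HZ, MAX_FREQ_HZ)
}

/// Build the argument list for a `perf record` invocation.
/// `system_wide` selects `-a`; otherwise the session attaches to `pid`.
pub fn perf_record_args(
    system_wide: bool,
    pid: u32,
    freq_hz: u32,
    output: &str,
) -> Vec<String> {
    let mut args = vec![
        "record".to_string(),
        "-g".to_string(), // capture call graphs
        "-F".to_string(), // sampling frequency in Hz
        clamp_freq(freq_hz).to_string(),
        "-o".to_string(), // output file
        output.to_string(),
    ];
    if system_wide {
        args.push("-a".to_string());
    } else {
        args.push("-p".to_string());
        args.push(pid.to_string());
    }
    args
}
```

A caller could first try the system-wide arguments with `std::process::Command::new("perf")` and retry with the per-PID arguments if the process exits early, which mirrors the fallback order in the commit message.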
512deb5 to c3f64d3
Add Linux KVM guest profiler with perf host correlation:
- perf_linux.rs: PerfSession with named constants, system-wide fallback, readiness wait, chmod
- mod.rs: SIGUSR2 profiler timer extracted to spawn_profiler_timer(), error capture on join
- flamegraph_host.py: Linux perf extraction with returncode checks for perf script and collapse
- flamegraph.py: add PERF_SESSION to stderr filter
- Remove full-flamegraph.sh (replaced by flamegraph.py)
- Downgrade EINTR log from warn to trace (profiler generates up to 10kHz interrupts)
- Fix perf_linux.rs docs: no timestamp correlation
- Fix start() docstring: describes actual behavior
- Fix mod.rs comment: Linux uses PerfSession not stub
- Document Relaxed ordering safety in timer shutdown

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
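The timer thread with a Relaxed stop flag and error capture on join can be sketched with the standard library alone. This is an assumption-laden illustration rather than the PR's spawn_profiler_timer(): `spawn_timer_sketch` and `TimerHandle` are hypothetical names, and a caller-supplied closure stands in for the SIGUSR2 delivery to the vCPU thread.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Handle used to stop the timer thread and collect its outcome.
pub struct TimerHandle {
    stop: Arc<AtomicBool>,
    thread: Option<JoinHandle<()>>,
}

/// Spawn a thread that calls `tick` roughly `freq_hz` times per second.
/// In a real profiler `tick` would signal the vCPU thread.
pub fn spawn_timer_sketch<F>(freq_hz: u32, mut tick: F) -> TimerHandle
where
    F: FnMut() + Send + 'static,
{
    let stop = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&stop);
    let period = Duration::from_secs(1) / freq_hz.max(1);
    let thread = thread::spawn(move || {
        // Relaxed is enough here: the flag guards no other data, and
        // join() below provides the final synchronization point.
        while !flag.load(Ordering::Relaxed) {
            thread::sleep(period);
            tick();
        }
    });
    TimerHandle { stop, thread: Some(thread) }
}

impl TimerHandle {
    /// Request shutdown and report a panic in the timer thread as an error.
    pub fn shutdown(mut self) -> Result<(), String> {
        self.stop.store(true, Ordering::Relaxed);
        match self.thread.take() {
            Some(t) => t
                .join()
                .map_err(|_| "profiler timer thread panicked".to_string()),
            None => Ok(()),
        }
    }
}
```

Shutdown latency is bounded by one timer period, since the thread checks the flag once per iteration.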
c3f64d3 to 8056cc0
Replica of nanvix#2168.
Summary
Adds Linux KVM guest sampling and perf record-based host kernel stack correlation, completing the cross-platform profiler started in PR #9 (Windows ETW). Plugs into the HostKernelSession type alias introduced there.
What this PR adds
KVM guest sampling (vmm/microvm/mod.rs)
Host kernel tracing (perf_linux.rs)
E2E script (full-flamegraph.sh)
Ungated register access (kvm/vcpu/mod.rs)
Stacked on