[vmm][whp] E: Batch RIP+RAX register writes for PMIO reads#12

Draft
esaurez wants to merge 3 commits into dev from enhancement-vmm-rip-rax-batching

esaurez commented Apr 22, 2026

Owner

Replica of nanvix#1935.

For PMIO read exits, the VMM previously made two separate WHvSetVirtualProcessorRegisters calls: one to advance RIP and another to set RAX. This change defers the RIP advance for PMIO reads and batches it with the RAX write into a single call, halving the number of hypervisor round-trips per read exit.
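The difference between the two flows can be sketched as follows. This is a minimal model, not the VMM's actual code: `Reg`, `RegisterSink`, `CountingSink`, and both `complete_read_*` functions are hypothetical names, and each `set_registers` call stands in for one `WHvSetVirtualProcessorRegisters` call. The caller is assumed to have already merged the port data into the full RAX value.

```rust
/// Hypothetical register identifiers (the real VMM would use WHP register names).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Reg {
    Rip,
    Rax,
}

/// Stand-in for the hypervisor interface: one call models one
/// WHvSetVirtualProcessorRegisters invocation.
trait RegisterSink {
    fn set_registers(&mut self, writes: &[(Reg, u64)]);
}

/// Mock that records register values and counts how many calls were made.
#[derive(Default)]
struct CountingSink {
    calls: usize,
    rip: u64,
    rax: u64,
}

impl RegisterSink for CountingSink {
    fn set_registers(&mut self, writes: &[(Reg, u64)]) {
        self.calls += 1;
        for &(reg, value) in writes {
            match reg {
                Reg::Rip => self.rip = value,
                Reg::Rax => self.rax = value,
            }
        }
    }
}

/// Previous flow: RIP is advanced first, RAX is written in a second call.
fn complete_read_unbatched(sink: &mut impl RegisterSink, rip: u64, insn_len: u64, rax: u64) {
    sink.set_registers(&[(Reg::Rip, rip.wrapping_add(insn_len))]);
    sink.set_registers(&[(Reg::Rax, rax)]);
}

/// New flow: the RIP advance is deferred and written together with RAX.
fn complete_read_batched(sink: &mut impl RegisterSink, rip: u64, insn_len: u64, rax: u64) {
    sink.set_registers(&[(Reg::Rip, rip.wrapping_add(insn_len)), (Reg::Rax, rax)]);
}
```

Both functions leave the vCPU in the same state; only the number of calls differs (two versus one).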

Benchmark (p50, 50 iterations, cold-start):

| Metric | Before | After | Change |
|--------|-----------|-----------|--------|
| Total | 61,569 us | 60,814 us | -1.2% |
| Exits | 65 | 65 | -- |
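The "Change" figure is a plain relative difference. As a quick check, the helper below (a hypothetical `percent_change`, not part of the PR) reproduces it from the two totals.

```rust
/// Relative change from `before` to `after`, in percent.
/// Negative values mean the "after" measurement is smaller (faster).
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}
```

With the totals above, `percent_change(61_569.0, 60_814.0)` gives roughly -1.23, which rounds to the reported -1.2%.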

Depends on #11 (CPUID frequency override).

ppenna and others added 3 commits April 7, 2026 08:38
- Reduce PIT-based calibration window from 10 ms to 1 ms.
- Replace PIT-based calibration with RDTSC spin loop when CPUID
  leaf 0x16 is available, eliminating ~100 VM exits during boot.
- Fall back to PIT-based calibration (1 ms window) when leaf 0x16
  is unavailable instead of guessing TSC frequency.
- Add max-iteration guard to RDTSC spin loop to prevent hangs.
- Fix CPUID helper to check max supported leaf and set ECX=0
  explicitly for well-defined subleaf selection.
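The selection and guard logic described in this list might look roughly like the sketch below. It is an illustration under stated assumptions, not the kernel's code: `Calibration`, `choose_calibration`, and `spin_ticks` are hypothetical names, the TSC is read through a caller-supplied closure so the example runs anywhere, and a zeroed leaf 0x16 is treated the same as an unavailable one.

```rust
/// CPUID leaf that reports processor frequency information.
const CPUID_FREQ_LEAF: u32 = 0x16;

#[derive(Debug, PartialEq)]
enum Calibration {
    /// Base frequency is known from CPUID leaf 0x16: use an RDTSC spin loop.
    Tsc { mhz: u32 },
    /// Leaf unavailable (or zeroed): measure against the PIT over a short window.
    Pit { window_us: u32 },
}

/// Picks a calibration method from the maximum supported CPUID leaf and
/// the EAX value returned for leaf 0x16 (queried with ECX = 0).
fn choose_calibration(max_leaf: u32, leaf16_eax: u32) -> Calibration {
    // EAX bits 15:0 hold the processor base frequency in MHz.
    let mhz = leaf16_eax & 0xFFFF;
    if max_leaf >= CPUID_FREQ_LEAF && mhz != 0 {
        Calibration::Tsc { mhz }
    } else {
        Calibration::Pit { window_us: 1_000 } // 1 ms fallback window
    }
}

/// Spins until the TSC has advanced by at least `ticks`.
/// Returns `false` if `max_iters` reads pass first, so a stuck or
/// misbehaving counter cannot hang the caller.
fn spin_ticks(mut read_tsc: impl FnMut() -> u64, ticks: u64, max_iters: u64) -> bool {
    let start = read_tsc();
    for _ in 0..max_iters {
        if read_tsc().wrapping_sub(start) >= ticks {
            return true;
        }
    }
    false
}
```

Checking the maximum leaf before reading leaf 0x16 matters because CPUID queries above the supported maximum do not return the requested leaf's data.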
Hyper-V zeros out CPUID leaf 0x16 (Processor Frequency Information)
even on CPUs that support it (e.g. Skylake-SP). This prevents the
guest kernel from using RDTSC-based LAPIC timer calibration, forcing
a fallback to the PIT busy-wait loop which generates ~908 VM exits.

Query the host TSC frequency via WHvGetCapability(ProcessorClockFrequency)
and inject it into CPUID leaf 0x16 EAX via WHvPartitionPropertyCodeCpuidResultList
before partition setup. The guest kernel (on the LAPIC calibration branch)
sees a non-zero base frequency and uses an RDTSC spin loop instead of PIT,
eliminating ~908 PMIO read exits.
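A rough sketch of how the injected leaf could be built is shown below. `CpuidResult` and `leaf16_override` are hypothetical names that only loosely mirror the shape of a CPUID result entry, the host frequency is assumed to be reported in Hz, and the remaining output registers are left at zero for simplicity.

```rust
/// Hypothetical CPUID override entry: one leaf and its four output registers.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct CpuidResult {
    function: u32,
    eax: u32,
    ebx: u32,
    ecx: u32,
    edx: u32,
}

/// Builds a leaf 0x16 override from the host clock frequency (assumed Hz).
/// Returns `None` when the frequency is unknown or does not fit in the
/// 16-bit MHz field, in which case no override should be injected.
fn leaf16_override(host_hz: u64) -> Option<CpuidResult> {
    let mhz = host_hz / 1_000_000;
    if mhz == 0 || mhz > 0xFFFF {
        return None;
    }
    Some(CpuidResult {
        function: 0x16,
        eax: mhz as u32, // EAX bits 15:0: processor base frequency in MHz
        ..CpuidResult::default()
    })
}
```

For a 2.4 GHz host, `leaf16_override(2_400_000_000)` yields an entry with `eax == 2400`; a zero frequency yields `None`, leaving the guest on its PIT fallback.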

Benchmark results (p50, 50 iterations):
  Before (dev):       81,106 us total, 59,384 us guest_exec
  After  (this fix):  67,029 us total, 48,641 us guest_exec
  Improvement:        17.4% total, 18.1% guest_exec

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

For PMIO read exits, the VMM previously made two separate
WHvSetVirtualProcessorRegisters calls: one to advance RIP (in the
vCPU exit handler) and another to set RAX (in the run loop).

This change defers the RIP advance for PMIO reads and batches it
with the RAX write into a single WHvSetVirtualProcessorRegisters
call, halving the number of hypervisor round-trips per read exit.

Benchmark (p50, 50 iterations, cold-start):
  Before: 61,569 us total, 65 exits
  After:  60,814 us total, 65 exits (~1.2% improvement)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2 participants