refactor(pvirtq): combine `u15` offset and `u1` wrap counter in more places #2047

mkroening · 2025-11-03T17:20:34Z

This PR uses the combined offset + wrap-counter bitfield to save one byte per index. I am not sure whether this changes performance, but it also makes the code more readable by keeping these related fields together.
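Here is a minimal sketch of the layout the description refers to: a 15-bit offset and a 1-bit wrap counter sharing one `u16`. It follows the bit layout the virtio packed-virtqueue specification uses for fields such as `desc_event_off : 15; desc_event_wrap : 1;`. The type `PackedIdx` and its methods are hypothetical illustrations, not Hermit's actual `RingIdx` API.

```rust
/// Hypothetical packed ring index: bits 0..=14 hold the ring offset,
/// bit 15 holds the wrap counter (same layout as the virtio
/// `desc_event_off : 15; desc_event_wrap : 1;` field).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PackedIdx(u16);

impl PackedIdx {
    const WRAP_BIT: u16 = 1 << 15;
    const OFFSET_MASK: u16 = Self::WRAP_BIT - 1;

    /// Builds an index from a 15-bit offset and a wrap counter.
    /// Panics if `offset` does not fit into 15 bits.
    pub fn new(offset: u16, wrap: bool) -> Self {
        assert!(offset <= Self::OFFSET_MASK, "offset must fit in 15 bits");
        Self(offset | (u16::from(wrap) << 15))
    }

    /// Returns the 15-bit ring offset (masks off the wrap bit).
    pub fn offset(self) -> u16 {
        self.0 & Self::OFFSET_MASK
    }

    /// Returns the 1-bit wrap counter (tests bit 15).
    pub fn wrap(self) -> bool {
        self.0 & Self::WRAP_BIT != 0
    }

    /// Returns the raw value, e.g. for writing out with `to_le_bytes`.
    pub fn to_raw(self) -> u16 {
        self.0
    }
}

fn main() {
    let idx = PackedIdx::new(42, true);
    assert_eq!(idx.offset(), 42);
    assert!(idx.wrap());
    // 42 = 0x002A, plus the wrap bit 0x8000.
    assert_eq!(idx.to_raw(), 0x802A);
    // Both values fit in two bytes.
    assert_eq!(std::mem::size_of::<PackedIdx>(), 2);
    println!("{:?} -> offset {}, wrap {}", idx, idx.offset(), idx.wrap());
}
```

Keeping both values in one `u16` is where the one-byte saving per index comes from, compared with a separate `u16` offset plus a one-byte wrap flag.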

Depends on #2046.

github-actions

Benchmark Results

| Benchmark | Current: `ea294d7` | Previous: `b966ab4` | Performance Ratio |
|---|---|---|---|
| startup_benchmark Build Time | `115.06` s | `111.96` s | `1.03` |
| startup_benchmark File Size | `0.91` MB | `0.91` MB | `1.00` |
| Startup Time - 1 core | `0.94` s (`±0.03` s) | `0.93` s (`±0.02` s) | `1.00` |
| Startup Time - 2 cores | `0.92` s (`±0.03` s) | `0.93` s (`±0.03` s) | `0.98` |
| Startup Time - 4 cores | `0.92` s (`±0.03` s) | `0.93` s (`±0.03` s) | `0.99` |
| multithreaded_benchmark Build Time | `114.86` s | `114.77` s | `1.00` |
| multithreaded_benchmark File Size | `1.02` MB | `1.01` MB | `1.00` |
| Multithreaded Pi Efficiency - 2 Threads | `89.00` % (`±7.16` %) | `88.97` % (`±7.38` %) | `1.00` |
| Multithreaded Pi Efficiency - 4 Threads | `43.56` % (`±3.34` %) | `44.09` % (`±3.63` %) | `0.99` |
| Multithreaded Pi Efficiency - 8 Threads | `25.73` % (`±2.14` %) | `25.36` % (`±1.99` %) | `1.01` |
| micro_benchmarks Build Time | `296.36` s | `295.64` s | `1.00` |
| micro_benchmarks File Size | `1.02` MB | `1.02` MB | `1.00` |
| Scheduling time - 1 thread | `174.22` ticks (`±22.97` ticks) | `178.23` ticks (`±25.33` ticks) | `0.98` |
| Scheduling time - 2 threads | `107.05` ticks (`±18.69` ticks) | `98.66` ticks (`±14.00` ticks) | `1.09` |
| Micro - Time for syscall (getpid) | `12.02` ticks (`±10.15` ticks) | `11.98` ticks (`±4.92` ticks) | `1.00` |
| Memcpy speed - (built_in) block size 4096 | `61488.74` MByte/s (`±43301.44` MByte/s) | `63152.99` MByte/s (`±45587.10` MByte/s) | `0.97` |
| Memcpy speed - (built_in) block size 1048576 | `14260.67` MByte/s (`±11591.85` MByte/s) | `15668.33` MByte/s (`±12904.18` MByte/s) | `0.91` |
| Memcpy speed - (built_in) block size 16777216 | `9873.42` MByte/s (`±7962.15` MByte/s) | `10465.02` MByte/s (`±8568.62` MByte/s) | `0.94` |
| Memset speed - (built_in) block size 4096 | `61869.80` MByte/s (`±43575.40` MByte/s) | `63370.25` MByte/s (`±45752.42` MByte/s) | `0.98` |
| Memset speed - (built_in) block size 1048576 | `14559.60` MByte/s (`±11773.86` MByte/s) | `15907.38` MByte/s (`±12989.07` MByte/s) | `0.92` |
| Memset speed - (built_in) block size 16777216 | `10124.27` MByte/s (`±8112.18` MByte/s) | `10679.75` MByte/s (`±8675.34` MByte/s) | `0.95` |
| Memcpy speed - (rust) block size 4096 | `53703.78` MByte/s (`±40172.73` MByte/s) | `56435.01` MByte/s (`±41692.91` MByte/s) | `0.95` |
| Memcpy speed - (rust) block size 1048576 | `13436.59` MByte/s (`±11026.86` MByte/s) | `14141.32` MByte/s (`±11788.91` MByte/s) | `0.95` |
| Memcpy speed - (rust) block size 16777216 | `9647.20` MByte/s (`±7759.02` MByte/s) | `10587.80` MByte/s (`±8680.27` MByte/s) | `0.91` |
| Memset speed - (rust) block size 4096 | `54256.70` MByte/s (`±40611.12` MByte/s) | `57311.29` MByte/s (`±42263.04` MByte/s) | `0.95` |
| Memset speed - (rust) block size 1048576 | `13780.27` MByte/s (`±11229.11` MByte/s) | `14556.89` MByte/s (`±12036.69` MByte/s) | `0.95` |
| Memset speed - (rust) block size 16777216 | `9879.64` MByte/s (`±7892.83` MByte/s) | `10801.86` MByte/s (`±8787.27` MByte/s) | `0.91` |
| alloc_benchmarks Build Time | `292.53` s | `295.50` s | `0.99` |
| alloc_benchmarks File Size | `0.98` MB | `0.98` MB | `1.00` |
| Allocations - Allocation success | `100.00` % | `100.00` % | `1` |
| Allocations - Deallocation success | `100.00` % | `100.00` % | `1` |
| Allocations - Pre-fail Allocations | `100.00` % | `100.00` % | `1` |
| Allocations - Average Allocation time | `22711.26` Ticks (`±1136.06` Ticks) | `22283.43` Ticks (`±1951.06` Ticks) | `1.02` |
| Allocations - Average Allocation time (no fail) | `22711.26` Ticks (`±1136.06` Ticks) | `22283.43` Ticks (`±1951.06` Ticks) | `1.02` |
| Allocations - Average Deallocation time | `2704.44` Ticks (`±1094.24` Ticks) | `2877.16` Ticks (`±1320.24` Ticks) | `0.94` |
| mutex_benchmark Build Time | `292.65` s | `295.15` s | `0.99` |
| mutex_benchmark File Size | `1.02` MB | `1.02` MB | `1.00` |
| Mutex Stress Test Average Time per Iteration - 1 Threads | `37.50` ns (`±3.83` ns) | `37.34` ns (`±4.31` ns) | `1.00` |
| Mutex Stress Test Average Time per Iteration - 2 Threads | `31.22` ns (`±3.11` ns) | `30.54` ns (`±3.08` ns) | `1.02` |

This comment was automatically generated by workflow using github-action-benchmark.

Gelbpunkt

Not sure about perf here, since bitfields should generally have worse access times due to the necessary bit shifts, as opposed to doing the shifts once and holding the values in an intermediate struct. I don't really mind either way, but this version feels like we're at least not reinventing the virtio spec.
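The trade-off raised here can be made concrete by contrasting the two access patterns. The sketch below uses hypothetical names (`UnpackedIdx`, `pack`, `unpack`). It keeps the raw `u16` for storage and decodes it once into a plain struct, whereas the bitfield style masks the raw value on every access. It does not measure anything; whether the per-access mask and shift matter in practice would have to be benchmarked.

```rust
/// Bit layout: 15-bit offset in bits 0..=14, wrap counter in bit 15.
const WRAP_BIT: u16 = 1 << 15;
const OFFSET_MASK: u16 = WRAP_BIT - 1;

/// Hypothetical unpacked view: masking happens once in `unpack`,
/// afterwards the fields are read directly.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct UnpackedIdx {
    pub offset: u16,
    pub wrap: bool,
}

/// Decodes a packed `u16` into offset and wrap counter.
pub fn unpack(raw: u16) -> UnpackedIdx {
    UnpackedIdx {
        offset: raw & OFFSET_MASK,
        wrap: raw & WRAP_BIT != 0,
    }
}

/// Encodes the unpacked view back into the packed representation.
pub fn pack(idx: UnpackedIdx) -> u16 {
    debug_assert!(idx.offset <= OFFSET_MASK, "offset must fit in 15 bits");
    (idx.offset & OFFSET_MASK) | (u16::from(idx.wrap) << 15)
}

fn main() {
    let raw: u16 = 0x8005;
    // Bitfield style: every access masks the raw value.
    let offset_direct = raw & OFFSET_MASK;
    // Intermediate-struct style: decode once, then read plain fields.
    let idx = unpack(raw);
    assert_eq!(idx.offset, offset_direct);
    assert!(idx.wrap);
    // Round trip back to the stored representation.
    assert_eq!(pack(idx), raw);
    println!("{idx:?}");
}
```

Both styles store the same two bytes; they differ only in where the masking work happens.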

mkroening · 2025-11-17T14:37:38Z

Makes sense. I renamed the commits to imply only a refactor. Merging this once CI passes.


mkroening self-assigned this Nov 3, 2025

mkroening force-pushed the pvirtq-ring-idx branch from 360b927 to a89d96c Compare November 3, 2025 17:39

mkroening changed the title ~~perf(pvirtq): use bitfields for RingIdx~~ perf(pvirtq): combine u15 offset and u1 wrap counter in more places Nov 3, 2025

github-actions bot reviewed Nov 3, 2025


mkroening marked this pull request as ready for review November 4, 2025 08:54

mkroening requested review from Gelbpunkt and cagatay-y November 4, 2025 08:54

Gelbpunkt approved these changes Nov 6, 2025


mkroening force-pushed the pvirtq-ring-idx branch from 96ca7e8 to b20103e Compare November 6, 2025 16:39

mkroening force-pushed the pvirtq-ring-idx branch from 57f95d4 to cf8bb5b Compare November 17, 2025 14:36

mkroening changed the title ~~perf(pvirtq): combine u15 offset and u1 wrap counter in more places~~ refactor(pvirtq): combine u15 offset and u1 wrap counter in more places Nov 17, 2025

mkroening added 3 commits November 18, 2025 16:44

refactor(pvirtq): use bitfields for RingIdx

1fe4b06

refactor(pvirtq): use bitfields for DescriptorRing::write_index and `drv_wc`

a1edc58

refactor(pvirtq): use bitfields for DescriptorRing::poll_index and `dev_wc`

ea294d7
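The second and third commits apply the same packing to the descriptor ring's write and poll indices and their wrap counters (`drv_wc`, `dev_wc`). Here is a minimal sketch of how such a combined index could advance, assuming a ring of `ring_size` entries and the virtio packed-virtqueue rule that each wrap counter starts at 1 and flips every time the index wraps around the ring. `PackedIdx`, `initial`, and `advance` are hypothetical names, not Hermit's `DescriptorRing` API, and the sketch advances by only one slot at a time.

```rust
/// Hypothetical packed index: offset in bits 0..=14, wrap counter in bit 15.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PackedIdx(u16);

impl PackedIdx {
    const WRAP_BIT: u16 = 1 << 15;
    const OFFSET_MASK: u16 = Self::WRAP_BIT - 1;

    /// Initial state: offset 0, wrap counter 1.
    pub fn initial() -> Self {
        Self(Self::WRAP_BIT)
    }

    pub fn offset(self) -> u16 {
        self.0 & Self::OFFSET_MASK
    }

    pub fn wrap(self) -> bool {
        self.0 & Self::WRAP_BIT != 0
    }

    /// Advances by one slot in a ring of `ring_size` entries
    /// (at most 2^15), toggling the wrap bit when the offset wraps to 0.
    pub fn advance(self, ring_size: u16) -> Self {
        debug_assert!(ring_size > 0 && ring_size <= Self::WRAP_BIT);
        let wrap_bit = self.0 & Self::WRAP_BIT;
        let next = self.offset() + 1;
        if next == ring_size {
            // Wrapped around: offset back to 0, wrap counter flipped.
            Self(wrap_bit ^ Self::WRAP_BIT)
        } else {
            // Same lap: keep the wrap bit, store the new offset.
            Self(wrap_bit | next)
        }
    }
}

fn main() {
    let ring_size = 4;
    let mut idx = PackedIdx::initial();
    for _ in 0..ring_size {
        idx = idx.advance(ring_size);
    }
    // One full lap: back at offset 0, wrap counter now 0.
    assert_eq!((idx.offset(), idx.wrap()), (0, false));
    for _ in 0..ring_size {
        idx = idx.advance(ring_size);
    }
    // Second lap: wrap counter back to 1.
    assert_eq!((idx.offset(), idx.wrap()), (0, true));
}
```

With the wrap counter inside the same `u16`, a single copy of the index carries everything needed to compare against the wrap bits in a descriptor's flags.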

mkroening force-pushed the pvirtq-ring-idx branch from 133aa88 to 1fe4b06 Compare November 18, 2025 15:45

mkroening added this pull request to the merge queue Nov 19, 2025

Merged via the queue into main with commit 700ef19 Nov 19, 2025
26 of 32 checks passed
