Conversation

@mkroening
Member

@mkroening mkroening commented Nov 3, 2025

This PR uses the combined offset + wrap bitfield in more places, which saves one byte per index. I am not sure whether performance changes at all, but combining these related fields also makes the code more readable.
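For readers unfamiliar with the layout, here is a minimal sketch of what such a combined index could look like. It assumes the offset occupies the low 15 bits and the wrap counter the top bit of a `u16`, as in the packed virtqueue event suppression structure of the virtio spec. The type `PackedRingIdx` and its methods are hypothetical names for illustration and are not taken from this PR's diff.

```rust
/// Hypothetical combined ring index (illustration only, not the PR's type).
/// Assumed layout: bits 0..=14 hold the offset, bit 15 holds the wrap counter.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct PackedRingIdx(pub u16);

impl PackedRingIdx {
    const OFF_MASK: u16 = 0x7fff;
    const WRAP_SHIFT: u32 = 15;

    /// Packs a 15-bit offset and a 1-bit wrap counter into one `u16`.
    pub fn new(off: u16, wrap: bool) -> Self {
        debug_assert!(off <= Self::OFF_MASK, "offset must fit in 15 bits");
        Self((off & Self::OFF_MASK) | (u16::from(wrap) << Self::WRAP_SHIFT))
    }

    /// Returns the 15-bit offset.
    pub fn off(self) -> u16 {
        self.0 & Self::OFF_MASK
    }

    /// Returns the wrap counter.
    pub fn wrap(self) -> bool {
        (self.0 >> Self::WRAP_SHIFT) != 0
    }

    /// Advances by one slot in a ring of `size` entries and toggles the
    /// wrap counter when the offset wraps around to zero.
    pub fn next(self, size: u16) -> Self {
        let off = self.off() + 1;
        if off >= size {
            Self::new(0, !self.wrap())
        } else {
            Self::new(off, self.wrap())
        }
    }
}
```

With this sketch, `PackedRingIdx::new(7, false).next(8)` yields offset 0 with the wrap counter set, and the whole index fits in two bytes.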

Depends on #2046.

@mkroening mkroening self-assigned this Nov 3, 2025
@mkroening mkroening changed the title from "perf(pvirtq): use bitfields for RingIdx" to "perf(pvirtq): combine u15 offset and u1 wrap counter in more places" Nov 3, 2025
Contributor

@github-actions github-actions bot left a comment

Benchmark Results

| Benchmark | Current: ea294d7 | Previous: b966ab4 | Performance Ratio |
| --- | --- | --- | --- |
| startup_benchmark Build Time | 115.06 s | 111.96 s | 1.03 |
| startup_benchmark File Size | 0.91 MB | 0.91 MB | 1.00 |
| Startup Time - 1 core | 0.94 s (±0.03 s) | 0.93 s (±0.02 s) | 1.00 |
| Startup Time - 2 cores | 0.92 s (±0.03 s) | 0.93 s (±0.03 s) | 0.98 |
| Startup Time - 4 cores | 0.92 s (±0.03 s) | 0.93 s (±0.03 s) | 0.99 |
| multithreaded_benchmark Build Time | 114.86 s | 114.77 s | 1.00 |
| multithreaded_benchmark File Size | 1.02 MB | 1.01 MB | 1.00 |
| Multithreaded Pi Efficiency - 2 Threads | 89.00 % (±7.16 %) | 88.97 % (±7.38 %) | 1.00 |
| Multithreaded Pi Efficiency - 4 Threads | 43.56 % (±3.34 %) | 44.09 % (±3.63 %) | 0.99 |
| Multithreaded Pi Efficiency - 8 Threads | 25.73 % (±2.14 %) | 25.36 % (±1.99 %) | 1.01 |
| micro_benchmarks Build Time | 296.36 s | 295.64 s | 1.00 |
| micro_benchmarks File Size | 1.02 MB | 1.02 MB | 1.00 |
| Scheduling time - 1 thread | 174.22 ticks (±22.97 ticks) | 178.23 ticks (±25.33 ticks) | 0.98 |
| Scheduling time - 2 threads | 107.05 ticks (±18.69 ticks) | 98.66 ticks (±14.00 ticks) | 1.09 |
| Micro - Time for syscall (getpid) | 12.02 ticks (±10.15 ticks) | 11.98 ticks (±4.92 ticks) | 1.00 |
| Memcpy speed - (built_in) block size 4096 | 61488.74 MByte/s (±43301.44 MByte/s) | 63152.99 MByte/s (±45587.10 MByte/s) | 0.97 |
| Memcpy speed - (built_in) block size 1048576 | 14260.67 MByte/s (±11591.85 MByte/s) | 15668.33 MByte/s (±12904.18 MByte/s) | 0.91 |
| Memcpy speed - (built_in) block size 16777216 | 9873.42 MByte/s (±7962.15 MByte/s) | 10465.02 MByte/s (±8568.62 MByte/s) | 0.94 |
| Memset speed - (built_in) block size 4096 | 61869.80 MByte/s (±43575.40 MByte/s) | 63370.25 MByte/s (±45752.42 MByte/s) | 0.98 |
| Memset speed - (built_in) block size 1048576 | 14559.60 MByte/s (±11773.86 MByte/s) | 15907.38 MByte/s (±12989.07 MByte/s) | 0.92 |
| Memset speed - (built_in) block size 16777216 | 10124.27 MByte/s (±8112.18 MByte/s) | 10679.75 MByte/s (±8675.34 MByte/s) | 0.95 |
| Memcpy speed - (rust) block size 4096 | 53703.78 MByte/s (±40172.73 MByte/s) | 56435.01 MByte/s (±41692.91 MByte/s) | 0.95 |
| Memcpy speed - (rust) block size 1048576 | 13436.59 MByte/s (±11026.86 MByte/s) | 14141.32 MByte/s (±11788.91 MByte/s) | 0.95 |
| Memcpy speed - (rust) block size 16777216 | 9647.20 MByte/s (±7759.02 MByte/s) | 10587.80 MByte/s (±8680.27 MByte/s) | 0.91 |
| Memset speed - (rust) block size 4096 | 54256.70 MByte/s (±40611.12 MByte/s) | 57311.29 MByte/s (±42263.04 MByte/s) | 0.95 |
| Memset speed - (rust) block size 1048576 | 13780.27 MByte/s (±11229.11 MByte/s) | 14556.89 MByte/s (±12036.69 MByte/s) | 0.95 |
| Memset speed - (rust) block size 16777216 | 9879.64 MByte/s (±7892.83 MByte/s) | 10801.86 MByte/s (±8787.27 MByte/s) | 0.91 |
| alloc_benchmarks Build Time | 292.53 s | 295.50 s | 0.99 |
| alloc_benchmarks File Size | 0.98 MB | 0.98 MB | 1.00 |
| Allocations - Allocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Deallocation success | 100.00 % | 100.00 % | 1 |
| Allocations - Pre-fail Allocations | 100.00 % | 100.00 % | 1 |
| Allocations - Average Allocation time | 22711.26 Ticks (±1136.06 Ticks) | 22283.43 Ticks (±1951.06 Ticks) | 1.02 |
| Allocations - Average Allocation time (no fail) | 22711.26 Ticks (±1136.06 Ticks) | 22283.43 Ticks (±1951.06 Ticks) | 1.02 |
| Allocations - Average Deallocation time | 2704.44 Ticks (±1094.24 Ticks) | 2877.16 Ticks (±1320.24 Ticks) | 0.94 |
| mutex_benchmark Build Time | 292.65 s | 295.15 s | 0.99 |
| mutex_benchmark File Size | 1.02 MB | 1.02 MB | 1.00 |
| Mutex Stress Test Average Time per Iteration - 1 Threads | 37.50 ns (±3.83 ns) | 37.34 ns (±4.31 ns) | 1.00 |
| Mutex Stress Test Average Time per Iteration - 2 Threads | 31.22 ns (±3.11 ns) | 30.54 ns (±3.08 ns) | 1.02 |

This comment was automatically generated by workflow using github-action-benchmark.

@mkroening mkroening marked this pull request as ready for review November 4, 2025 08:54
Member

@Gelbpunkt Gelbpunkt left a comment

Not sure about perf here: bitfields should generally have worse access times because every access needs a bitshift, as opposed to shifting once and holding the values in an intermediate struct. I don't really mind either way, but this version feels like we're at least not reinventing the virtio spec.
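The alternative mentioned in this comment can be sketched as follows: decode the raw value once into an intermediate struct and read plain fields afterwards. `RawIdx` and `DecodedIdx` are hypothetical names chosen for illustration, and the bit layout (offset in the low 15 bits, wrap counter in bit 15) is an assumption. The sketch only shows the shape of the two representations and makes no claim about which one is faster in practice.

```rust
/// Hypothetical raw value as stored in the ring (illustration only).
/// Assumed layout: bits 0..=14 hold the offset, bit 15 holds the wrap counter.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct RawIdx(pub u16);

/// Hypothetical intermediate struct: the mask and shift happen once during
/// conversion, and later reads are plain field accesses.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DecodedIdx {
    pub off: u16,
    pub wrap: bool,
}

impl From<RawIdx> for DecodedIdx {
    fn from(raw: RawIdx) -> Self {
        Self {
            off: raw.0 & 0x7fff,
            wrap: (raw.0 >> 15) != 0,
        }
    }
}

impl From<DecodedIdx> for RawIdx {
    fn from(idx: DecodedIdx) -> Self {
        // Re-pack the fields into the assumed wire layout.
        Self((idx.off & 0x7fff) | (u16::from(idx.wrap) << 15))
    }
}
```

A caller would convert with `DecodedIdx::from(raw)` before a sequence of reads and convert back with `RawIdx::from(decoded)` before writing to the ring.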

@mkroening mkroening changed the title from "perf(pvirtq): combine u15 offset and u1 wrap counter in more places" to "refactor(pvirtq): combine u15 offset and u1 wrap counter in more places" Nov 17, 2025
@mkroening
Member Author

Makes sense. I renamed the commits so that they only imply a refactor. I will merge this once CI passes.

@mkroening mkroening added this pull request to the merge queue Nov 19, 2025
Merged via the queue into main with commit 700ef19 Nov 19, 2025
26 of 32 checks passed