@zyuiop zyuiop commented Sep 17, 2025

Fixes an old FIXME.

A very quick wrk benchmark on the axum example seems to report a roughly 50% throughput improvement (2184.33 to 3361.60 requests/sec). Both tests were run with #1939 applied; the effect may differ without that patch.

Before:

$ wrk http://localhost:8080/
Running 10s test @ http://localhost:8080/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    19.91ms   58.94ms 716.76ms   92.84%
    Req/Sec     1.10k   342.95     2.09k    72.50%
  21844 requests in 10.00s, 3.37MB read
Requests/sec:   2184.33
Transfer/sec:    345.57KB

After:

$ wrk http://localhost:8080/
Running 10s test @ http://localhost:8080/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    12.06ms   50.31ms 782.84ms   95.71%
    Req/Sec     1.69k   475.75     3.68k    79.00%
  33619 requests in 10.00s, 5.19MB read
Requests/sec:   3361.60
Transfer/sec:    531.81KB
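
As a quick sanity check on the figures above, the relative change can be recomputed from the two wrk runs. This is only a sketch: `relative_change` is a hypothetical helper written for illustration, not part of the patch, and the inputs are the Requests/sec and average latency values copied from the output above. With those inputs, throughput rises by about 54% and average latency drops by about 39%.

```rust
/// Relative change from `before` to `after`, as a fraction of `before`.
/// Hypothetical helper for illustration only.
fn relative_change(before: f64, after: f64) -> f64 {
    (after - before) / before
}

fn main() {
    // Figures copied from the two wrk runs above.
    let throughput = relative_change(2184.33, 3361.60); // Requests/sec
    let latency = relative_change(19.91, 12.06); // average latency in ms

    println!("throughput change: {:+.1}%", throughput * 100.0);
    println!("avg latency change: {:+.1}%", latency * 100.0);
}
```

Both runs lasted only 10 seconds and show a large latency standard deviation, so these percentages are best read as a rough indication.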

zyuiop commented Sep 17, 2025

This does not seem to work in all cases.

Edit: actually, maybe it does, but then it requires the no-pre-emptive thing.

@zyuiop zyuiop marked this pull request as draft September 17, 2025 14:34

@github-actions github-actions bot left a comment

Benchmark Results

| Benchmark | Current: 4f5eba0 | Previous: 40e0c6e | Performance Ratio |
| --- | --- | --- | --- |
| startup_benchmark Build Time | 135.62 s | 136.14 s | 1.00 |
| startup_benchmark File Size | 0.90 MB | 0.90 MB | 1.00 |
| Startup Time - 1 core | 0.94 s (±0.03 s) | 0.94 s (±0.02 s) | 1.00 |
| Startup Time - 2 cores | 0.93 s (±0.02 s) | 0.92 s (±0.03 s) | 1.01 |
| Startup Time - 4 cores | 0.94 s (±0.02 s) | 0.96 s (±0.03 s) | 0.98 |
| multithreaded_benchmark Build Time | 133.26 s | 140.97 s | 0.95 |
| multithreaded_benchmark File Size | 1.01 MB | 1.01 MB | 1.00 |
| Multithreaded Pi Efficiency - 2 Threads | 2.52 % (±12.10 %) | 2.17 % (±10.40 %) | 1.16 |
| Multithreaded Pi Efficiency - 4 Threads | 1.59 % (±7.62 %) | 1.51 % (±7.26 %) | 1.05 |
| Multithreaded Pi Efficiency - 8 Threads | 0.73 % (±3.48 %) | 0.77 % (±3.68 %) | 0.95 |
| micro_benchmarks Build Time | 164.36 s | 171.42 s | 0.96 |
| micro_benchmarks File Size | 1.01 MB | 1.01 MB | 1.00 |
| Scheduling time - 1 thread | 3.31 ticks (±15.90 ticks) | 2.77 ticks (±13.29 ticks) | 1.20 |
| Scheduling time - 2 threads | 1.78 ticks (±8.55 ticks) | 1.75 ticks (±8.39 ticks) | 1.02 |
| Micro - Time for syscall (getpid) | 0.20 ticks (±0.96 ticks) | 0.12 ticks (±0.58 ticks) | 1.65 |
| Memcpy speed - (built_in) block size 4096 | 1474.06 MByte/s (±7075.47 MByte/s) | 1816.86 MByte/s (±8720.93 MByte/s) | 0.81 |
| Memcpy speed - (built_in) block size 1048576 | 557.11 MByte/s (±2674.13 MByte/s) | 745.11 MByte/s (±3576.55 MByte/s) | 0.75 |
| Memcpy speed - (built_in) block size 16777216 | 205.90 MByte/s (±988.34 MByte/s) | 219.45 MByte/s (±1053.36 MByte/s) | 0.94 |
| Memset speed - (built_in) block size 4096 | 991.74 MByte/s (±4760.33 MByte/s) | 1875.00 MByte/s (±9000.00 MByte/s) | 0.53 |
| Memset speed - (built_in) block size 1048576 | 1291.16 MByte/s (±6197.55 MByte/s) | 1029.20 MByte/s (±4940.18 MByte/s) | 1.25 |
| Memset speed - (built_in) block size 16777216 | 901.25 MByte/s (±4325.99 MByte/s) | 924.64 MByte/s (±4438.25 MByte/s) | 0.97 |
| Memcpy speed - (rust) block size 4096 | 1363.64 MByte/s (±6545.45 MByte/s) | 1411.76 MByte/s (±6776.47 MByte/s) | 0.97 |
| Memcpy speed - (rust) block size 1048576 | 753.96 MByte/s (±3619.02 MByte/s) | 693.90 MByte/s (±3330.73 MByte/s) | 1.09 |
| Memcpy speed - (rust) block size 16777216 | 214.88 MByte/s (±1031.42 MByte/s) | 219.67 MByte/s (±1054.40 MByte/s) | 0.98 |
| Memset speed - (rust) block size 4096 | 1791.04 MByte/s (±8597.01 MByte/s) | 1791.04 MByte/s (±8597.01 MByte/s) | 1 |
| Memset speed - (rust) block size 1048576 | 1135.46 MByte/s (±5450.21 MByte/s) | 1105.00 MByte/s (±5304.01 MByte/s) | 1.03 |
| Memset speed - (rust) block size 16777216 | 918.31 MByte/s (±4407.90 MByte/s) | 954.96 MByte/s (±4583.81 MByte/s) | 0.96 |
| alloc_benchmarks Build Time | 162.68 s | 157.27 s | 1.03 |
| alloc_benchmarks File Size | 0.97 MB | 0.97 MB | 1.00 |
| Allocations - Allocation success | 2.00 % (±13.86 %) | 2.00 % (±13.86 %) | 1 |
| Allocations - Deallocation success | 1.40 % (±9.69 %) | 1.40 % (±9.67 %) | 1.00 |
| Allocations - Pre-fail Allocations | 2.00 % (±13.86 %) | 2.00 % (±13.86 %) | 1 |
| Allocations - Average Allocation time | 259.34 Ticks (±1797.16 Ticks) | 262.62 Ticks (±1819.84 Ticks) | 0.99 |
| Allocations - Average Allocation time (no fail) | 259.34 Ticks (±1797.16 Ticks) | 262.62 Ticks (±1819.84 Ticks) | 0.99 |
| Allocations - Average Deallocation time | 17.12 Ticks (±118.65 Ticks) | 17.05 Ticks (±118.18 Ticks) | 1.00 |
| mutex_benchmark Build Time | 163.32 s | 159.65 s | 1.02 |
| mutex_benchmark File Size | 1.01 MB | 1.01 MB | 1.00 |
| Mutex Stress Test Average Time per Iteration - 1 Threads | 0.32 ns (±2.22 ns) | 0.36 ns (±2.49 ns) | 0.89 |
| Mutex Stress Test Average Time per Iteration - 2 Threads | 0.36 ns (±2.49 ns) | 0.38 ns (±2.63 ns) | 0.95 |

This comment was automatically generated by workflow using github-action-benchmark.
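
The Performance Ratio column appears to be the Current value divided by the Previous value; this is inferred from the rows above and is not stated in the comment. The sketch below recomputes it for two rows, using a hypothetical `performance_ratio` helper that is not part of the benchmark tooling.

```rust
/// Ratio of the current measurement to the previous one.
/// Assumption: this matches how the "Performance Ratio" column is derived.
fn performance_ratio(current: f64, previous: f64) -> f64 {
    current / previous
}

fn main() {
    // (name, current, previous) copied from the benchmark results above.
    let rows = [
        ("Scheduling time - 1 thread (ticks)", 3.31, 2.77),
        ("Memset speed - (built_in) block size 4096 (MByte/s)", 991.74, 1875.00),
    ];

    for (name, current, previous) in rows {
        println!("{}: {:.2}", name, performance_ratio(current, previous));
    }
}
```

The displayed values are rounded, so a recomputed ratio can differ slightly from the reported one (for example, 0.20 / 0.12 for the getpid row gives about 1.67 rather than 1.65). Whether a ratio above 1 is good also depends on the metric: it means slower for time-based rows and faster for throughput rows. The reported deviations are several times larger than the measured values, so individual ratios may be mostly noise.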
