@dhardy dhardy commented Sep 10, 2025

Summary

Source moved: rust-random/rngs#68

dhardy commented Sep 10, 2025

A quick benchmark run shows no clearly significant changes (the bar for "significant" is fairly high here because the CPU frequency was not pinned).

Benchmark
$ cargo bench -- chacha
   Compiling rand_pcg v0.9.0 (/home/dhardy/projects/rand/rand/rand_pcg)
   Compiling rand v0.9.2 (/home/dhardy/projects/rand/rand)
   Compiling benches v0.1.0 (/home/dhardy/projects/rand/rand/benches)
    Finished `bench` profile [optimized] target(s) in 5.97s
     Running benches/array.rs (/home/dhardy-extra/.cache/cargo-build/release/deps/array-af84aee0fd253b08)
     Running benches/bool.rs (/home/dhardy-extra/.cache/cargo-build/release/deps/bool-0b6ead89b6d1c550)
     Running benches/generators.rs (/home/dhardy-extra/.cache/cargo-build/release/deps/generators-827c3b4b67548787)
random_bytes/chacha8    time:   [245.77 ns 246.39 ns 246.93 ns]
                        thrpt:  [3.8622 GiB/s 3.8706 GiB/s 3.8804 GiB/s]
                 change:
                        time:   [-0.2241% +0.6495% +1.5004%] (p = 0.14 > 0.05)
                        thrpt:  [-1.4782% -0.6453% +0.2246%]
                        No change in performance detected.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
random_bytes/chacha12   time:   [332.48 ns 332.90 ns 333.31 ns]
                        thrpt:  [2.8612 GiB/s 2.8648 GiB/s 2.8683 GiB/s]
                 change:
                        time:   [+8.0156% +8.6076% +9.1503%] (p = 0.00 < 0.05)
                        thrpt:  [-8.3832% -7.9254% -7.4208%]
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  3 (3.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe
random_bytes/chacha20   time:   [444.71 ns 446.35 ns 448.80 ns]
                        thrpt:  [2.1250 GiB/s 2.1366 GiB/s 2.1445 GiB/s]
                 change:
                        time:   [-1.7830% -1.1872% -0.5086%] (p = 0.00 < 0.05)
                        thrpt:  [+0.5112% +1.2015% +1.8154%]
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  4 (4.00%) high severe

random_u32/chacha8      time:   [957.66 ps 960.68 ps 963.89 ps]
                        thrpt:  [3.8649 GiB/s 3.8778 GiB/s 3.8900 GiB/s]
                 change:
                        time:   [+1.4652% +1.9002% +2.3720%] (p = 0.00 < 0.05)
                        thrpt:  [-2.3170% -1.8648% -1.4441%]
                        Performance has regressed.
Found 130 outliers among 1000 measurements (13.00%)
  22 (2.20%) high mild
  108 (10.80%) high severe
random_u32/chacha12     time:   [1.2200 ns 1.2247 ns 1.2297 ns]
                        thrpt:  [3.0295 GiB/s 3.0417 GiB/s 3.0534 GiB/s]
                 change:
                        time:   [-2.5279% -2.0627% -1.5850%] (p = 0.00 < 0.05)
                        thrpt:  [+1.6106% +2.1062% +2.5934%]
                        Performance has improved.
Found 148 outliers among 1000 measurements (14.80%)
  67 (6.70%) high mild
  81 (8.10%) high severe
random_u32/chacha20     time:   [1.6805 ns 1.6823 ns 1.6842 ns]
                        thrpt:  [2.2119 GiB/s 2.2144 GiB/s 2.2168 GiB/s]
                 change:
                        time:   [-5.2598% -4.9730% -4.6967%] (p = 0.00 < 0.05)
                        thrpt:  [+4.9281% +5.2333% +5.5518%]
                        Performance has improved.
Found 59 outliers among 1000 measurements (5.90%)
  23 (2.30%) high mild
  36 (3.60%) high severe

random_u64/chacha8      time:   [1.3738 ns 1.3767 ns 1.3798 ns]
                        thrpt:  [5.3998 GiB/s 5.4119 GiB/s 5.4234 GiB/s]
                 change:
                        time:   [-5.2813% -4.8917% -4.5020%] (p = 0.00 < 0.05)
                        thrpt:  [+4.7143% +5.1433% +5.5758%]
                        Performance has improved.
Found 118 outliers among 1000 measurements (11.80%)
  4 (0.40%) low mild
  49 (4.90%) high mild
  65 (6.50%) high severe
random_u64/chacha12     time:   [1.9632 ns 1.9750 ns 1.9872 ns]
                        thrpt:  [3.7494 GiB/s 3.7725 GiB/s 3.7951 GiB/s]
                 change:
                        time:   [-2.9714% -2.3478% -1.7632%] (p = 0.00 < 0.05)
                        thrpt:  [+1.7949% +2.4042% +3.0624%]
                        Performance has improved.
Found 130 outliers among 1000 measurements (13.00%)
  1 (0.10%) low mild
  19 (1.90%) high mild
  110 (11.00%) high severe
random_u64/chacha20     time:   [2.8760 ns 2.8866 ns 2.8979 ns]
                        thrpt:  [2.5711 GiB/s 2.5811 GiB/s 2.5906 GiB/s]
                 change:
                        time:   [-8.6350% -8.1851% -7.7253%] (p = 0.00 < 0.05)
                        thrpt:  [+8.3720% +8.9148% +9.4511%]
                        Performance has improved.
Found 140 outliers among 1000 measurements (14.00%)
  45 (4.50%) high mild
  95 (9.50%) high severe

init_gen/chacha8        time:   [32.132 ns 32.186 ns 32.247 ns]
                        change: [+0.6341% +1.2175% +1.8555%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
init_gen/chacha12       time:   [31.814 ns 32.010 ns 32.260 ns]
                        change: [-0.8760% +0.0020% +0.7705%] (p = 1.00 > 0.05)
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) high mild
  7 (7.00%) high severe
init_gen/chacha20       time:   [32.064 ns 32.110 ns 32.161 ns]
                        change: [-3.6865% -2.5900% -1.5765%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe

init_from_u64/chacha8   time:   [41.519 ns 41.628 ns 41.731 ns]
                        change: [-4.8846% -4.5025% -4.0995%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
init_from_u64/chacha12  time:   [41.760 ns 41.833 ns 41.911 ns]
                        change: [-5.0056% -4.5781% -4.1155%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
init_from_u64/chacha20  time:   [40.997 ns 41.045 ns 41.101 ns]
                        change: [-9.1845% -8.4379% -7.7492%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild

init_from_seed/chacha8  time:   [32.865 ns 32.930 ns 33.009 ns]
                        change: [+7.3252% +7.8271% +8.3239%] (p = 0.00 < 0.05)
                        Performance has regressed.
init_from_seed/chacha12 time:   [33.311 ns 33.357 ns 33.401 ns]
                        change: [+6.7469% +7.5707% +8.3332%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
init_from_seed/chacha20 time:   [33.337 ns 33.615 ns 33.924 ns]
                        change: [+6.3530% +7.5427% +8.6519%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe

reseeding_bytes/chacha20_4k
                        time:   [395.08 µs 396.10 µs 397.30 µs]
                        thrpt:  [2.4580 GiB/s 2.4654 GiB/s 2.4718 GiB/s]
                 change:
                        time:   [-5.1210% -4.2470% -3.3999%] (p = 0.00 < 0.05)
                        thrpt:  [+3.5195% +4.4353% +5.3975%]
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  3 (3.00%) low severe
  7 (7.00%) low mild
  3 (3.00%) high mild
  6 (6.00%) high severe
reseeding_bytes/chacha20_16k
                        time:   [380.01 µs 380.82 µs 381.78 µs]
                        thrpt:  [2.5579 GiB/s 2.5644 GiB/s 2.5698 GiB/s]
                 change:
                        time:   [-0.3435% +0.0662% +0.5591%] (p = 0.77 > 0.05)
                        thrpt:  [-0.5560% -0.0661% +0.3447%]
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  5 (5.00%) high severe
reseeding_bytes/chacha20_32k
                        time:   [377.20 µs 377.86 µs 378.63 µs]
                        thrpt:  [2.5792 GiB/s 2.5844 GiB/s 2.5890 GiB/s]
                 change:
                        time:   [-1.9226% -1.2419% -0.4789%] (p = 0.00 < 0.05)
                        thrpt:  [+0.4812% +1.2575% +1.9603%]
                        Change within noise threshold.
Found 23 outliers among 100 measurements (23.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  9 (9.00%) high mild
  12 (12.00%) high severe
reseeding_bytes/chacha20_64k
                        time:   [377.04 µs 378.48 µs 379.98 µs]
                        thrpt:  [2.5700 GiB/s 2.5803 GiB/s 2.5901 GiB/s]
                 change:
                        time:   [-0.3950% +0.1281% +0.6287%] (p = 0.62 > 0.05)
                        thrpt:  [-0.6248% -0.1279% +0.3965%]
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  8 (8.00%) high mild
  4 (4.00%) high severe
reseeding_bytes/chacha20_256k
                        time:   [373.82 µs 375.39 µs 376.93 µs]
                        thrpt:  [2.5908 GiB/s 2.6014 GiB/s 2.6124 GiB/s]
                 change:
                        time:   [-2.3042% -1.5835% -0.7933%] (p = 0.00 < 0.05)
                        thrpt:  [+0.7996% +1.6090% +2.3585%]
                        Change within noise threshold.
Found 13 outliers among 100 measurements (13.00%)
  8 (8.00%) high mild
  5 (5.00%) high severe
reseeding_bytes/chacha20_1024k
                        time:   [376.31 µs 377.46 µs 378.89 µs]
                        thrpt:  [2.5774 GiB/s 2.5872 GiB/s 2.5951 GiB/s]
                 change:
                        time:   [-1.3651% -0.8050% -0.2695%] (p = 0.00 < 0.05)
                        thrpt:  [+0.2702% +0.8116% +1.3840%]
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
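
The `time` and `thrpt` change rows in the output above are two views of the same measurement: throughput is inversely proportional to time per iteration, so a relative time change `t` corresponds to a throughput change of `1 / (1 + t) - 1`. The following is a minimal sketch of that conversion; the function name `throughput_change_pct` is made up for illustration and is not part of criterion or rand.

```rust
/// Convert a relative change in time per iteration (in percent) into the
/// corresponding relative change in throughput (in percent).
/// Hypothetical helper, not part of criterion or rand.
fn throughput_change_pct(time_change_pct: f64) -> f64 {
    (1.0 / (1.0 + time_change_pct / 100.0) - 1.0) * 100.0
}

fn main() {
    // Median time changes copied from the benchmark output above.
    let samples = [
        ("random_bytes/chacha12", 8.6076),
        ("random_u64/chacha20", -8.1851),
    ];
    for (name, time_pct) in samples {
        println!(
            "{}: time {:+.4}% -> thrpt {:+.4}%",
            name,
            time_pct,
            throughput_change_pct(time_pct)
        );
    }
}
```

For the two samples this reproduces the reported throughput medians (about −7.93% and +8.91%) to within rounding, which also explains why a time regression of +8.6% shows up as a smaller throughput drop.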

dhardy commented Sep 10, 2025

It may make sense to hold off merging this for a while. It conflicts with #1659.

@newpavlov

A 5-10% performance drop is a bit too much to be noise, even with an unpinned CPU frequency, considering that criterion does a proper CPU warm-up. I will try to take a closer look at the RNG implementation later.
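
For reference, the verdict lines in the benchmark output are consistent with a simple rule: report no change when the p-value is not below the significance level, otherwise compare the confidence interval of the time change against a noise threshold. The sketch below reconstructs that rule under stated assumptions: a significance level of 0.05 (as printed in the output) and a noise threshold of 1% (assumed, not shown in the output). The names `Verdict` and `classify` are hypothetical and this is not criterion's actual code.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    NoChange,
    WithinNoise,
    Improved,
    Regressed,
}

/// Classify a change in time from its confidence interval (in percent)
/// and p-value. The thresholds are assumptions for this sketch.
fn classify(lower_pct: f64, upper_pct: f64, p_value: f64) -> Verdict {
    const SIGNIFICANCE_LEVEL: f64 = 0.05; // matches "p = ... < 0.05" in the output
    const NOISE_THRESHOLD_PCT: f64 = 1.0; // assumed noise threshold

    if p_value >= SIGNIFICANCE_LEVEL {
        Verdict::NoChange
    } else if upper_pct < -NOISE_THRESHOLD_PCT {
        // The whole interval lies below the noise band: time went down.
        Verdict::Improved
    } else if lower_pct > NOISE_THRESHOLD_PCT {
        // The whole interval lies above the noise band: time went up.
        Verdict::Regressed
    } else {
        Verdict::WithinNoise
    }
}

fn main() {
    // (name, lower bound %, upper bound %, p-value) copied from the output.
    let rows = [
        ("random_bytes/chacha8", -0.2241, 1.5004, 0.14),
        ("random_bytes/chacha12", 8.0156, 9.1503, 0.00),
        ("random_bytes/chacha20", -1.7830, -0.5086, 0.00),
        ("random_u64/chacha20", -8.6350, -7.7253, 0.00),
    ];
    for (name, lower, upper, p) in rows {
        println!("{}: {:?}", name, classify(lower, upper, p));
    }
}
```

Under these assumptions the four sample rows come out as no change, regressed, within noise, and improved, matching the verdicts printed in the output. The rule only says whether an interval clears the noise band; it does not account for run-to-run drift such as CPU frequency changes.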

dhardy commented Sep 11, 2025

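| Benchmark | Time |
|---|---|
| random_bytes/chacha8 | 0.65% |
| random_bytes/chacha12 | 8.61% |
| random_bytes/chacha20 | −1.19% |
| random_u32/chacha8 | 1.90% |
| random_u32/chacha12 | −2.06% |
| random_u32/chacha20 | −4.97% |
| random_u64/chacha8 | −4.89% |
| random_u64/chacha12 | −2.35% |
| random_u64/chacha20 | −8.19% |
| init_gen/chacha8 | 1.22% |
| init_gen/chacha12 | 0.00% |
| init_gen/chacha20 | −2.59% |
| init_from_u64/chacha8 | −4.50% |
| init_from_u64/chacha12 | −4.58% |
| init_from_u64/chacha20 | −8.44% |
| init_from_seed/chacha8 | 7.83% |
| init_from_seed/chacha20 | 7.54% |
| reseeding_bytes/chacha20_4k | −4.25% |
| reseeding_bytes/chacha20_16k | 0.07% |
| reseeding_bytes/chacha20_64k | 0.13% |
| reseeding_bytes/chacha20_256k | −1.58% |
| reseeding_bytes/chacha20_1024k | −0.80% |
| Average | −1.02% |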
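
The average is the plain arithmetic mean of the 22 listed time changes. A minimal sketch to reproduce it follows; the names `TIME_CHANGES_PCT` and `mean` are made up for illustration.

```rust
/// The 22 time changes (in percent) listed above, in the same order.
const TIME_CHANGES_PCT: [f64; 22] = [
    0.65, 8.61, -1.19, 1.90, -2.06, -4.97, -4.89, -2.35, -8.19, 1.22, 0.00,
    -2.59, -4.50, -4.58, -8.44, 7.83, 7.54, -4.25, 0.07, 0.13, -1.58, -0.80,
];

/// Arithmetic mean of a non-empty slice.
fn mean(values: &[f64]) -> f64 {
    values.iter().sum::<f64>() / values.len() as f64
}

fn main() {
    // The values sum to -22.44, so the mean over 22 entries is -1.02.
    println!("average time change: {:.2}%", mean(&TIME_CHANGES_PCT));
}
```

An unweighted mean treats each benchmark equally, so the large regressions (`random_bytes/chacha12`, `init_from_seed`) and the large improvements partly cancel; it summarises the overall direction rather than any single workload.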

dhardy commented Sep 15, 2025

@hpenne reported that the new ChaCha implementations have doubled performance on Apple M1.
