Minor optimization #46

Merged
merged 3 commits into from
Jun 24, 2024

@lemire (Member) commented Jun 23, 2024

This follows some initial benchmarking by @EgorBo at dotnet/runtime#103860, who reported some negative results on ARM. We are now going to include his data files in our benchmarks.

Results on AWS Graviton 3 (Neoverse V1):

Method FileName Mean Error StdDev Speed (GB/s)
SIMDUtf8ValidationRealData data/Arabic-Lipsum.utf8.txt 24,158.65 ns 171.516 ns 9.401 ns 3.38
DotnetRuntimeUtf8ValidationRealData data/Arabic-Lipsum.utf8.txt 41,360.25 ns 385.349 ns 21.122 ns 1.97
SIMDUtf8ValidationRealData data/Bogatov1069.utf8.txt 351.33 ns 1.369 ns 0.075 ns 3.04
DotnetRuntimeUtf8ValidationRealData data/Bogatov1069.utf8.txt 520.90 ns 7.581 ns 0.416 ns 2.05
SIMDUtf8ValidationRealData data/Bogatov136.utf8.txt 63.62 ns 1.739 ns 0.095 ns 2.14
DotnetRuntimeUtf8ValidationRealData data/Bogatov136.utf8.txt 72.47 ns 0.679 ns 0.037 ns 1.88
SIMDUtf8ValidationRealData data/Bogatov286.utf8.txt 118.04 ns 1.719 ns 0.094 ns 2.42
DotnetRuntimeUtf8ValidationRealData data/Bogatov286.utf8.txt 139.18 ns 3.668 ns 0.201 ns 2.05
SIMDUtf8ValidationRealData data/Bogatov527.utf8.txt 191.22 ns 1.704 ns 0.093 ns 2.76
DotnetRuntimeUtf8ValidationRealData data/Bogatov527.utf8.txt 255.25 ns 3.556 ns 0.195 ns 2.06
SIMDUtf8ValidationRealData data/Chinese-Lipsum.utf8.txt 20,654.74 ns 58.997 ns 3.234 ns 3.38
DotnetRuntimeUtf8ValidationRealData data/Chinese-Lipsum.utf8.txt 27,390.48 ns 156.597 ns 8.584 ns 2.55
SIMDUtf8ValidationRealData data/Emoji-Lipsum.utf8.txt 19,388.81 ns 112.299 ns 6.156 ns 3.38
DotnetRuntimeUtf8ValidationRealData data/Emoji-Lipsum.utf8.txt 81,125.90 ns 1,094.791 ns 60.009 ns 0.81
SIMDUtf8ValidationRealData data/Hebrew-Lipsum.utf8.txt 19,681.41 ns 94.898 ns 5.202 ns 3.38
DotnetRuntimeUtf8ValidationRealData data/Hebrew-Lipsum.utf8.txt 33,771.58 ns 4,770.220 ns 261.472 ns 1.97
SIMDUtf8ValidationRealData data/Hindi-Lipsum.utf8.txt 26,303.99 ns 135.452 ns 7.425 ns 3.35
DotnetRuntimeUtf8ValidationRealData data/Hindi-Lipsum.utf8.txt 53,610.39 ns 4,890.714 ns 268.077 ns 1.64
SIMDUtf8ValidationRealData data/Japanese-Lipsum.utf8.txt 20,109.93 ns 100.564 ns 5.512 ns 3.37
DotnetRuntimeUtf8ValidationRealData data/Japanese-Lipsum.utf8.txt 28,609.20 ns 76.647 ns 4.201 ns 2.37
SIMDUtf8ValidationRealData data/Korean-Lipsum.utf8.txt 19,912.58 ns 150.700 ns 8.260 ns 3.34
DotnetRuntimeUtf8ValidationRealData data/Korean-Lipsum.utf8.txt 52,203.96 ns 335.365 ns 18.382 ns 1.28
SIMDUtf8ValidationRealData data/Latin-Lipsum.utf8.txt 2,055.81 ns 3.447 ns 0.189 ns 42.29
DotnetRuntimeUtf8ValidationRealData data/Latin-Lipsum.utf8.txt 5,029.18 ns 35.681 ns 1.956 ns 17.29
SIMDUtf8ValidationRealData data/Russian-Lipsum.utf8.txt 31,369.36 ns 21.119 ns 1.158 ns 3.34
DotnetRuntimeUtf8ValidationRealData data/Russian-Lipsum.utf8.txt 110,679.04 ns 1,797.161 ns 98.508 ns 0.95
SIMDUtf8ValidationRealData data/arabic.utf8.txt 136,276.58 ns 4,705.752 ns 257.938 ns 3.92
DotnetRuntimeUtf8ValidationRealData data/arabic.utf8.txt 335,772.47 ns 3,591.631 ns 196.869 ns 1.59
SIMDUtf8ValidationRealData data/chinese.utf8.txt 46,978.44 ns 8,208.924 ns 449.959 ns 3.86
DotnetRuntimeUtf8ValidationRealData data/chinese.utf8.txt 87,147.05 ns 4,756.719 ns 260.732 ns 2.08
SIMDUtf8ValidationRealData data/czech.utf8.txt 41,123.35 ns 10,980.869 ns 601.898 ns 3.71
DotnetRuntimeUtf8ValidationRealData data/czech.utf8.txt 66,565.08 ns 2,311.022 ns 126.675 ns 2.29
SIMDUtf8ValidationRealData data/english.utf8.txt 17,417.32 ns 871.235 ns 47.755 ns 22.41
DotnetRuntimeUtf8ValidationRealData data/english.utf8.txt 24,774.87 ns 871.626 ns 47.777 ns 15.76
SIMDUtf8ValidationRealData data/esperanto.utf8.txt 11,975.72 ns 1,266.203 ns 69.405 ns 7.26
DotnetRuntimeUtf8ValidationRealData data/esperanto.utf8.txt 13,782.70 ns 134.035 ns 7.347 ns 6.31
SIMDUtf8ValidationRealData data/french.utf8.txt 144,430.40 ns 7,466.682 ns 409.274 ns 3.09
DotnetRuntimeUtf8ValidationRealData data/french.utf8.txt 128,505.50 ns 7,915.180 ns 433.858 ns 3.48
SIMDUtf8ValidationRealData data/german.utf8.txt 33,524.77 ns 4,730.929 ns 259.318 ns 6.14
DotnetRuntimeUtf8ValidationRealData data/german.utf8.txt 30,536.81 ns 1,116.981 ns 61.226 ns 6.74
SIMDUtf8ValidationRealData data/greek.utf8.txt 39,768.83 ns 4,038.720 ns 221.376 ns 4.56
DotnetRuntimeUtf8ValidationRealData data/greek.utf8.txt 91,992.03 ns 4,648.658 ns 254.809 ns 1.97
SIMDUtf8ValidationRealData data/hebrew.utf8.txt 55,497.71 ns 5,478.636 ns 300.303 ns 3.43
DotnetRuntimeUtf8ValidationRealData data/hebrew.utf8.txt 140,016.73 ns 9,507.891 ns 521.160 ns 1.36
SIMDUtf8ValidationRealData data/hindi.utf8.txt 93,533.46 ns 6,254.685 ns 342.840 ns 4.24
DotnetRuntimeUtf8ValidationRealData data/hindi.utf8.txt 267,074.96 ns 9,497.874 ns 520.611 ns 1.48
SIMDUtf8ValidationRealData data/japanese.utf8.txt 40,347.17 ns 4,436.762 ns 243.194 ns 4.07
DotnetRuntimeUtf8ValidationRealData data/japanese.utf8.txt 72,362.09 ns 13,707.373 ns 751.347 ns 2.27
SIMDUtf8ValidationRealData data/korean.utf8.txt 26,894.06 ns 5,179.031 ns 283.880 ns 3.64
DotnetRuntimeUtf8ValidationRealData data/korean.utf8.txt 54,747.39 ns 1,222.467 ns 67.008 ns 1.79
SIMDUtf8ValidationRealData data/persan.utf8.txt 39,748.97 ns 7,411.004 ns 406.222 ns 3.93
DotnetRuntimeUtf8ValidationRealData data/persan.utf8.txt 104,239.72 ns 453.056 ns 24.834 ns 1.50
SIMDUtf8ValidationRealData data/portuguese.utf8.txt 75,658.45 ns 4,730.071 ns 259.271 ns 3.71
DotnetRuntimeUtf8ValidationRealData data/portuguese.utf8.txt 63,024.48 ns 20,872.259 ns 1,144.079 ns 4.45
SIMDUtf8ValidationRealData data/russian.utf8.txt 108,903.30 ns 6,141.446 ns 336.633 ns 3.74
DotnetRuntimeUtf8ValidationRealData data/russian.utf8.txt 272,664.60 ns 2,284.552 ns 125.224 ns 1.49
SIMDUtf8ValidationRealData data/thai.utf8.txt 136,082.33 ns 4,574.564 ns 250.747 ns 4.36
DotnetRuntimeUtf8ValidationRealData data/thai.utf8.txt 239,407.58 ns 9,726.036 ns 533.117 ns 2.48
SIMDUtf8ValidationRealData data/turkish.utf8.txt 52,878.26 ns 6,966.967 ns 381.883 ns 3.69
DotnetRuntimeUtf8ValidationRealData data/turkish.utf8.txt 76,486.28 ns 5,998.832 ns 328.816 ns 2.55
SIMDUtf8ValidationRealData data/twitter.json 52,742.38 ns 175.435 ns 9.616 ns 11.97
DotnetRuntimeUtf8ValidationRealData data/twitter.json 72,448.82 ns 989.729 ns 54.250 ns 8.72
SIMDUtf8ValidationRealData data/vietnamese.utf8.txt 88,754.78 ns 12,962.133 ns 710.498 ns 3.59
DotnetRuntimeUtf8ValidationRealData data/vietnamese.utf8.txt 296,099.91 ns 7,875.808 ns 431.700 ns 1.08

Results on Apple M2:

Method FileName Mean Error StdDev Speed (GB/s)
SIMDUtf8ValidationRealData data/Arabic-Lipsum.utf8.txt 11,399.64 ns 2,039.425 ns 111.788 ns 7.17
DotnetRuntimeUtf8ValidationRealData data/Arabic-Lipsum.utf8.txt 23,476.49 ns 1,255.384 ns 68.812 ns 3.48
SIMDUtf8ValidationRealData data/Bogatov1069.utf8.txt 179.33 ns 17.708 ns 0.971 ns 5.96
DotnetRuntimeUtf8ValidationRealData data/Bogatov1069.utf8.txt 288.72 ns 51.203 ns 2.807 ns 3.70
SIMDUtf8ValidationRealData data/Bogatov136.utf8.txt 31.83 ns 2.044 ns 0.112 ns 4.27
DotnetRuntimeUtf8ValidationRealData data/Bogatov136.utf8.txt 33.78 ns 28.276 ns 1.550 ns 4.03
SIMDUtf8ValidationRealData data/Bogatov286.utf8.txt 63.19 ns 23.075 ns 1.265 ns 4.53
DotnetRuntimeUtf8ValidationRealData data/Bogatov286.utf8.txt 68.33 ns 28.641 ns 1.570 ns 4.19
SIMDUtf8ValidationRealData data/Bogatov527.utf8.txt 97.59 ns 2.865 ns 0.157 ns 5.40
DotnetRuntimeUtf8ValidationRealData data/Bogatov527.utf8.txt 136.98 ns 7.925 ns 0.434 ns 3.85
SIMDUtf8ValidationRealData data/Chinese-Lipsum.utf8.txt 9,711.20 ns 843.972 ns 46.261 ns 7.19
DotnetRuntimeUtf8ValidationRealData data/Chinese-Lipsum.utf8.txt 14,650.94 ns 1,101.397 ns 60.371 ns 4.77
SIMDUtf8ValidationRealData data/Emoji-Lipsum.utf8.txt 9,109.92 ns 522.064 ns 28.616 ns 7.19
DotnetRuntimeUtf8ValidationRealData data/Emoji-Lipsum.utf8.txt 26,536.00 ns 14,264.855 ns 781.905 ns 2.47
SIMDUtf8ValidationRealData data/Hebrew-Lipsum.utf8.txt 9,261.09 ns 738.231 ns 40.465 ns 7.18
DotnetRuntimeUtf8ValidationRealData data/Hebrew-Lipsum.utf8.txt 19,261.95 ns 1,497.396 ns 82.077 ns 3.45
SIMDUtf8ValidationRealData data/Hindi-Lipsum.utf8.txt 12,268.32 ns 1,660.140 ns 90.998 ns 7.17
DotnetRuntimeUtf8ValidationRealData data/Hindi-Lipsum.utf8.txt 29,700.12 ns 3,732.162 ns 204.572 ns 2.96
SIMDUtf8ValidationRealData data/Japanese-Lipsum.utf8.txt 9,368.63 ns 201.405 ns 11.040 ns 7.24
DotnetRuntimeUtf8ValidationRealData data/Japanese-Lipsum.utf8.txt 14,596.26 ns 1,320.289 ns 72.369 ns 4.65
SIMDUtf8ValidationRealData data/Korean-Lipsum.utf8.txt 9,218.95 ns 299.560 ns 16.420 ns 7.22
DotnetRuntimeUtf8ValidationRealData data/Korean-Lipsum.utf8.txt 36,247.10 ns 5,763.164 ns 315.898 ns 1.84
SIMDUtf8ValidationRealData data/Latin-Lipsum.utf8.txt 991.56 ns 148.287 ns 8.128 ns 87.68
DotnetRuntimeUtf8ValidationRealData data/Latin-Lipsum.utf8.txt 2,329.52 ns 436.663 ns 23.935 ns 37.32
SIMDUtf8ValidationRealData data/Russian-Lipsum.utf8.txt 16,209.71 ns 49,686.589 ns 2,723.490 ns 6.46
DotnetRuntimeUtf8ValidationRealData data/Russian-Lipsum.utf8.txt 38,774.13 ns 15,205.809 ns 833.482 ns 2.70
SIMDUtf8ValidationRealData data/arabic.utf8.txt 82,135.62 ns 75,320.317 ns 4,128.561 ns 6.50
DotnetRuntimeUtf8ValidationRealData data/arabic.utf8.txt 213,231.93 ns 94,725.565 ns 5,192.228 ns 2.50
SIMDUtf8ValidationRealData data/chinese.utf8.txt 22,018.20 ns 14,750.411 ns 808.520 ns 8.24
DotnetRuntimeUtf8ValidationRealData data/chinese.utf8.txt 48,770.69 ns 46,322.436 ns 2,539.089 ns 3.72
SIMDUtf8ValidationRealData data/czech.utf8.txt 17,871.31 ns 23,368.427 ns 1,280.902 ns 8.55
DotnetRuntimeUtf8ValidationRealData data/czech.utf8.txt 35,975.96 ns 9,030.069 ns 494.969 ns 4.25
SIMDUtf8ValidationRealData data/english.utf8.txt 8,887.52 ns 385.530 ns 21.132 ns 43.92
DotnetRuntimeUtf8ValidationRealData data/english.utf8.txt 16,732.80 ns 1,301.154 ns 71.321 ns 23.33
SIMDUtf8ValidationRealData data/esperanto.utf8.txt 5,540.34 ns 214.626 ns 11.764 ns 15.70
DotnetRuntimeUtf8ValidationRealData data/esperanto.utf8.txt 9,494.63 ns 112.688 ns 6.177 ns 9.16
SIMDUtf8ValidationRealData data/french.utf8.txt 83,480.49 ns 29,775.365 ns 1,632.088 ns 5.35
DotnetRuntimeUtf8ValidationRealData data/french.utf8.txt 88,632.88 ns 10,779.051 ns 590.836 ns 5.04
SIMDUtf8ValidationRealData data/german.utf8.txt 16,116.36 ns 25,195.670 ns 1,381.060 ns 12.77
DotnetRuntimeUtf8ValidationRealData data/german.utf8.txt 22,503.68 ns 1,330.839 ns 72.948 ns 9.14
SIMDUtf8ValidationRealData data/greek.utf8.txt 18,711.08 ns 5,684.170 ns 311.569 ns 9.69
DotnetRuntimeUtf8ValidationRealData data/greek.utf8.txt 57,779.24 ns 12,729.790 ns 697.763 ns 3.14
SIMDUtf8ValidationRealData data/hebrew.utf8.txt 26,752.19 ns 21,639.367 ns 1,186.127 ns 7.11
DotnetRuntimeUtf8ValidationRealData data/hebrew.utf8.txt 83,893.26 ns 62,240.251 ns 3,411.598 ns 2.27
SIMDUtf8ValidationRealData data/hindi.utf8.txt 44,453.68 ns 26,985.647 ns 1,479.174 ns 8.92
DotnetRuntimeUtf8ValidationRealData data/hindi.utf8.txt 175,719.39 ns 107,454.435 ns 5,889.940 ns 2.26
SIMDUtf8ValidationRealData data/japanese.utf8.txt 19,261.51 ns 4,880.817 ns 267.534 ns 8.53
DotnetRuntimeUtf8ValidationRealData data/japanese.utf8.txt 50,006.96 ns 79,439.404 ns 4,354.342 ns 3.29
SIMDUtf8ValidationRealData data/korean.utf8.txt 10,816.74 ns 820.990 ns 45.001 ns 9.05
DotnetRuntimeUtf8ValidationRealData data/korean.utf8.txt 31,789.57 ns 8,716.355 ns 477.773 ns 3.08
SIMDUtf8ValidationRealData data/persan.utf8.txt 17,584.40 ns 8,038.834 ns 440.636 ns 8.88
DotnetRuntimeUtf8ValidationRealData data/persan.utf8.txt 49,597.62 ns 33,180.391 ns 1,818.729 ns 3.15
SIMDUtf8ValidationRealData data/portuguese.utf8.txt 32,645.46 ns 54,609.499 ns 2,993.331 ns 8.60
DotnetRuntimeUtf8ValidationRealData data/portuguese.utf8.txt 46,334.64 ns 25,046.242 ns 1,372.869 ns 6.06
SIMDUtf8ValidationRealData data/russian.utf8.txt 54,265.85 ns 42,360.008 ns 2,321.895 ns 7.50
DotnetRuntimeUtf8ValidationRealData data/russian.utf8.txt 168,228.80 ns 46,455.269 ns 2,546.370 ns 2.42
SIMDUtf8ValidationRealData data/thai.utf8.txt 68,946.68 ns 35,710.480 ns 1,957.412 ns 8.61
DotnetRuntimeUtf8ValidationRealData data/thai.utf8.txt 140,746.17 ns 97,987.595 ns 5,371.031 ns 4.22
SIMDUtf8ValidationRealData data/turkish.utf8.txt 18,900.31 ns 31,626.706 ns 1,733.566 ns 10.32
DotnetRuntimeUtf8ValidationRealData data/turkish.utf8.txt 42,712.48 ns 79,564.867 ns 4,361.219 ns 4.57
SIMDUtf8ValidationRealData data/twitter.json 25,863.13 ns 8,125.716 ns 445.398 ns 24.42
DotnetRuntimeUtf8ValidationRealData data/twitter.json 43,760.80 ns 6,249.167 ns 342.538 ns 14.43
SIMDUtf8ValidationRealData data/vietnamese.utf8.txt 44,278.79 ns 57,707.891 ns 3,163.164 ns 7.21
DotnetRuntimeUtf8ValidationRealData data/vietnamese.utf8.txt 210,026.00 ns 49,423.902 ns 2,709.091 ns 1.52
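
To read the two tables side by side, it can help to turn the Mean column into a speedup ratio. The following is a minimal sketch, not part of the benchmark code: the class and method names are hypothetical, and the throughput helper assumes that the Speed (GB/s) column means 10^9 bytes per second, which is the same as bytes per nanosecond. The input file sizes are not listed in this PR, so the caller has to supply them.

```csharp
using System;

// Hypothetical helper for interpreting the benchmark tables; not part of the PR.
public static class BenchmarkMath
{
    // Speedup of a candidate over a baseline, given mean times in the same unit.
    // A value above 1.0 means the candidate is faster.
    public static double Speedup(double baselineMeanNs, double candidateMeanNs)
    {
        if (candidateMeanNs <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(candidateMeanNs));
        }
        return baselineMeanNs / candidateMeanNs;
    }

    // Throughput in GB/s, assuming 1 GB = 10^9 bytes, so GB/s equals bytes per nanosecond.
    public static double GigabytesPerSecond(long inputBytes, double meanNs)
    {
        if (meanNs <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(meanNs));
        }
        return inputBytes / meanNs;
    }
}
```

For the Arabic-Lipsum rows, `BenchmarkMath.Speedup(41360.25, 24158.65)` gives roughly 1.7 on Graviton 3 and `BenchmarkMath.Speedup(23476.49, 11399.64)` gives roughly 2.1 on Apple M2. Rows where the ratio falls below 1.0 on Graviton 3 (french, german, portuguese) are the ones where the runtime's validator is still ahead.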

@lemire requested a review from Nick-Nuon on June 23, 2024, 21:35
@lemire merged commit b919f83 into main on Jun 24, 2024
6 checks passed
@Nick-Nuon (Collaborator) left a comment

I have seen the PR and I take note of the merge.
Other than the typo, it looks good to me!


- if (AdvSimd.Arm64.MaxAcross(currentBlock).ToScalar() <= 127)
+ if (AdvSimd.Arm64.MaxAcross(Vector128.AsUInt32(AdvSimd.And(currentBlock, v80))).ToScalar() == 0)
+ // We could it with (AdvSimd.Arm64.MaxAcross(currentBlock).ToScalar() <= 127) but it is slower on some
@Nick-Nuon (Collaborator), Jun 24, 2024

There seems to be a very minor typo here in the comment.
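
For readers comparing the two conditions in the hunk above, here is a minimal sketch of both ASCII checks as standalone helpers. It is not the library's code: the class and method names are hypothetical, `v80` is assumed to be a vector with `0x80` in every byte (its definition is not shown in this hunk), and the scalar fallback is added only so that the sketch also runs on non-ARM64 hardware.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

// Hypothetical helpers illustrating the two conditions; not the library's code.
public static class AsciiBlockCheck
{
    // Earlier condition: reduce the 16 bytes to their maximum and compare with 127.
    public static bool IsAsciiMaxByte(Vector128<byte> block)
    {
        if (AdvSimd.Arm64.IsSupported)
        {
            return AdvSimd.Arm64.MaxAcross(block).ToScalar() <= 127;
        }
        return ScalarIsAscii(block);
    }

    // Condition used after this PR: keep only the high bit of each byte,
    // reinterpret as four 32-bit lanes, reduce, and compare with zero.
    public static bool IsAsciiMaskedWords(Vector128<byte> block)
    {
        if (AdvSimd.Arm64.IsSupported)
        {
            // Assumption: v80 holds 0x80 in every byte.
            Vector128<byte> v80 = Vector128.Create((byte)0x80);
            return AdvSimd.Arm64.MaxAcross(Vector128.AsUInt32(AdvSimd.And(block, v80))).ToScalar() == 0;
        }
        return ScalarIsAscii(block);
    }

    // Portable fallback so the sketch runs where AdvSimd is unavailable.
    private static bool ScalarIsAscii(Vector128<byte> block)
    {
        for (int i = 0; i < Vector128<byte>.Count; i++)
        {
            if (block.GetElement(i) > 127)
            {
                return false;
            }
        }
        return true;
    }
}
```

Both helpers return the same answer for every input, since a byte is non-ASCII exactly when its high bit is set. The difference is only in the instruction sequence, and which one is faster depends on the hardware, so it is worth measuring on the target machine.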
