Add Neon mld_polyvecl_pointwise_acc_montgomery_l{4,5,7}_native #281
base: main
Force-pushed from aa0efb9 to c2e3863.
Mac Mini (M1, 2020) benchmarks (opt)
Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 98795 cycles | 100259 cycles | 0.99 |
ML-DSA-44 sign | 220838 cycles | 225382 cycles | 0.98 |
ML-DSA-44 verify | 100723 cycles | 102348 cycles | 0.98 |
ML-DSA-65 keypair | 187512 cycles | 181582 cycles | 1.03 |
ML-DSA-65 sign | 355226 cycles | 365151 cycles | 0.97 |
ML-DSA-65 verify | 165873 cycles | 168247 cycles | 0.99 |
ML-DSA-87 keypair | 293129 cycles | 296190 cycles | 0.99 |
ML-DSA-87 sign | 495585 cycles | 504189 cycles | 0.98 |
ML-DSA-87 verify | 290593 cycles | 293610 cycles | 0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Mac Mini (M1, 2020) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result, exceeding threshold 1.03.
Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
---|---|---|---|
ML-DSA-65 keypair | 187512 cycles | 181582 cycles | 1.03 |
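The Ratio column is current cycles divided by previous cycles; because the metric is a cycle count, values above 1.00 mean the new commit is slower. The ML-DSA-65 keypair entry displays 1.03 yet still trips the 1.03 threshold because the unrounded ratio is 187512 / 181582 ≈ 1.033. The C sketch below models that check. The strict greater-than comparison on the unrounded ratio is an assumption about how github-action-benchmark decides, not something taken from its source, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Ratio as printed in the tables: current cycles / previous cycles.
 * Since the metric is a cycle count, a ratio above 1.00 means slower. */
static double bench_ratio(uint64_t current, uint64_t previous)
{
  assert(previous != 0);
  return (double)current / (double)previous;
}

/* Assumed alert rule: the unrounded ratio must strictly exceed the
 * threshold. The tables print the ratio rounded to two decimals, so an
 * entry can display 1.03 and still exceed a 1.03 threshold. */
static bool bench_regressed(uint64_t current, uint64_t previous,
                            double threshold)
{
  return bench_ratio(current, previous) > threshold;
}
```

For example, `bench_regressed(187512, 181582, 1.03)` is true, while the ML-DSA-44 keypair entry from the same run (98795 vs. 100259 cycles) is not flagged.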
Mac Mini (M1, 2020) benchmarks (no-opt)
Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 115144 cycles | 115133 cycles | 1.00 |
ML-DSA-44 sign | 354603 cycles | 354678 cycles | 1.00 |
ML-DSA-44 verify | 124702 cycles | 124700 cycles | 1.00 |
ML-DSA-65 keypair | 202234 cycles | 202233 cycles | 1.00 |
ML-DSA-65 sign | 563400 cycles | 563367 cycles | 1.00 |
ML-DSA-65 verify | 199679 cycles | 199698 cycles | 1.00 |
ML-DSA-87 keypair | 324059 cycles | 324064 cycles | 1.00 |
ML-DSA-87 sign | 727103 cycles | 727079 cycles | 1.00 |
ML-DSA-87 verify | 332139 cycles | 332161 cycles | 1.00 |
Intel Xeon 4th gen (c7i)
Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 98819 cycles | 98683 cycles | 1.00 |
ML-DSA-44 sign | 282436 cycles | 282233 cycles | 1.00 |
ML-DSA-44 verify | 102909 cycles | 103404 cycles | 1.00 |
ML-DSA-65 keypair | 165585 cycles | 164971 cycles | 1.00 |
ML-DSA-65 sign | 449412 cycles | 446621 cycles | 1.01 |
ML-DSA-65 verify | 163626 cycles | 163228 cycles | 1.00 |
ML-DSA-87 keypair | 274172 cycles | 274227 cycles | 1.00 |
ML-DSA-87 sign | 587370 cycles | 588128 cycles | 1.00 |
ML-DSA-87 verify | 272295 cycles | 272530 cycles | 1.00 |
Intel Xeon 3rd gen (c6i)
Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 152098 cycles | 152095 cycles | 1.00 |
ML-DSA-44 sign | 444995 cycles | 444105 cycles | 1.00 |
ML-DSA-44 verify | 161410 cycles | 161258 cycles | 1.00 |
ML-DSA-65 keypair | 254856 cycles | 254909 cycles | 1.00 |
ML-DSA-65 sign | 692231 cycles | 691112 cycles | 1.00 |
ML-DSA-65 verify | 254858 cycles | 254797 cycles | 1.00 |
ML-DSA-87 keypair | 424938 cycles | 424453 cycles | 1.00 |
ML-DSA-87 sign | 916489 cycles | 917430 cycles | 1.00 |
ML-DSA-87 verify | 427940 cycles | 427318 cycles | 1.00 |
Intel Xeon 4th gen (c7i) (no-opt)
Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 98528 cycles | 98671 cycles | 1.00 |
ML-DSA-44 sign | 282918 cycles | 282749 cycles | 1.00 |
ML-DSA-44 verify | 102905 cycles | 103271 cycles | 1.00 |
ML-DSA-65 keypair | 165420 cycles | 165680 cycles | 1.00 |
ML-DSA-65 sign | 448721 cycles | 449512 cycles | 1.00 |
ML-DSA-65 verify | 163567 cycles | 163709 cycles | 1.00 |
ML-DSA-87 keypair | 273790 cycles | 274502 cycles | 1.00 |
ML-DSA-87 sign | 588743 cycles | 588030 cycles | 1.00 |
ML-DSA-87 verify | 272629 cycles | 272042 cycles | 1.00 |
Intel Xeon 3rd gen (c6i) (no-opt)
Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 152111 cycles | 152325 cycles | 1.00 |
ML-DSA-44 sign | 444140 cycles | 444936 cycles | 1.00 |
ML-DSA-44 verify | 161220 cycles | 161297 cycles | 1.00 |
ML-DSA-65 keypair | 254909 cycles | 254742 cycles | 1.00 |
ML-DSA-65 sign | 693678 cycles | 691709 cycles | 1.00 |
ML-DSA-65 verify | 254684 cycles | 254653 cycles | 1.00 |
ML-DSA-87 keypair | 424795 cycles | 424479 cycles | 1.00 |
ML-DSA-87 sign | 916121 cycles | 917945 cycles | 1.00 |
ML-DSA-87 verify | 427729 cycles | 427515 cycles | 1.00 |
AMD EPYC 3rd gen (c6a)
Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 136196 cycles | 136220 cycles | 1.00 |
ML-DSA-44 sign | 437198 cycles | 437480 cycles | 1.00 |
ML-DSA-44 verify | 147615 cycles | 147439 cycles | 1.00 |
ML-DSA-65 keypair | 224159 cycles | 224408 cycles | 1.00 |
ML-DSA-65 sign | 673932 cycles | 675932 cycles | 1.00 |
ML-DSA-65 verify | 227650 cycles | 228096 cycles | 1.00 |
ML-DSA-87 keypair | 374913 cycles | 374879 cycles | 1.00 |
ML-DSA-87 sign | 886192 cycles | 886615 cycles | 1.00 |
ML-DSA-87 verify | 382383 cycles | 382821 cycles | 1.00 |
AMD EPYC 4th gen (c7a)
Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 119944 cycles | 119848 cycles | 1.00 |
ML-DSA-44 sign | 369702 cycles | 372574 cycles | 0.99 |
ML-DSA-44 verify | 128077 cycles | 128111 cycles | 1.00 |
ML-DSA-65 keypair | 199526 cycles | 199629 cycles | 1.00 |
ML-DSA-65 sign | 562421 cycles | 563339 cycles | 1.00 |
ML-DSA-65 verify | 200935 cycles | 201022 cycles | 1.00 |
ML-DSA-87 keypair | 332568 cycles | 332393 cycles | 1.00 |
ML-DSA-87 sign | 735105 cycles | 740390 cycles | 0.99 |
ML-DSA-87 verify | 335144 cycles | 335127 cycles | 1.00 |
AMD EPYC 3rd gen (c6a) (no-opt)
Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 136105 cycles | 136234 cycles | 1.00 |
ML-DSA-44 sign | 437119 cycles | 436841 cycles | 1.00 |
ML-DSA-44 verify | 147784 cycles | 147385 cycles | 1.00 |
ML-DSA-65 keypair | 224174 cycles | 224115 cycles | 1.00 |
ML-DSA-65 sign | 673368 cycles | 674617 cycles | 1.00 |
ML-DSA-65 verify | 227647 cycles | 227473 cycles | 1.00 |
ML-DSA-87 keypair | 374787 cycles | 374645 cycles | 1.00 |
ML-DSA-87 sign | 885681 cycles | 885689 cycles | 1.00 |
ML-DSA-87 verify | 382192 cycles | 382513 cycles | 1.00 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)
Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 195813 cycles | 195538 cycles | 1.00 |
ML-DSA-44 sign | 467780 cycles | 468288 cycles | 1.00 |
ML-DSA-44 verify | 198346 cycles | 198500 cycles | 1.00 |
ML-DSA-65 keypair | 349532 cycles | 349924 cycles | 1.00 |
ML-DSA-65 sign | 766754 cycles | 769221 cycles | 1.00 |
ML-DSA-65 verify | 327977 cycles | 328753 cycles | 1.00 |
ML-DSA-87 keypair | 574513 cycles | 574288 cycles | 1.00 |
ML-DSA-87 sign | 1039442 cycles | 1041264 cycles | 1.00 |
ML-DSA-87 verify | 559160 cycles | 561305 cycles | 1.00 |
AMD EPYC 4th gen (c7a) (no-opt)
Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 120139 cycles | 119894 cycles | 1.00 |
ML-DSA-44 sign | 371046 cycles | 370018 cycles | 1.00 |
ML-DSA-44 verify | 127996 cycles | 128095 cycles | 1.00 |
ML-DSA-65 keypair | 199546 cycles | 199529 cycles | 1.00 |
ML-DSA-65 sign | 568638 cycles | 562876 cycles | 1.01 |
ML-DSA-65 verify | 200784 cycles | 201061 cycles | 1.00 |
ML-DSA-87 keypair | 332459 cycles | 332423 cycles | 1.00 |
ML-DSA-87 sign | 736425 cycles | 734871 cycles | 1.00 |
ML-DSA-87 verify | 335152 cycles | 334513 cycles | 1.00 |
Graviton3
Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 126643 cycles | 126462 cycles | 1.00 |
ML-DSA-44 sign | 284818 cycles | 285843 cycles | 1.00 |
ML-DSA-44 verify | 127276 cycles | 127639 cycles | 1.00 |
ML-DSA-65 keypair | 219410 cycles | 219551 cycles | 1.00 |
ML-DSA-65 sign | 464942 cycles | 466644 cycles | 1.00 |
ML-DSA-65 verify | 209992 cycles | 210189 cycles | 1.00 |
ML-DSA-87 keypair | 373933 cycles | 373515 cycles | 1.00 |
ML-DSA-87 sign | 642871 cycles | 644031 cycles | 1.00 |
ML-DSA-87 verify | 360443 cycles | 360543 cycles | 1.00 |
Graviton4
Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 120124 cycles | 119481 cycles | 1.01 |
ML-DSA-44 sign | 268999 cycles | 270117 cycles | 1.00 |
ML-DSA-44 verify | 120179 cycles | 120339 cycles | 1.00 |
ML-DSA-65 keypair | 206894 cycles | 207331 cycles | 1.00 |
ML-DSA-65 sign | 432500 cycles | 433348 cycles | 1.00 |
ML-DSA-65 verify | 198203 cycles | 198268 cycles | 1.00 |
ML-DSA-87 keypair | 351026 cycles | 351426 cycles | 1.00 |
ML-DSA-87 sign | 595565 cycles | 595083 cycles | 1.00 |
ML-DSA-87 verify | 337999 cycles | 338116 cycles | 1.00 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)
Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 214446 cycles | 214463 cycles | 1.00 |
ML-DSA-44 sign | 628958 cycles | 628572 cycles | 1.00 |
ML-DSA-44 verify | 228777 cycles | 228771 cycles | 1.00 |
ML-DSA-65 keypair | 376155 cycles | 376467 cycles | 1.00 |
ML-DSA-65 sign | 1011977 cycles | 1011699 cycles | 1.00 |
ML-DSA-65 verify | 370499 cycles | 370770 cycles | 1.00 |
ML-DSA-87 keypair | 615981 cycles | 615520 cycles | 1.00 |
ML-DSA-87 sign | 1356920 cycles | 1355773 cycles | 1.00 |
ML-DSA-87 verify | 628956 cycles | 629617 cycles | 1.00 |
Graviton2
Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 196062 cycles | 195490 cycles | 1.00 |
ML-DSA-44 sign | 469150 cycles | 468458 cycles | 1.00 |
ML-DSA-44 verify | 198553 cycles | 198578 cycles | 1.00 |
ML-DSA-65 keypair | 349372 cycles | 350052 cycles | 1.00 |
ML-DSA-65 sign | 766723 cycles | 769817 cycles | 1.00 |
ML-DSA-65 verify | 328014 cycles | 328702 cycles | 1.00 |
ML-DSA-87 keypair | 574007 cycles | 573635 cycles | 1.00 |
ML-DSA-87 sign | 1038975 cycles | 1040820 cycles | 1.00 |
ML-DSA-87 verify | 562328 cycles | 562639 cycles | 1.00 |
Graviton3 (no-opt)
Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 138804 cycles | 138763 cycles | 1.00 |
ML-DSA-44 sign | 392926 cycles | 393228 cycles | 1.00 |
ML-DSA-44 verify | 146696 cycles | 146558 cycles | 1.00 |
ML-DSA-65 keypair | 236587 cycles | 236680 cycles | 1.00 |
ML-DSA-65 sign | 627492 cycles | 628105 cycles | 1.00 |
ML-DSA-65 verify | 237075 cycles | 236790 cycles | 1.00 |
ML-DSA-87 keypair | 398147 cycles | 397573 cycles | 1.00 |
ML-DSA-87 sign | 828379 cycles | 829008 cycles | 1.00 |
ML-DSA-87 verify | 396970 cycles | 397656 cycles | 1.00 |
Graviton4 (no-opt)
Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 132260 cycles | 132229 cycles | 1.00 |
ML-DSA-44 sign | 386511 cycles | 386680 cycles | 1.00 |
ML-DSA-44 verify | 140942 cycles | 140904 cycles | 1.00 |
ML-DSA-65 keypair | 226509 cycles | 226499 cycles | 1.00 |
ML-DSA-65 sign | 624613 cycles | 625373 cycles | 1.00 |
ML-DSA-65 verify | 227469 cycles | 227445 cycles | 1.00 |
ML-DSA-87 keypair | 375517 cycles | 375635 cycles | 1.00 |
ML-DSA-87 sign | 811770 cycles | 812807 cycles | 1.00 |
ML-DSA-87 verify | 375381 cycles | 378722 cycles | 0.99 |
Graviton2 (no-opt)
Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 214565 cycles | 215403 cycles | 1.00 |
ML-DSA-44 sign | 629195 cycles | 629588 cycles | 1.00 |
ML-DSA-44 verify | 228932 cycles | 229804 cycles | 1.00 |
ML-DSA-65 keypair | 376226 cycles | 376318 cycles | 1.00 |
ML-DSA-65 sign | 1011213 cycles | 1012301 cycles | 1.00 |
ML-DSA-65 verify | 370222 cycles | 370692 cycles | 1.00 |
ML-DSA-87 keypair | 616464 cycles | 615892 cycles | 1.00 |
ML-DSA-87 sign | 1358962 cycles | 1357262 cycles | 1.00 |
ML-DSA-87 verify | 630201 cycles | 632135 cycles | 1.00 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)
Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 380460 cycles | 379624 cycles | 1.00 |
ML-DSA-44 sign | 1001498 cycles | 1004065 cycles | 1.00 |
ML-DSA-44 verify | 397993 cycles | 398120 cycles | 1.00 |
ML-DSA-65 keypair | 658524 cycles | 659069 cycles | 1.00 |
ML-DSA-65 sign | 1634484 cycles | 1625403 cycles | 1.01 |
ML-DSA-65 verify | 637527 cycles | 637846 cycles | 1.00 |
ML-DSA-87 keypair | 1104235 cycles | 1098780 cycles | 1.00 |
ML-DSA-87 sign | 2188006 cycles | 2189409 cycles | 1.00 |
ML-DSA-87 verify | 1086286 cycles | 1086531 cycles | 1.00 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)
Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 264475 cycles | 263857 cycles | 1.00 |
ML-DSA-44 sign | 734976 cycles | 687983 cycles | 1.07 |
ML-DSA-44 verify | 265297 cycles | 266253 cycles | 1.00 |
ML-DSA-65 keypair | 492277 cycles | 489525 cycles | 1.01 |
ML-DSA-65 sign | 1174015 cycles | 1066611 cycles | 1.10 |
ML-DSA-65 verify | 443088 cycles | 440971 cycles | 1.00 |
ML-DSA-87 keypair | 773427 cycles | 766135 cycles | 1.01 |
ML-DSA-87 sign | 1464879 cycles | 1446596 cycles | 1.01 |
ML-DSA-87 verify | 760362 cycles | 748425 cycles | 1.02 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result, exceeding threshold 1.03.
Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
---|---|---|---|
ML-DSA-44 sign | 734976 cycles | 687983 cycles | 1.07 |
ML-DSA-65 sign | 1174015 cycles | 1066611 cycles | 1.10 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)
Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 469899 cycles | 470052 cycles | 1.00 |
ML-DSA-44 sign | 1778220 cycles | 1777818 cycles | 1.00 |
ML-DSA-44 verify | 538870 cycles | 539568 cycles | 1.00 |
ML-DSA-65 keypair | 784129 cycles | 783732 cycles | 1.00 |
ML-DSA-65 sign | 2814346 cycles | 2812726 cycles | 1.00 |
ML-DSA-65 verify | 834144 cycles | 833767 cycles | 1.00 |
ML-DSA-87 keypair | 1270161 cycles | 1278540 cycles | 0.99 |
ML-DSA-87 sign | 3545617 cycles | 3554561 cycles | 1.00 |
ML-DSA-87 verify | 1346236 cycles | 1355321 cycles | 0.99 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)
Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 297306 cycles | 296045 cycles | 1.00 |
ML-DSA-44 sign | 1039798 cycles | 945323 cycles | 1.10 |
ML-DSA-44 verify | 317739 cycles | 318588 cycles | 1.00 |
ML-DSA-65 keypair | 536213 cycles | 537722 cycles | 1.00 |
ML-DSA-65 sign | 1510993 cycles | 1509477 cycles | 1.00 |
ML-DSA-65 verify | 514003 cycles | 514381 cycles | 1.00 |
ML-DSA-87 keypair | 832452 cycles | 832883 cycles | 1.00 |
ML-DSA-87 sign | 1957313 cycles | 1950150 cycles | 1.00 |
ML-DSA-87 verify | 847156 cycles | 848568 cycles | 1.00 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result, exceeding threshold 1.03.
Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
---|---|---|---|
ML-DSA-44 sign | 1039798 cycles | 945323 cycles | 1.10 |
Force-pushed from 816ff86 to d42cabe.
SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)
Benchmark suite | Current: e9c3850 | Previous: 062f811 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 952104 cycles | 952452 cycles | 1.00 |
ML-DSA-44 sign | 3693692 cycles | 3679288 cycles | 1.00 |
ML-DSA-44 verify | 1079215 cycles | 1079440 cycles | 1.00 |
ML-DSA-65 keypair | 1574608 cycles | 1573598 cycles | 1.00 |
ML-DSA-65 sign | 5847661 cycles | 5851330 cycles | 1.00 |
ML-DSA-65 verify | 1698671 cycles | 1699344 cycles | 1.00 |
ML-DSA-87 keypair | 2546671 cycles | 2546316 cycles | 1.00 |
ML-DSA-87 sign | 7303532 cycles | 7248273 cycles | 1.01 |
ML-DSA-87 verify | 2711419 cycles | 2710426 cycles | 1.00 |
These were written mostly from scratch, inspired by the same functions in mlkem-native.

Resolves #257

Signed-off-by: Matthias J. Kannwischer <[email protected]>
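For context, the `l{4,5,7}` suffixes correspond to L = 4, 5, 7 for ML-DSA-44, ML-DSA-65 and ML-DSA-87, and the function computes, per coefficient, Σ_j u_j·v_j·2^(-32) mod q over the L polynomials of two NTT-domain vectors. The plain-C model below is a sketch, not the PR's Neon code or the library's C reference: it assumes coefficients bounded by q in absolute value, accumulates the L products in 64 bits, and performs one Montgomery reduction per coefficient. The Dilithium reference code instead reduces each product and then adds, which gives the same result modulo q. The names `pointwise_acc_montgomery`, `montgomery_reduce`, `MLD_N`, `MLD_Q` and `MLD_QINV` here are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define MLD_N 256
#define MLD_Q 8380417
#define MLD_QINV 58728449 /* q^(-1) mod 2^32 */

/* Montgomery reduction: for |a| < 2^31 * q, returns r with
 * r == a * 2^(-32) (mod q) and -q < r < q. */
static int32_t montgomery_reduce(int64_t a)
{
  assert(a > -((int64_t)MLD_Q << 31) && a < ((int64_t)MLD_Q << 31));
  /* t = a * q^(-1) mod 2^32, reinterpreted as a signed 32-bit value */
  int32_t t = (int32_t)((uint32_t)a * (uint32_t)MLD_QINV);
  /* a - t*q is divisible by 2^32, so this division is exact */
  return (int32_t)((a - (int64_t)t * MLD_Q) / ((int64_t)1 << 32));
}

/* Hypothetical model: w[i] = sum_j a[j][i] * b[j][i], accumulated in 64 bits
 * and reduced once. With |coefficients| < q and l <= 7 the accumulator stays
 * below 7 * q^2 < 2^31 * q, so a single reduction is safe. */
static void pointwise_acc_montgomery(int32_t w[MLD_N],
                                     const int32_t a[][MLD_N],
                                     const int32_t b[][MLD_N], unsigned l)
{
  assert(l <= 7);
  for (unsigned i = 0; i < MLD_N; i++)
  {
    int64_t acc = 0;
    for (unsigned j = 0; j < l; j++)
    {
      acc += (int64_t)a[j][i] * b[j][i];
    }
    w[i] = montgomery_reduce(acc);
  }
}
```

Accumulating first and reducing once trades one wider accumulator for L-1 fewer reductions per coefficient, which is the kind of saving a vectorized implementation can exploit.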
Force-pushed from d42cabe to e9c3850.
```diff
@@ -44,15 +47,20 @@ static int cmp_uint64_t(const void *a, const void *b)

 static int bench(void)
 {
-  int32_t data0[256];
+  MLD_ALIGN int32_t data0[256];
+  MLD_ALIGN int32_t data1[MLDSA_K * 256];
```
This should be `MLDSA_L`? I'm surprised that no valgrind test is failing here.
K is never smaller than L (K == L for ML-DSA-44, K > L for ML-DSA-65 and ML-DSA-87), so it is not surprising that this does not fail.
Ha, true.
```asm
add a3_ptr, a0_ptr, #(3 * 1024)
add a4_ptr, a0_ptr, #(4 * 1024)
add a5_ptr, a4_ptr, #(1 * 1024)
add a6_ptr, a5_ptr, #(1 * 1024)
```
Nit: Use a4 here to avoid prolonging the dependency chain.
Thanks @mkannwischer! I support this change -- there should also be a small speedup from using SLOTHY on the code.
Left some comments.
```asm
add b3_ptr, b0_ptr, #(3 * 1024)
add b4_ptr, b0_ptr, #(4 * 1024)
add b5_ptr, b4_ptr, #(1 * 1024)
add b6_ptr, b5_ptr, #(1 * 1024)
```
Nit: Use b4 here to avoid prolonging the dependency chain.
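One plausible reason the pointer setup chains through `a4_ptr`/`b4_ptr` at all, rather than adding `#(5 * 1024)` and `#(6 * 1024)` to the base pointer, is that AArch64 `ADD (immediate)` only encodes a 12-bit unsigned immediate, optionally shifted left by 12. Offsets of 1024 through 4096 fit; 5120 and 6144 do not. Presumably the nits ask for `add a6_ptr, a4_ptr, #(2 * 1024)` (and likewise for `b6_ptr`), which keeps the sixth pointer two dependent `add`s from the base instead of three. The C helper below is a hypothetical sketch of that encodability rule, not code from the PR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the AArch64 ADD (immediate) rule: the immediate is a 12-bit
 * unsigned value, optionally shifted left by 12 bits. Returns true if
 * imm can be encoded directly in a single ADD instruction. */
static bool a64_add_imm_encodable(uint64_t imm)
{
  if (imm <= 0xFFF)
  {
    return true; /* imm12, LSL #0 */
  }
  /* imm12, LSL #12: low 12 bits must be zero */
  return (imm & 0xFFF) == 0 && (imm >> 12) <= 0xFFF;
}
```

Under this rule, `#(4 * 1024)` is encodable only via the shifted form, and every step of 1024 or 2048 from `a4_ptr` fits the unshifted form, so the shorter chain costs no extra instructions.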
```c
#if defined(MLD_USE_NATIVE_POLYVECL_POINTWISE_ACC_MONTGOMERY)

#if MLDSA_L == 4
/*************************************************
```
Nit: Indentation