Further performance improvements to non-transposed [SD]GEMV kernels for A64FX and Neoverse V1. #5220
@iha-taisei Thank you very much for your contribution. Could we please then remove
@iha-taisei I have just benchmarked the So, would you mind adding
I'm not convinced that we need to remove kernel files simply because they are (currently) not in use by any hardware
Hi @annop-w, can we use gemv_n_sve_v1x3.c for KERNEL.ARMV8SVE, as we already do for [S/D]GEMVTKERNEL with patch #5215? cc @iha-taisei
My comment above has results for NEOVERSEV2, which currently uses the same settings as NEOVERSEN2 for DYNAMIC_ARCH. I have not benchmarked on N2, but I believe the result will hold there as well and we will see a speedup.
From a quick look, the kernel
Yes, but I have not benchmarked on those Cortex-A and Cortex-X cores. Still, given that this new SVE kernel outperforms the assembly one on V1 and V2, I would expect it to do the same on those cores.
I can benchmark on a Pixel8, if we can agree that it is an underrated supercomputer (and if I can find the time and energy for non-trivial work again) |
Closes #5210.
This pull request proposes a patch for issue #5210.


I have implemented loop unrolling in the non-transposed [SD]GEMV kernels for A64FX and Neoverse V1.
This pull request improves performance by 1.7x on A64FX and 2x on Neoverse V1 compared to v0.3.29.
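The PR description does not show the kernel itself. The following is a minimal, portable scalar C sketch of what loop unrolling in a non-transposed GEMV (y := y + alpha * A * x) can look like. It assumes column-major storage with a leading dimension `lda`, as in BLAS. The function names `gemv_n_ref` and `gemv_n_unroll4` and the unroll factor of 4 are hypothetical choices for illustration. The actual A64FX and Neoverse V1 kernels use SVE, and their exact unrolling scheme is not described here.

```c
#include <stddef.h>

/* Reference non-transposed GEMV: y := y + alpha * A * x.
   A is m x n, column-major, with leading dimension lda (lda >= m). */
static void gemv_n_ref(size_t m, size_t n, double alpha,
                       const double *a, size_t lda,
                       const double *x, double *y)
{
    for (size_t j = 0; j < n; j++) {
        double t = alpha * x[j];
        for (size_t i = 0; i < m; i++)
            y[i] += t * a[i + j * lda];
    }
}

/* Hypothetical unrolled variant: the column loop is unrolled by 4, so each
   y[i] is read and written once per group of four columns instead of once
   per column, and four independent multiply-adds are available per row. */
static void gemv_n_unroll4(size_t m, size_t n, double alpha,
                           const double *a, size_t lda,
                           const double *x, double *y)
{
    size_t j = 0;
    for (; j + 4 <= n; j += 4) {
        const double *a0 = a + (j + 0) * lda;
        const double *a1 = a + (j + 1) * lda;
        const double *a2 = a + (j + 2) * lda;
        const double *a3 = a + (j + 3) * lda;
        double t0 = alpha * x[j + 0], t1 = alpha * x[j + 1];
        double t2 = alpha * x[j + 2], t3 = alpha * x[j + 3];
        for (size_t i = 0; i < m; i++)
            y[i] += t0 * a0[i] + t1 * a1[i] + t2 * a2[i] + t3 * a3[i];
    }
    /* Remainder columns when n is not a multiple of 4. */
    for (; j < n; j++) {
        const double *aj = a + j * lda;
        double t = alpha * x[j];
        for (size_t i = 0; i < m; i++)
            y[i] += t * aj[i];
    }
}
```

Both functions compute the same product. Because the unrolled version sums four products before adding them to `y[i]`, its results can differ from the reference in the last bits for general floating-point inputs. A vectorized kernel would additionally process several rows `i` per SVE vector.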