Skip to content

Further performance improvements to non-transposed [SD]GEMV kernels for A64FX and Neoverse V1. #5220

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 16, 2025

Conversation

iha-taisei
Copy link
Contributor

close #5210

This pull request proposes a patch for issue #5210.
I have implemented a loop unrolling in the kernel of the non-transposed [SD]GEMV for A64FX and Neoverse V1.
This PullRequest improves performance by 1.7x for A64FX and 2x for Neoverse V1 compared to v0.3.29.
image
image

@martin-frbg martin-frbg added this to the 0.3.30 milestone Apr 11, 2025
@annop-w
Copy link
Contributor

annop-w commented Apr 15, 2025

@iha-taisei Thank you very much for your contribution. Could we please then remove gemv_n_sve.c as well ? since it will no longer be used. I think we should try keeping the number of kernels low.

@annop-w
Copy link
Contributor

annop-w commented Apr 15, 2025

@iha-taisei I have just benchmarked the gemv_n_sve_v1x3.c against the NEON kernel in #5225 on NEOVERSEV2 and the SVE version wins slightly.

c8g_sgemvn

So, would you mind adding gemv_n_sve_v1x3.c in kernel/arm64/KERNEL.NEOVERSEN2 as well please ? I will close #5225 in favor of this one. Thank you.

@martin-frbg
Copy link
Collaborator

@iha-taisei Thank you very much for your contribution. Could we please then remove gemv_n_sve.c as well ? since it will no longer be used. I think we should try keeping the number of kernels low.

I'm not convinced that we need to remove kernel files simply because they are (currently) not in use by any hardware

@abhishek-iitmadras
Copy link

Hi @annop-w

Can we use gemv_n_sve_v1x3.c for KERNEL.ARMV8SVE, like we have already for [S/D]GEMVTKERNEL with patch #5215?

cc @iha-taisei

@iha-taisei
Copy link
Contributor Author

Hi @annop-w

So, would you mind adding gemv_n_sve_v1x3.c in kernel/arm64/KERNEL.NEOVERSEN2 as well please ? I will close #5225 in favor of this one. Thank you.

Yes, you can add the kernel to KERNEL.NEOVERSEN2, although I haven't evaluated it on such CPUs.

@annop-w
Copy link
Contributor

annop-w commented Apr 16, 2025

@iha-taisei

Yes, you can add the kernel to KERNEL.NEOVERSEN2, although I haven't evaluated it on such CPUs.

I have results for NEOVERSEV2, which currently uses the same settings as NEOVERSEN2 for DYNAMIC_ARCH, in my above comment. I have not benchmarked on N2 but I believe the result will hold as well and we will see speedup.

@annop-w
Copy link
Contributor

annop-w commented Apr 16, 2025

@martin-frbg

I'm not convinced that we need to remove kernel files simply because they are (currently) not in use by any hardware

From a quick look, the kernel gemv_n_sve.c in #5157 is very similar to the one in this PR (the same no. of unrolling). That's why I suggested removing the old one. In fact, it is interesting to try to understand why this one performs significantly better.

@annop-w
Copy link
Contributor

annop-w commented Apr 16, 2025

@abhishek-iitmadras

Can we use gemv_n_sve_v1x3.c for KERNEL.ARMV8SVE, like we have already for [S/D]GEMVTKERNEL with patch #5215?

Yes, but I have not tried benchmarking on those CORTEX-As and -Xs. But, seeing how this new SVE kernel outperforms the assembly one on V1 and V2, I expect the same on those cores perhaps.

@martin-frbg
Copy link
Collaborator

@abhishek-iitmadras

Can we use gemv_n_sve_v1x3.c for KERNEL.ARMV8SVE, like we have already for [S/D]GEMVTKERNEL with patch #5215?

Yes, but I have not tried benchmarking on those CORTEX-As and -Xs. But, seeing how this new SVE kernel outperforms the assembly one on V1 and V2, I expect the same on those cores perhaps.

I can benchmark on a Pixel8, if we can agree that it is an underrated supercomputer (and if I can find the time and energy for non-trivial work again)

@martin-frbg martin-frbg merged commit 0241d51 into OpenMathLib:develop Apr 16, 2025
85 of 86 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Further performance improvements of [SD]GEMV on A64FX and Neoverse V1.
4 participants