(WIP) sytrd use gemv instead of symv #893

Open, wants to merge 9 commits into develop

@EdDAzevedo (Contributor) commented Feb 18, 2025

This PR modifies sytrd() and latrd() to use gemv() (general matrix-vector multiply) instead of symv() (symmetric matrix-vector multiply).

For some implementations and problem sizes, gemv() can outperform symv(), even though symv() reads only about half of the matrix (one triangle) while performing the same number of floating-point operations. One possible reason is that the symv() implementation may use atomic update operations.
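To illustrate the difference in data access, here is a minimal host-side C++ sketch. It is not the rocSOLVER kernel code, and the names `symv_lower` and `gemv_full` are hypothetical. Both loops perform n² multiply-adds. The lower-storage symv reads only the n(n+1)/2 entries of the lower triangle, but each off-diagonal entry A(i, j) updates two outputs, y[i] and y[j]. In a parallel GPU kernel, that second, scattered update is one plausible place where atomics or an extra reduction are needed. The gemv variant reads all n² entries, but each y[i] is an independent dot product of row i with x.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major n x n matrix: entry (i, j) is stored at a[i + j * n].

// symv-style product y = A * x that reads only the lower triangle (uplo = lower).
// `reads` counts how many matrix entries are loaded.
std::vector<double> symv_lower(const std::vector<double>& a,
                               const std::vector<double>& x,
                               std::size_t n, std::size_t& reads) {
    assert(a.size() == n * n && x.size() == n);
    std::vector<double> y(n, 0.0);
    reads = 0;
    for (std::size_t j = 0; j < n; ++j) {
        y[j] += a[j + j * n] * x[j];  // diagonal entry A(j, j)
        ++reads;
        for (std::size_t i = j + 1; i < n; ++i) {
            const double aij = a[i + j * n];  // stored entry A(i, j), i > j
            ++reads;
            y[i] += aij * x[j];  // use it as A(i, j)
            y[j] += aij * x[i];  // reuse it as A(j, i): a second output is updated
        }
    }
    return y;
}

// gemv-style product y = A * x on a fully populated matrix.
// Every entry is read once; each y[i] depends only on row i of A.
std::vector<double> gemv_full(const std::vector<double>& a,
                              const std::vector<double>& x,
                              std::size_t n, std::size_t& reads) {
    assert(a.size() == n * n && x.size() == n);
    std::vector<double> y(n, 0.0);
    reads = 0;
    for (std::size_t i = 0; i < n; ++i) {
        double sum = 0.0;
        for (std::size_t j = 0; j < n; ++j) {
            sum += a[i + j * n] * x[j];
            ++reads;
        }
        y[i] = sum;
    }
    return y;
}
```

On an explicitly symmetric matrix, both functions return the same vector. The symv variant does so with about half the memory traffic, and the gemv variant trades that for a simpler, conflict-free output pattern.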

The changes include:

  1. Allocating additional workspace in sytrd() to store the strictly upper (uplo = lower) or strictly lower (uplo = upper) triangular part of the matrix, which sytrd() conceptually leaves untouched.
  2. Invoking kernels in sytrd() to save that triangular part on entry and restore it on exit.
  3. Invoking kernels in latrd() (on entry) to copy the referenced strictly lower (or strictly upper) triangular part into the opposite triangle, so that the matrix is explicitly symmetric. This allows gemv() to replace the calls to symv().
  4. Changing xxTRD_BLOCKSIZE from 32 to 64 to reduce the overall cost of the matrix copy that enforces symmetry in latrd().
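The steps above can be sketched on the host for the uplo = lower case. This is a minimal C++ sketch under stated assumptions. All function names are hypothetical, and the PR implements these steps as GPU kernels inside rocSOLVER. The `copied_entries` cost model is also an assumption. It supposes one latrd() call per panel of nb columns, each symmetrizing its whole trailing matrix of order m = n - k, and it ignores details such as any switch to an unblocked algorithm for the final columns.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Column-major n x n matrix: entry (i, j) is stored at a[i + j * n].
// Hypothetical host-side sketch for uplo = lower; not the rocSOLVER kernels.

// Steps 1-2 (on entry to sytrd): save the strictly upper triangle,
// which sytrd must leave untouched when uplo = lower.
std::vector<double> save_strict_upper(const std::vector<double>& a, std::size_t n) {
    assert(a.size() == n * n);
    std::vector<double> work;
    for (std::size_t j = 1; j < n; ++j)
        for (std::size_t i = 0; i < j; ++i)
            work.push_back(a[i + j * n]);
    return work;
}

// Step 2 (on exit from sytrd): write the saved values back.
void restore_strict_upper(std::vector<double>& a, std::size_t n,
                          const std::vector<double>& work) {
    assert(a.size() == n * n && work.size() == n * (n - 1) / 2);
    std::size_t k = 0;
    for (std::size_t j = 1; j < n; ++j)
        for (std::size_t i = 0; i < j; ++i)
            a[i + j * n] = work[k++];
}

// Step 3 (on entry to latrd): mirror the strictly lower triangle into the
// strictly upper one, A(i, j) = A(j, i) for i < j, so A is explicitly symmetric.
void symmetrize_from_lower(std::vector<double>& a, std::size_t n) {
    assert(a.size() == n * n);
    for (std::size_t j = 1; j < n; ++j)
        for (std::size_t i = 0; i < j; ++i)
            a[i + j * n] = a[j + i * n];
}

// With A explicitly symmetric, a general y = A * x gives the symv result.
std::vector<double> gemv(const std::vector<double>& a,
                         const std::vector<double>& x, std::size_t n) {
    assert(a.size() == n * n && x.size() == n);
    std::vector<double> y(n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            y[i] += a[i + j * n] * x[j];
    return y;
}

// Step 4, rough cost model (assumption): one latrd call per panel of nb
// columns, each copying the m * (m - 1) / 2 strictly triangular entries
// of its trailing matrix of order m = n - k.
std::size_t copied_entries(std::size_t n, std::size_t nb) {
    assert(nb > 0);
    std::size_t total = 0;
    for (std::size_t k = 0; k < n; k += nb) {
        const std::size_t m = n - k;
        total += m * (m - 1) / 2;
    }
    return total;
}
```

Under this model, doubling nb from 32 to 64 roughly halves the number of copied entries. For n = 8192 the ratio is about 0.50. Each latrd() call copies its trailing triangle once and then reuses it for nb matrix-vector products, so a larger panel spreads the copy over more gemv() calls.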

Timings on gfx1030 (single precision, lower is better), measured with `rocsolver-bench -f sytrd --precision s --iters 5`:

| n | using gemv (µs) | using symv (µs) |
|---:|---:|---:|
| 1024 | 62,201 | 73,580 |
| 2048 | 137,507 | 190,302 |
| 4096 | 335,996 | 522,818 |
| 8192 | 2,161,237 | 1,925,926 |

On MI300 (splinter-126-wr-d3, gfx942):

| n | using gemv (µs) | using symv (µs) |
|---:|---:|---:|
| 1024 | 40,310 | 53,154 |
| 2048 | 94,027 | 157,111 |
| 4096 | 237,926 | 484,525 |
| 8192 | 689,551 | 1,683,223 |
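The two tables can be summarized as the ratio of symv time to gemv time, where a value above 1 means the gemv path was faster. The small C++ sketch below computes that ratio from the measurements above. The `Timing` struct and `speedup` function are illustrative only and are not part of rocsolver-bench.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// One row of the sytrd benchmark tables above (times in microseconds).
struct Timing {
    int n;
    double gemv_us;
    double symv_us;
};

// Speedup of the gemv path over the symv path; > 1 means gemv was faster.
inline double speedup(const Timing& t) {
    assert(t.gemv_us > 0.0);
    return t.symv_us / t.gemv_us;
}

// Measurements copied from the gfx1030 table.
const std::array<Timing, 4> gfx1030 = {{
    {1024, 62201.0, 73580.0},
    {2048, 137507.0, 190302.0},
    {4096, 335996.0, 522818.0},
    {8192, 2161237.0, 1925926.0},
}};

// Measurements copied from the MI300 (gfx942) table.
const std::array<Timing, 4> gfx942 = {{
    {1024, 40310.0, 53154.0},
    {2048, 94027.0, 157111.0},
    {4096, 237926.0, 484525.0},
    {8192, 689551.0, 1683223.0},
}};
```

Rounded, the gemv path is about 1.18x to 1.56x faster on gfx1030 for n up to 4096 and about 0.89x (slower) at n = 8192. On gfx942, the speedup grows from about 1.32x at n = 1024 to about 2.44x at n = 8192.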

@tfalders added the noOptimizations label (Disable optimized kernels for small sizes for some routines) on Feb 26, 2025.