Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement gesdd #899

Open
wants to merge 12 commits into
base: develop
Choose a base branch
from

Conversation

jmachado-amd
Copy link
Collaborator

This PR adds a minimal implementation of gesdd, using syevd as a backend. A complete implementation, using bdsdc, will appear in the future.

@jmachado-amd jmachado-amd added the noOptimizations Disable optimized kernels for small sizes for some routines label Feb 25, 2025
Copy link
Contributor

@EdDAzevedo EdDAzevedo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder perhaps whether these different algorithms (say using Jacobi or forming A'*A or using bidiagonal form) each have advantages and disadvantages and can be retained but selected as an algorithm option by the application? Just a thought.

@jmachado-amd jmachado-amd added the ci:no-ccache Disable ccache label Feb 26, 2025

for(j = tid; j < n; j += gridDim.x * blockDim.x)
{
for(k = 0; k < n; k++)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder whether the code can use more threads or more parallelism, say batch index b in z-dimension, j in x-dimension (as inner loop) , k in y-dimension. Just a thought.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be improved, but this is intended as an intermediate step and not the final algorithm.

size_t* size_workArr2)
{
// If quick return, set workspace to zero
if(n == 0 || m == 0 || batch_count == 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor comment: perhaps set all sizes to zero first, then check if (n==0) and return. This is just defensive programming to make sure all the size variables are initialized, no undefined values. Just a minor suggestion.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

*size_splits = std::max({f1});
*size_tmptau_W = std::max({g1});
*size_tau = std::max({h1});
*size_workArr = sizeof(T) * std::max({m, n}) * std::max({m, n}) * batch_count;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just double checking whether the size_workArr is max(m,n)^2 * batch_count. It might be a very big number.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this looks like a typo, and I'll check it later. Thanks for catching that!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it turns out, that code was semantically incorrect: it was supposed to reserve space for the off-diagonal elements of the input matrix, as required by stedc. Thanks again for catching it!

@jmachado-amd jmachado-amd removed the ci:no-ccache Disable ccache label Feb 26, 2025
@amd-jnovotny
Copy link
Contributor

@jmachado-amd : Would this feature require a changelog update? If you're planning on adding that as a separate PR, then no worries.

@jmachado-amd
Copy link
Collaborator Author

@EdDAzevedo, my opinion is that svd algorithms based on creating the explicit products A A^* or A^* A are always an intermediate step and should not stay in the long run. However, I can't ignore the fact that they can be quite efficient sometimes, and some users will prefer to have the option of selecting those during runtime.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
noOptimizations Disable optimized kernels for small sizes for some routines
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants