-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement gesdd
#899
base: develop
Are you sure you want to change the base?
Implement gesdd
#899
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder perhaps whether these different algorithms (say using Jacobi or forming A'*A or using bidiagonal form) each have advantages and disadvantages and can be retained but selected as an algorithm option by the application? Just a thought.
|
||
for(j = tid; j < n; j += gridDim.x * blockDim.x) | ||
{ | ||
for(k = 0; k < n; k++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder whether the code can use more threads or more parallelism, say batch index b in z-dimension, j in x-dimension (as inner loop) , k in y-dimension. Just a thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be improved, but this is intended as an intermediate step and not the final algorithm.
size_t* size_workArr2) | ||
{ | ||
// If quick return, set workspace to zero | ||
if(n == 0 || m == 0 || batch_count == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comment: perhaps set all sizes to zero first, then check if (n==0) and return. This is just defensive programming to make sure all the size variables are initialized, no undefined values. Just a minor suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
*size_splits = std::max({f1}); | ||
*size_tmptau_W = std::max({g1}); | ||
*size_tau = std::max({h1}); | ||
*size_workArr = sizeof(T) * std::max({m, n}) * std::max({m, n}) * batch_count; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just double checking whether the size_workArr is max(m,n)^2 * batch_count. It might be a very big number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this looks like a typo, and I'll check it later. Thanks for catching that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it turns out, that code was semantically incorrect: it was supposed to reserve space for the off-diagonal elements of the input matrix, as required by stedc
. Thanks again for catching it!
@jmachado-amd : Would this feature require a changelog update? If you're planning on adding that as a separate PR, then no worries. |
@EdDAzevedo, my opinion is that svd algorithms based on creating the explicit products |
This PR adds a minimal implementation of
gesdd
, usingsyevd
as a backend. A complete implementation, usingbdsdc
, will appear in the future.