About the make_tensor function #2084
Comments
I don't quite understand what your issue is from the description above. From my observation, there is only one type of
The other one is just your global function name. Could you state your issue more specifically?
What I'm trying to say is that make_tensor calls that create tensors of the same shape sometimes work and sometimes don't. Here I'm using make_tensor inside a CUDA kernel, where I have to provide the ld (stride) arguments or I get an error. If I call make_tensor from the main function instead of inside the kernel, it succeeds without the ld arguments, and I don't know why.
What does this mean? This launch error likely has nothing to do with make_tensor. Please provide your full repro steps, starting from cd-ing into the build directory.
`template <class ProblemShape, class CtaTiler,
          class TA, class AStride,
          class TB, class BStride,
          class TC, class CStride>
__global__ static
void
mak_tensor(ProblemShape shape_MNK, CtaTiler cta_tiler,
           TA const* A, AStride dA,
           TB const* B, BStride dB,
           TC const* C, CStride dC)
{
  using namespace cute;

  // Preconditions

  //
  // Full and Tiled Tensors
  //

  // Represent the full tensors (pointer + shape + stride)
  Tensor mA = make_tensor(make_gmem_ptr(A), select<0,2>(shape_MNK), dA); // (M,K)
  Tensor mB = make_tensor(make_gmem_ptr(B), select<1,2>(shape_MNK), dB); // (N,K)
  Tensor mC = make_tensor(make_gmem_ptr(C), select<0,1>(shape_MNK), dC); // (M,N)

  // Get the appropriate blocks for this thread block
  auto cta_coord = make_coord(blockIdx.x, blockIdx.y, _);              // (m,n,k)
  Tensor gA = local_tile(mA, cta_tiler, cta_coord, Step<_1, X,_1>{});  // (BLK_M,BLK_K,k)
  Tensor gB = local_tile(mB, cta_tiler, cta_coord, Step< X,_1,_1>{});  // (BLK_N,BLK_K,k)
  Tensor gC = local_tile(mC, cta_tiler, cta_coord, Step<_1,_1, X>{});  // (BLK_M,BLK_N)

#if 1
  // Debug output from a single thread only
  if(thread0()) {
    print("  mB : "); print(  mB); print("\n");
    // print("  gB : "); print(  gB); print("\n");
    // print("  sB : "); print( (sB)); print("\n");
    // print("tBgB : "); print(tBgB); print("\n");
    // print("tBsB : "); print(tBsB); print("\n");
    // print("tArA : "); print(tArA); print("\n");
  }
#endif
  // (BLK_N,BLK_K)
}`
cudaErrorLaunchFailure: unspecified launch failure
I don't know why a make_tensor call inside a kernel function must be given the ld (stride) parameter, otherwise it reports an error, while a call outside the kernel function works without the ld parameter. I hope you can give me an answer, thank you.
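For context on the stride question: as far as I understand CuTe, make_tensor forwards its trailing arguments to make_layout, and a shape-only layout gets compact column-major strides, in host and device code alike. If that holds, omitting the stride should not fail a launch by itself, but it would silently index a row-major buffer with the wrong offsets. Below is a minimal plain C++ sketch of that idea. Layout2D, make_layout_2d, and load are hypothetical names made up for illustration and are not CuTe APIs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a 2-D layout: a shape (rows, cols) plus one stride per mode.
struct Layout2D {
    std::size_t rows, cols;
    std::size_t stride_row, stride_col;

    // Map a logical coordinate (i, j) to a linear offset into the buffer.
    constexpr std::size_t operator()(std::size_t i, std::size_t j) const {
        return i * stride_row + j * stride_col;
    }
};

// Shape only: fall back to compact column-major strides (1, rows).
// Assumption: this mirrors what a shape-only layout defaults to in CuTe.
constexpr Layout2D make_layout_2d(std::size_t rows, std::size_t cols) {
    return Layout2D{rows, cols, 1, rows};
}

// Shape plus explicit strides, e.g. (ld, 1) for a row-major buffer
// whose leading dimension is ld.
constexpr Layout2D make_layout_2d(std::size_t rows, std::size_t cols,
                                  std::size_t stride_row, std::size_t stride_col) {
    return Layout2D{rows, cols, stride_row, stride_col};
}

// Read element (i, j) of a buffer through a layout, with a bounds check on the coordinate.
inline float load(const float* data, const Layout2D& layout,
                  std::size_t i, std::size_t j) {
    assert(i < layout.rows && j < layout.cols);
    return data[layout(i, j)];
}
```

With a 4x8 shape, make_layout_2d(4, 8) maps (2, 3) to offset 2 + 3*4, while make_layout_2d(4, 8, 8, 1) maps it to 2*8 + 3. Both are in bounds for a 32-element buffer, so a missing stride shows up as wrong data rather than as a crash.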