About the make_tensor function #2084

Open
LANSHANGH opened this issue Feb 6, 2025 · 3 comments

Comments

@LANSHANGH

`template <class ProblemShape, class CtaTiler,
          class TA, class AStride,
          class TB, class BStride,
          class TC, class CStride>
__global__ static
void
mak_tensor(ProblemShape shape_MNK, CtaTiler cta_tiler,
           TA const* A, AStride dA,
           TB const* B, BStride dB,
           TC const* C, CStride dC)
{
  using namespace cute;

  // Preconditions

  //
  // Full and Tiled Tensors
  //

  // Represent the full tensors
  Tensor mA = make_tensor(make_gmem_ptr(A), select<0,2>(shape_MNK), dA); // (M,K)
  Tensor mB = make_tensor(make_gmem_ptr(B), select<1,2>(shape_MNK), dB); // (N,K)
  Tensor mC = make_tensor(make_gmem_ptr(C), select<0,1>(shape_MNK), dC); // (M,N)

  // Get the appropriate blocks for this thread block
  auto cta_coord = make_coord(blockIdx.x, blockIdx.y, _);              // (m,n,k)
  Tensor gA = local_tile(mA, cta_tiler, cta_coord, Step<_1, X,_1>{});  // (BLK_M,BLK_K,k)
  Tensor gB = local_tile(mB, cta_tiler, cta_coord, Step< X,_1,_1>{});  // (BLK_N,BLK_K,k)
  Tensor gC = local_tile(mC, cta_tiler, cta_coord, Step<_1,_1, X>{});  // (BLK_M,BLK_N)

#if 1
  if(thread0()) {
    print(" mB : "); print( mB); print("\n");
    // print(" gB : "); print( gB); print("\n");
    // print(" sB : "); print( (sB)); print("\n");
    // print("tBgB : "); print(tBgB); print("\n");
    // print("tBsB : "); print(tBsB); print("\n");
    // print("tArA : "); print(tArA); print("\n");
  }
#endif
  // (BLK_N,BLK_K)
}`

cudaErrorLaunchFailure: unspecified launch failure

I don't understand why a make_tensor call inside a kernel function must be given the ld (leading dimension) stride parameter, otherwise it reports an error, while a call outside a kernel function works without the ld parameter. I hope you can explain this. Thank you.

@Junkai-Wu
Contributor

I don't quite understand what your issue is from the description above. From what I can see, there is only one kind of make_tensor call here:

Tensor mA = make_tensor(make_gmem_ptr(A), select<0,2>(shape_MNK), dA); // (M,K)
Tensor mB = make_tensor(make_gmem_ptr(B), select<1,2>(shape_MNK), dB); // (N,K)
Tensor mC = make_tensor(make_gmem_ptr(C), select<0,1>(shape_MNK), dC); // (M,N)

The other one is just your global function name. Could you state your issue more specifically?

@LANSHANGH
Author

What I'm trying to say is that make_tensor creates a tensor of the same shape in both cases, but it sometimes works and sometimes doesn't. Here I'm calling make_tensor inside a CUDA kernel, where I have to provide the ld arguments or I get an error. If I call make_tensor from the main function instead of from inside the kernel, it succeeds without the ld arguments. I don't know why.

@thakkarV
Collaborator

> sometimes it works, sometimes it doesn't

What does this mean?

This launch error likely has nothing to do with make_tensor. Please provide your full repro steps, starting from cd-ing into the build directory.
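
For readers following the ld question: as far as I understand CuTe, calling make_tensor with only a pointer and a shape falls back to a compact column-major layout, while passing a stride (as dA, dB and dC do in the kernel above) describes the leading dimension the buffer actually has. Both forms are plain layout arithmetic and neither is specific to host or device code. The sketch below models that arithmetic in plain C++. `Layout2D` and `make_layout_2d` are hypothetical names used only for illustration; they are not CuTe's types or API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a rank-2 layout: a shape plus one stride per mode.
// This is NOT cute::Layout; it only models the index arithmetic.
struct Layout2D {
    std::array<std::size_t, 2> shape;   // (rows, cols)
    std::array<std::size_t, 2> stride;  // elements skipped per step in each mode

    // Linear offset of the logical coordinate (i, j).
    std::size_t operator()(std::size_t i, std::size_t j) const {
        return i * stride[0] + j * stride[1];
    }
};

// Shape only: default to compact column-major strides (1, rows).
inline Layout2D make_layout_2d(std::size_t rows, std::size_t cols) {
    return Layout2D{{rows, cols}, {1, rows}};
}

// Shape plus explicit strides, e.g. (1, ld) for a padded column-major buffer
// or (cols, 1) for a compact row-major buffer.
inline Layout2D make_layout_2d(std::size_t rows, std::size_t cols,
                               std::size_t stride_row, std::size_t stride_col) {
    return Layout2D{{rows, cols}, {stride_row, stride_col}};
}
```

With a 4x3 shape, `make_layout_2d(4, 3)(2, 1)` and `make_layout_2d(4, 3, 1, 8)(2, 1)` address different elements, because the second layout assumes a leading dimension of 8. If the strides do not match how the buffer was really allocated, the kernel reads the wrong addresses, which is one way (among many) to end up with a launch failure.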
