About the make_tensor function #2084

LANSHANGH · 2025-02-06T17:05:57Z

```cpp
template <class ProblemShape, class CtaTiler,
          class TA, class AStride,
          class TB, class BStride,
          class TC, class CStride>
__global__ static
void
mak_tensor(ProblemShape shape_MNK, CtaTiler cta_tiler,
           TA const* A, AStride dA,
           TB const* B, BStride dB,
           TC const* C, CStride dC
)
{
  using namespace cute;

  // Preconditions

  //
  // Full and Tiled Tensors
  //

  // Represent the full tensors
  Tensor mA = make_tensor(make_gmem_ptr(A), select<0,2>(shape_MNK), dA); // (M,K)
  Tensor mB = make_tensor(make_gmem_ptr(B), select<1,2>(shape_MNK), dB); // (N,K)
  Tensor mC = make_tensor(make_gmem_ptr(C), select<0,1>(shape_MNK), dC); // (M,N)

  // Get the appropriate blocks for this thread block
  auto cta_coord = make_coord(blockIdx.x, blockIdx.y, _);              // (m,n,k)
  Tensor gA = local_tile(mA, cta_tiler, cta_coord, Step<_1, X,_1>{});  // (BLK_M,BLK_K,k)
  Tensor gB = local_tile(mB, cta_tiler, cta_coord, Step< X,_1,_1>{});  // (BLK_N,BLK_K,k)
  Tensor gC = local_tile(mC, cta_tiler, cta_coord, Step<_1,_1, X>{});  // (BLK_M,BLK_N)
#if 1
  if(thread0()) {
    print(" mB : "); print( mB); print("\n");
    // print(" gB : "); print( gB); print("\n");
    // print(" sB : "); print( (sB)); print("\n");
    // print("tBgB : "); print(tBgB); print("\n");
    // print("tBsB : "); print(tBsB); print("\n");
    // print("tArA : "); print(tArA); print("\n");
  }
#endif
  // (BLK_N,BLK_K)
}
```

cudaErrorLaunchFailure: unspecified launch failure

I don't understand why a make_tensor call inside a kernel function must be given the ld (stride) parameter, otherwise it reports an error, while a call outside a kernel function works without the ld parameter. I hope you can give me an answer, thank you.

Junkai-Wu · 2025-02-13T07:58:41Z

I don't quite understand what your issue is from the description above. From what I can see, there is only one kind of make_tensor call here:

```cpp
Tensor mA = make_tensor(make_gmem_ptr(A), select<0,2>(shape_MNK), dA); // (M,K)
Tensor mB = make_tensor(make_gmem_ptr(B), select<1,2>(shape_MNK), dB); // (N,K)
Tensor mC = make_tensor(make_gmem_ptr(C), select<0,1>(shape_MNK), dC); // (M,N)
```

The other one is just the name of your `__global__` function. Could you state your issue more specifically?

LANSHANGH · 2025-02-13T16:45:34Z

What I'm trying to say is that make_tensor creates tensors of the same shape, yet sometimes it works and sometimes it doesn't. Here I'm using make_tensor inside a CUDA kernel, where I have to provide the ld arguments or I get an error. If I call make_tensor from the main function instead of inside the kernel, it succeeds without the ld arguments. I don't know why.
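For context on what the ld (stride) argument changes: as far as the CuTe documentation describes it, a layout pairs a shape with strides, and when only a shape is passed, compact column-major strides are generated from it. The plain C++ sketch below mimics that idea for a rank-2 case. `Layout2D` and `make_layout_2d` are hypothetical stand-ins written for illustration, not CuTe APIs, and the sketch says nothing about the cause of the launch failure above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a rank-2 layout: a shape plus one stride per mode.
struct Layout2D {
    std::size_t rows, cols;          // shape, e.g. (M, K)
    std::size_t stride_r, stride_c;  // elements skipped per step in each mode

    // Map a logical coordinate (i, j) to a linear offset into the buffer.
    std::size_t operator()(std::size_t i, std::size_t j) const {
        return i * stride_r + j * stride_c;
    }
};

// Shape only: derive compact column-major strides (1, rows).
Layout2D make_layout_2d(std::size_t rows, std::size_t cols) {
    return {rows, cols, 1, rows};
}

// Shape plus explicit strides, e.g. (1, ld) for a column-major matrix
// whose leading dimension ld is larger than the number of rows.
Layout2D make_layout_2d(std::size_t rows, std::size_t cols,
                        std::size_t stride_r, std::size_t stride_c) {
    return {rows, cols, stride_r, stride_c};
}
```

With shape (4,3), coordinate (2,1) maps to offset 6 under the derived compact strides (1,4), to offset 10 under explicit strides (1,8), and to offset 7 under row-major strides (3,1). The shape is identical in all three cases; only the strides decide which memory element a coordinate refers to, so omitting them is only correct when the data really is stored compactly in column-major order.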

thakkarV · 2025-02-13T17:02:21Z

> sometimes it works, sometimes it doesn't

What does this mean?

This launch error likely has nothing to do with make_tensor. Please provide your full repro steps, starting from cd-ing into the build directory.
