I believe the layout composition in CUTLASS is not so robust #2113

seemingwang · 2025-02-14T10:14:42Z

I was reading the documentation on layout composition here, and I found that this piece of code:

void shape_mod(int* shapeA, int N, int& shapeB) {
   for (int i = 0; i < N; ++i) {
      assert(shapeA[i] %    shapeB == 0 or
                shapeB % shapeA[i] == 0);
      int new_shapeA =      min(shapeA[i], shapeB);
      int new_shapeB = ceil_div(shapeB, shapeA[i]);
      shapeA[i] = new_shapeA;
      shapeB    = new_shapeB;
   }
}

could eliminate some valid cases. Here is one example:

#include <cuda.h>
#include <stdlib.h>
#include <cute/tensor.hpp>
#include <type_traits>
using namespace cute;

int main()
{
    auto l1 = make_layout(Shape<_3,Shape<_2,_2>>{}, Stride<_16,Stride<_80,_4>>{});
    auto l2 = make_layout(Shape<_10,_2>{}, Stride<_16,_4>{});
    auto l3 = make_layout(Shape<_3,_4>{}, Stride<_1,_5>{});
    for (int i = 0; i < 12; i++) {
        printf("trying %d res1= %d res2 = %d\n", i, l1(i), l2(l3(i)));
    }
    printf("%d\n", (int)compatible(l3,l1)());

    //auto cc = composition(l2,l3);
}

Here is the output:

trying 0 res1= 0 res2 = 0
trying 1 res1= 16 res2 = 16
trying 2 res1= 32 res2 = 32
trying 3 res1= 80 res2 = 80
trying 4 res1= 96 res2 = 96
trying 5 res1= 112 res2 = 112
trying 6 res1= 4 res2 = 4
trying 7 res1= 20 res2 = 20
trying 8 res1= 36 res2 = 36
trying 9 res1= 84 res2 = 84
trying 10 res1= 100 res2 = 100
trying 11 res1= 116 res2 = 116
1

As you can see, l1(i) equals l2(l3(i)) for every i, and l3 is compatible with l1, so l1 is by definition a composition of l2 and l3.

However, I commented out //auto cc = composition(l2,l3); in the code above. If I enable that line, the program fails to compile, because the library treats this as an invalid case.

This doesn't really make much sense, does it?

#include <cuda.h>
#include <stdlib.h>
#include <cute/tensor.hpp>
#include <type_traits>
using namespace cute;

int main()
{
    auto l1 = make_layout(Shape<_3,Shape<_2,_2>>{}, Stride<_16,Stride<_80,_4>>{});
    auto l2 = make_layout(Shape<_10,_2>{}, Stride<_16,_4>{});
    auto l3 = make_layout(Shape<_3,_4>{}, Stride<_1,_5>{});
    for (int i = 0; i < 12; i++) {
        printf("trying %d res1= %d res2 = %d\n", i, l1(i), l2(l3(i)));
    }
    printf("%d\n", (int)compatible(l3,l1)());

    auto cc = composition(l2,l3);
}

Here is the error:
cutlass/include/cute/int_tuple.hpp(404): error: static assertion failed with "Static shape_div failure"
detected during:
instantiation of "auto cute::shape_div(const IntTupleA &, const IntTupleB &) [with IntTupleA=cute::_3, IntTupleB=cute::C<10>]"


ccecka · 2025-02-14T18:39:34Z

Good observation. This is known, and more robust+efficient versions of almost all CuTe operations will be released soon along with a corresponding whitepaper and updated documentation proving/describing the CuTe core.

seemingwang · 2025-02-15T01:26:06Z

Looking forward to it.
