I believe the layout composition in CUTLASS is not so robust #2113

Open
seemingwang opened this issue Feb 14, 2025 · 2 comments
seemingwang commented Feb 14, 2025

I was reading the documentation here on layout composition. However, I found that this piece of code:

void shape_mod(int* shapeA, int N, int& shapeB) {
   for (int i = 0; i < N; ++i) {
      assert(shapeA[i] %    shapeB == 0 or
                shapeB % shapeA[i] == 0);
      int new_shapeA =      min(shapeA[i], shapeB);
      int new_shapeB = ceil_div(shapeB, shapeA[i]);
      shapeA[i] = new_shapeA;
      shapeB    = new_shapeB;
   }
}

could eliminate some valid cases. Here is one example:

#include <cuda.h>
#include <stdlib.h>
#include <cute/tensor.hpp>
#include <type_traits>
using namespace cute;
int main()
{
  auto l1 = make_layout(Shape<_3,Shape<_2,_2>>{}, Stride<_16,Stride<_80,_4>>{});
  auto l2 = make_layout(Shape<_10,_2>{}, Stride<_16,_4>{});
  auto l3 = make_layout(Shape<_3,_4>{}, Stride<_1,_5>{});
  for (int i = 0; i < 12; i++) {
    printf("trying %d res1= %d res2 = %d\n", i, l1(i), l2(l3(i)));
  }
  printf("%d\n", (int)compatible(l3,l1)());

  //auto cc = composition(l2,l3);
}

Here is the output:

trying 0 res1= 0 res2 = 0
trying 1 res1= 16 res2 = 16
trying 2 res1= 32 res2 = 32
trying 3 res1= 80 res2 = 80
trying 4 res1= 96 res2 = 96
trying 5 res1= 112 res2 = 112
trying 6 res1= 4 res2 = 4
trying 7 res1= 20 res2 = 20
trying 8 res1= 36 res2 = 36
trying 9 res1= 84 res2 = 84
trying 10 res1= 100 res2 = 100
trying 11 res1= 116 res2 = 116
1

As you can see, l1(i) equals l2(l3(i)) for every i from 0 to 11, and l3 is compatible with l1, so l1 is by definition a composition of l2 and l3.
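
The equality claimed here can also be checked without CuTe. The following is a minimal sketch in plain C++, assuming CuTe's leftmost-fastest mapping from a 1-D index to coordinates, and with l1's nested shape flattened to (3,2,2):(16,80,4). The names eval_layout and l1_equals_l2_of_l3 are hypothetical helpers for illustration, not CuTe functions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of CuTe): maps a 1-D index to an offset for a
// flattened layout, peeling coordinates off leftmost-mode-first.
int eval_layout(const std::vector<int>& shape,
                const std::vector<int>& stride,
                int idx) {
    int offset = 0;
    for (std::size_t m = 0; m < shape.size(); ++m) {
        // The last mode absorbs whatever remains of the index.
        int coord = (m + 1 < shape.size()) ? idx % shape[m] : idx;
        offset += coord * stride[m];
        idx /= shape[m];
    }
    return offset;
}

// Returns true when l1(i) == l2(l3(i)) for every i in [0, 12).
bool l1_equals_l2_of_l3() {
    const std::vector<int> s1{3, 2, 2}, d1{16, 80, 4};  // l1, flattened
    const std::vector<int> s2{10, 2},   d2{16, 4};      // l2
    const std::vector<int> s3{3, 4},    d3{1, 5};       // l3
    for (int i = 0; i < 12; ++i) {
        if (eval_layout(s1, d1, i) !=
            eval_layout(s2, d2, eval_layout(s3, d3, i))) {
            return false;
        }
    }
    return true;
}
```

Calling l1_equals_l2_of_l3() from a small main reproduces the check done by the printf loop above, under the stated assumptions about index-to-coordinate mapping.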

However, the line auto cc = composition(l2,l3); is commented out in my program. If I enable it, the program fails to compile: CuTe treats this as an invalid case.

This doesn't really make much sense, does it?

#include <cuda.h>
#include <stdlib.h>
#include <cute/tensor.hpp>
#include <type_traits>
using namespace cute;
int main()
{
  auto l1 = make_layout(Shape<_3,Shape<_2,_2>>{}, Stride<_16,Stride<_80,_4>>{});
  auto l2 = make_layout(Shape<_10,_2>{}, Stride<_16,_4>{});
  auto l3 = make_layout(Shape<_3,_4>{}, Stride<_1,_5>{});
  for (int i = 0; i < 12; i++) {
    printf("trying %d res1= %d res2 = %d\n", i, l1(i), l2(l3(i)));
  }
  printf("%d\n", (int)compatible(l3,l1)());

  auto cc = composition(l2,l3);
}

Here is the error:
cutlass/include/cute/int_tuple.hpp(404): error: static assertion failed with "Static shape_div failure"
detected during:
instantiation of "auto cute::shape_div(const IntTupleA &, const IntTupleB &) [with IntTupleA=cute::_3, IntTupleB=cute::C<10>]"
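
The failing instantiation pairs the extents 3 and 10, and neither divides the other. As a rough illustration, the sketch below is a non-aborting variant of the shape_mod pseudocode quoted earlier, written in plain C++. It assumes that the static assertion in shape_div checks the same divisibility condition as that pseudocode. The names divides_either_way and try_shape_mod are hypothetical and are not CuTe functions.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical predicate (not a CuTe function): the condition the quoted
// shape_mod pseudocode asserts for each pair of extents.
bool divides_either_way(int a, int b) {
    return a % b == 0 || b % a == 0;
}

// Non-aborting variant of the quoted shape_mod: returns false where the
// original would trip its assertion, otherwise updates shapeA and shapeB.
bool try_shape_mod(std::vector<int>& shapeA, int& shapeB) {
    for (int& a : shapeA) {
        if (!divides_either_way(a, shapeB)) {
            return false;  // e.g. a = 10, shapeB = 3
        }
        int new_a = std::min(a, shapeB);
        int new_b = (shapeB + a - 1) / a;  // ceil_div(shapeB, a)
        a = new_a;
        shapeB = new_b;
    }
    return true;
}
```

With shapeA = {10, 2} (the shape of l2) and shapeB = 3 (the first mode of l3), try_shape_mod returns false on the first mode, which is consistent with the reported compile-time failure even though the composed function itself is well defined on the indices used.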


ccecka commented Feb 14, 2025

Good observation. This is known, and more robust and efficient versions of almost all CuTe operations will be released soon, along with a corresponding whitepaper and updated documentation proving/describing the CuTe core.

seemingwang (Author) commented

> Good observation. This is known, and more robust and efficient versions of almost all CuTe operations will be released soon, along with a corresponding whitepaper and updated documentation proving/describing the CuTe core.

Looking forward to it.
