XMA is a repository of fast kernels for model training.
We plan to add many experimental and fun model architectures, with support for multiple accelerators such as NVIDIA and AMD GPUs and Google TPUs.
Module | Triton | CUDA |
---|---|---|
GRU | ✅ | ✅ |
MoE | ✅ | ✅ |
RNN | ✅ | ✅ |
Kernel | Triton | CUDA |
---|---|---|
bmm | ✅ | ✅ |
continuous_count | ✅ | ✅ |
cross_entropy | ✅ | ✅ |
fused_linear_cross_entropy | ✅ | ✅ |
fused_residual_add_rmsnorm | ✅ | ✅ |
grouped_gemm | ✅ | ✅ |
rmsnorm | ✅ | ✅ |
pack_sequence | ✅ | ✅ |
softmax | ✅ | ✅ |
swiglu | ✅ | ✅ |
swiglu_packed | ✅ | ✅ |
unpack_sequence | ✅ | ✅ |
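
For reference, here is a minimal, non-fused PyTorch sketch of the math behind two of the kernels above (`rmsnorm` and `swiglu`). The function names and signatures are illustrative assumptions for this sketch, not XMA's actual API; the fused kernels compute the same results in fewer memory passes.

```python
# Reference (non-fused) implementations illustrating what the rmsnorm and
# swiglu kernels compute. Names and signatures here are hypothetical and
# do not reflect XMA's real API.
import torch
import torch.nn.functional as F


def rmsnorm_reference(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # RMSNorm: scale x by the reciprocal root-mean-square over the last
    # dimension, then apply a learned elementwise weight.
    variance = x.pow(2).mean(dim=-1, keepdim=True)
    return x * torch.rsqrt(variance + eps) * weight


def swiglu_reference(gate: torch.Tensor, up: torch.Tensor) -> torch.Tensor:
    # SwiGLU: SiLU-activated gate projection multiplied elementwise with
    # the up projection.
    return F.silu(gate) * up


x = torch.randn(4, 128)
weight = torch.ones(128)
print(rmsnorm_reference(x, weight).shape)  # torch.Size([4, 128])

gate, up = torch.randn(4, 256), torch.randn(4, 256)
print(swiglu_reference(gate, up).shape)    # torch.Size([4, 256])
```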
Join the Discord server if you are interested in LLM architecture or distributed training/inference research.