open-lm-engine/accelerated-model-architectures


XMA (Accelerated Model Architectures)

XMA is a repository of fast kernels for model training.
We plan to add many experimental model architectures, with support for multiple accelerators such as NVIDIA GPUs, AMD GPUs, and Google TPUs.

layers

| Module | Triton | CUDA |
|--------|--------|------|
| GRU    | βœ…     | ❌   |
| MoE    | βœ…     | βœ…   |
| RNN    | βœ…     | ❌   |
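The repository's own API is not shown above, so as a point of reference, here is what a single GRU recurrence step computes, sketched in NumPy with the standard gate equations (function and weight names here are illustrative, not XMA's; biases are omitted for brevity):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wn, Un):
    # One GRU step: x is the input, h the previous hidden state.
    z = sigmoid(x @ Wz + h @ Uz)        # update gate
    r = sigmoid(x @ Wr + h @ Ur)        # reset gate
    n = np.tanh(x @ Wn + (r * h) @ Un)  # candidate hidden state
    return (1.0 - z) * n + z * h        # interpolate old and new state

# With all-zero weights, both gates sit at 0.5 and the candidate is 0,
# so the new state is half the old one.
x = np.zeros((1, 2))
h = np.ones((1, 3))
W = np.zeros((2, 3))
U = np.zeros((3, 3))
print(gru_step(x, h, W, U, W, U, W, U))  # β†’ [[0.5 0.5 0.5]]
```

A fused Triton kernel for this layer would compute all three gates in one pass over the hidden state instead of launching separate matmul and elementwise kernels.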

functional

| Module                      | Triton | CUDA |
|-----------------------------|--------|------|
| bmm                         | βœ…     | ❌   |
| continuous_count            | ❌     | βœ…   |
| cross_entropy               | βœ…     | ❌   |
| fused_linear_cross_entropy  | βœ…     | ❌   |
| fused_residual_add_rmsnorm  | βœ…     | ❌   |
| grouped_gemm                | ❌     | βœ…   |
| rmsnorm                     | βœ…     | ❌   |
| pack_sequence               | βœ…     | βœ…   |
| softmax                     | βœ…     | ❌   |
| swiglu                      | βœ…     | βœ…   |
| swiglu_packed               | βœ…     | ❌   |
| unpack_sequence             | βœ…     | βœ…   |
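For readers unfamiliar with two of the ops above, here is a NumPy sketch of the reference semantics of `rmsnorm` and `swiglu` (the math the kernels implement; the function signatures are illustrative, not XMA's actual API):

```python
import numpy as np

def rmsnorm(x, weight, eps=1e-6):
    # Normalize by the root-mean-square over the last dimension, then
    # apply a learned per-channel scale.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def swiglu(gate, up):
    # SwiGLU gated activation: SiLU(gate) * up, as used in
    # LLaMA-style MLP blocks.
    return gate / (1.0 + np.exp(-gate)) * up

x = np.array([[1.0, 2.0, 3.0]])
y = rmsnorm(x, np.ones(3))
print(np.sqrt(np.mean(y * y)))  # close to 1.0: output has unit RMS
```

The fused variants (`fused_residual_add_rmsnorm`, `fused_linear_cross_entropy`, `swiglu_packed`) compute the same results as composing the individual ops, but in a single kernel launch to avoid materializing intermediates.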

Discord Server

Join the Discord server if you are interested in LLM architecture or distributed training/inference research.

About

A bunch of kernels that might make stuff slower πŸ˜‰
