kreasof-ai/LLM-from-scratch

[meme image]

LLM from scratch, no pre-trained models, no HF transformers

This is an implementation of a decoder-only, transformer-based LLM trained with a next-token prediction objective. The implementation uses the tokenizers library from HF, GQA (grouped-query attention), normalized GPT (nGPT), RoPE (rotary positional embeddings), and the Liger Kernel.
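
As a rough sketch of the grouped-query attention idea (illustrative names and shapes, not this repo's actual code): each key/value head is shared by a group of query heads, which shrinks the KV cache relative to standard multi-head attention while keeping the full set of query heads.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    # q: (batch, n_q_heads, seq_len, head_dim)
    # k, v: (batch, n_kv_heads, seq_len, head_dim), with n_kv_heads < n_q_heads
    group_size = q.shape[1] // k.shape[1]
    # Expand each KV head so it is shared by `group_size` query heads.
    k = k.repeat_interleave(group_size, dim=1)
    v = v.repeat_interleave(group_size, dim=1)
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Example: 8 query heads attending over 2 shared KV heads.
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
out = grouped_query_attention(q, k, v)  # (1, 8, 16, 64)
```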

There are 6 versions:

  • Using the AdamW optimizer and lorem-ipsum datasets (broken RoPE) [colab notebook]
  • Using the SOAP optimizer and lorem-ipsum datasets (broken RoPE) [colab notebook]
  • Using the SOAP optimizer, synthetic number datasets, and a larger parameter count (broken RoPE) [colab notebook]
  • Using the SOAP optimizer, synthetic number datasets, a smaller parameter count, and more epochs (broken RoPE) [colab notebook]
  • Using the SOAP optimizer, harder synthetic number datasets, optimized hyperparameters, the Liger Kernel, and Fast-FFN (fixed RoPE) [colab notebook]
  • Using a tuned SOAP optimizer, harder synthetic number datasets, optimized hyperparameters, the Liger Kernel, Fast-FFN, and normalized GPT (fixed RoPE; see the nGPT sketch after this list) [colab notebook]
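
For context, a minimal sketch of the core normalized-GPT (nGPT) idea, based on the nGPT paper rather than this repo's code: hidden states are kept on the unit hypersphere, and the usual residual addition becomes a normalized interpolation between the current state and the block output.

```python
import torch

def l2_normalize(x, eps=1e-8):
    # Project onto the unit hypersphere along the embedding dimension.
    return x / x.norm(dim=-1, keepdim=True).clamp_min(eps)

def ngpt_residual_update(h, block_out, alpha=0.1):
    # nGPT-style update: move `h` toward the normalized block output by a
    # step `alpha`, then re-normalize. `alpha` is a fixed scalar here for
    # simplicity; in the paper it is a learned parameter.
    h = l2_normalize(h)
    block_out = l2_normalize(block_out)
    return l2_normalize(h + alpha * (block_out - h))
```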

We publish the weights from the latest version on HF [link].

Note: there is a small mistake in the RoPE implementation where RoPE is applied to the value embedding (it should be applied only to the query and key). The last two versions fix this issue.
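
For reference, a minimal sketch of the corrected behavior (illustrative code, not this repo's implementation): the rotation is applied to the query and key projections before attention scores are computed, while the value projection passes through unchanged.

```python
import torch

def rope_cos_sin(seq_len, head_dim, base=10000.0):
    # Standard RoPE frequency table (one frequency per pair of dimensions).
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    t = torch.arange(seq_len).float()
    freqs = torch.outer(t, inv_freq)          # (seq_len, head_dim // 2)
    emb = torch.cat((freqs, freqs), dim=-1)   # (seq_len, head_dim)
    return emb.cos(), emb.sin()

def rotate_half(x):
    # (x1, x2) -> (-x2, x1) over the two halves of the last dimension.
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(q, k, cos, sin):
    # Rotate query and key only; rotating the value was the bug noted above.
    q_rot = q * cos + rotate_half(q) * sin
    k_rot = k * cos + rotate_half(k) * sin
    return q_rot, k_rot
```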
