Complete PyTorch implementations of GPT- and LLaMA-style language models, including all sub-components.
```
torch/
├── gpt/    # GPT-1 style implementation
└── llama/  # LLaMA-1/2 implementation
```
GPT (torch/gpt/):
- Multi-head self-attention with causal masking (see the sketch after this list)
- Learned positional embeddings
- LayerNorm, feedforward blocks
- Training loop with loss estimation
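The core of the GPT block is the causally masked multi-head self-attention listed above. Below is a minimal sketch of how such a layer can be written in PyTorch; the module and argument names (`CausalSelfAttention`, `n_embd`, `n_head`, `block_size`) are illustrative and may not match the names used in `torch/gpt`.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    def __init__(self, n_embd, n_head, block_size, dropout=0.2):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        self.qkv = nn.Linear(n_embd, 3 * n_embd)   # joint query/key/value projection
        self.proj = nn.Linear(n_embd, n_embd)      # output projection
        self.dropout = nn.Dropout(dropout)
        # lower-triangular mask: each position attends only to itself and earlier positions
        self.register_buffer("mask", torch.tril(torch.ones(block_size, block_size)))

    def forward(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # split channels into heads: (B, n_head, T, head_dim)
        q = q.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        k = k.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        v = v.view(B, T, self.n_head, C // self.n_head).transpose(1, 2)
        att = (q @ k.transpose(-2, -1)) / math.sqrt(k.size(-1))
        att = att.masked_fill(self.mask[:T, :T] == 0, float("-inf"))
        att = self.dropout(F.softmax(att, dim=-1))
        y = (att @ v).transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)
```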
LLaMA (torch/llama/):
- Multi-head attention with Rotary Position Embeddings (RoPE)
- RMSNorm (instead of LayerNorm; sketched after this list alongside SwiGLU)
- SwiGLU feedforward network
- Top-p sampling for generation
- SentencePiece tokenizer
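Two of the components above are the main architectural departures from GPT: RMSNorm and the SwiGLU feedforward. A minimal sketch of both follows, assuming the conventional formulation (no mean subtraction or bias in the norm; gate, up, and down projections without biases in the feedforward). The layer names `w1`/`w2`/`w3` are illustrative and may not match the module names in `torch/llama`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Scales by the root-mean-square of the features; unlike LayerNorm,
    there is no mean subtraction and no bias."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x * rms

class SwiGLU(nn.Module):
    """SwiGLU feedforward: silu(w1 x) gates (w3 x), then w2 projects back down."""
    def __init__(self, dim, hidden_dim):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # up projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # down projection

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```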
GPT:

```bash
cd torch/gpt
python train.py
```

LLaMA:

```bash
cd torch/llama
python generate.py
```

| Parameter | GPT | LLaMA |
|---|---|---|
| Embedding dim | 384 | 4096 |
| Hidden dim | - | 11008 |
| Heads | 6 | 32 |
| Layers | 6 | 32 |
| Context length | 256 | 2048 |
| Dropout | 0.2 | 0.0 |
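For reference, the table above corresponds to configuration objects roughly like the following sketch; the field names here are assumptions and are not necessarily the attribute names used in this repo.

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    n_embd: int = 384        # embedding dim
    n_head: int = 6
    n_layer: int = 6
    block_size: int = 256    # context length (tokens)
    dropout: float = 0.2     # no separate feedforward hidden dim is listed for GPT

@dataclass
class LLaMAConfig:
    dim: int = 4096          # embedding dim
    hidden_dim: int = 11008  # SwiGLU hidden dim
    n_heads: int = 32
    n_layers: int = 32
    max_seq_len: int = 2048  # context length (tokens)
    dropout: float = 0.0
```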
- Attention Is All You Need - Vaswani et al., 2017
- Layer Normalization - Ba et al., 2016
- Root Mean Square Layer Normalization - Zhang & Sennrich, 2019
- RoFormer: Enhanced Transformer with Rotary Position Embedding - Su et al., 2021
- LLaMA: Open and Efficient Foundation Language Models - Touvron et al., 2023
- LLaMA 2: Open Foundation and Fine-Tuned Chat Models - Touvron et al., 2023