You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
02_alternative_weight_loading contains code to load the GPT model weights from alternative places in case the model weights become unavailable from OpenAI
10_llm-training-speed shows PyTorch performance tips to improve the LLM training speed
LLM Architectures From Scratch
07_gpt_to_llama contains a step-by-step guide for converting a GPT architecture implementation to Llama 3.2 and loads pretrained weights from Meta AI
11_qwen3 A from-scratch implementation of Qwen3 0.6B and Qwen3 30B-A3B (Mixture-of-Experts) including code to load the pretrained weights of the base, reasoning, and coding model variants
12_gemma3 A from-scratch implementation of Gemma 3 270M and alternative with KV cache, including code to load the pretrained weights
13_olmo3 A from-scratch implementation of Olmo 3 7B and 32B (Base, Instruct, and Think variants) and alternative with KV cache, including code to load the pretrained weights