Chapter 5: Pretraining on Unlabeled Data

Main Chapter Code

01_main-chapter-code contains the main chapter code

Bonus Materials

02_alternative_weight_loading contains code to load the GPT model weights from alternative sources in case they become unavailable from OpenAI
03_bonus_pretraining_on_gutenberg contains code to pretrain the LLM for longer on the full corpus of books from Project Gutenberg
04_learning_rate_schedulers contains code implementing a more sophisticated training function including learning rate schedulers and gradient clipping
05_bonus_hparam_tuning contains an optional hyperparameter tuning script
06_user_interface implements an interactive user interface for interacting with the pretrained LLM
08_memory_efficient_weight_loading contains a bonus notebook showing how to load model weights more memory-efficiently via PyTorch's load_state_dict method
09_extending-tokenizers contains a from-scratch implementation of the GPT-2 BPE tokenizer
10_llm-training-speed shows PyTorch performance tips to improve the LLM training speed
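The learning rate scheduling and gradient clipping mentioned for 04_learning_rate_schedulers can be illustrated with a small, framework-free sketch. It assumes a common combination of linear warmup followed by cosine decay, plus clipping by global L2 norm; the exact schedule and hyperparameters used in that folder may differ, and the function names lr_at_step and clip_by_global_norm are hypothetical. In PyTorch, norm-based clipping is usually done with torch.nn.utils.clip_grad_norm_.

```python
import math


def lr_at_step(step, peak_lr, warmup_steps, total_steps, min_lr=0.0):
    """Hypothetical schedule: linear warmup to peak_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        # Linear warmup: ramp from peak_lr / warmup_steps up to peak_lr
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup phase completed, capped at 1.0
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Cosine decay: equals peak_lr at progress 0 and min_lr at progress 1
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))


def clip_by_global_norm(grads, max_norm):
    """Scale all gradients so that their combined L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)  # already small enough, leave unchanged
    scale = max_norm / total_norm
    return [g * scale for g in grads]


# Usage: a 100-step schedule with 10 warmup steps
lrs = [lr_at_step(s, peak_lr=1e-3, warmup_steps=10, total_steps=100) for s in range(100)]
clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)  # norm 5.0 scaled down to 1.0
```

Warmup keeps early updates small while the weights are still close to their random initialization, and the cosine tail lowers the step size gradually toward the end of training; clipping bounds the size of any single update.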

LLM Architectures From Scratch

07_gpt_to_llama contains a step-by-step guide for converting a GPT architecture implementation to Llama 3.2, including code that loads the pretrained weights from Meta AI
11_qwen3 contains a from-scratch implementation of Qwen3 0.6B and Qwen3 30B-A3B (Mixture-of-Experts), including code to load the pretrained weights of the base, reasoning, and coding model variants
12_gemma3 contains a from-scratch implementation of Gemma 3 270M and an alternative version with a KV cache, including code to load the pretrained weights
13_olmo3 contains a from-scratch implementation of Olmo 3 7B and 32B (Base, Instruct, and Think variants) and an alternative version with a KV cache, including code to load the pretrained weights
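A KV cache stores the keys and values of tokens that were already processed, so each new generation step only computes the projections for the newest token instead of recomputing them for the whole sequence. The sketch below is a minimal single-head illustration in plain Python (no PyTorch), using hypothetical names such as KVCache and tiny hand-picked weight matrices; it is not the implementation used in 12_gemma3 or 13_olmo3, which work with tensors and multi-head attention. It checks that incremental decoding with the cache gives the same outputs as recomputing causal attention from scratch.

```python
import math


def matvec(W, x):
    # Multiply matrix W (a list of rows) by vector x
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, keys, values):
    # Scaled dot-product attention of one query over the given keys/values
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


class KVCache:
    """Hypothetical cache holding the keys/values of already-processed tokens."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, x, Wq, Wk, Wv):
        # Only the newest token is projected; earlier keys/values come from the cache
        q, k, v = matvec(Wq, x), matvec(Wk, x), matvec(Wv, x)
        self.keys.append(k)
        self.values.append(v)
        return attend(q, self.keys, self.values)


def causal_attention_no_cache(xs, Wq, Wk, Wv):
    # Reference: project every token, then let position i attend to positions 0..i
    keys = [matvec(Wk, x) for x in xs]
    values = [matvec(Wv, x) for x in xs]
    return [attend(matvec(Wq, x), keys[: i + 1], values[: i + 1]) for i, x in enumerate(xs)]


# Usage: tiny 2-dimensional example with three tokens
Wq = [[1.0, 0.5], [0.0, 1.0]]
Wk = [[0.5, 0.0], [1.0, 1.0]]
Wv = [[1.0, 2.0], [0.0, 1.0]]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

cache = KVCache()
cached = [cache.step(x, Wq, Wk, Wv) for x in xs]
reference = causal_attention_no_cache(xs, Wq, Wk, Wv)
```

The trade-off is memory for compute: the cache grows by one key and one value vector per generated token, but each step avoids re-projecting the entire prefix.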
