Train a language model to chat like you using your personal conversations from WhatsApp, Telegram, Signal, or other platforms.

ImadSaddik/Train_Your_Language_Model_Course

Train your language model course

We’ve all used Large Language Models (LLMs) and been amazed by what they can do. I wanted to understand how these models are built, so I created this course.

I’m from Morocco and speak Moroccan Darija. Most LLMs today understand it a little, but they can't hold proper conversations in Darija. So, as a challenge, I decided to train a language model from scratch using my own WhatsApp conversations in Darija.

I've made a YouTube playlist documenting every step. You can watch it at your own pace. If anything is unclear, feel free to open an issue in this repository. I’ll be happy to help!

What is in this repository?

  • notebooks/: Jupyter notebooks for each step in the pipeline.
  • slides/: Presentation slides used in the YouTube series.
  • data/: Sample data and templates.
  • transformer/: Scripts for the Transformer and LoRA implementations.
  • minbpe/: A tokenizer from Andrej Karpathy's repo, included here because it's not available as a package.
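
For readers new to byte pair encoding (BPE), the idea behind the tokenizer can be sketched in a few lines. This is a minimal illustration, not the code in minbpe/: the function names (`count_pairs`, `merge`, `train_bpe`) and the toy input are assumptions made for this example. It starts from raw UTF-8 bytes and repeatedly merges the most frequent adjacent pair into a new token id.

```python
from collections import Counter


def count_pairs(ids):
    """Count how often each adjacent pair of token ids occurs."""
    return Counter(zip(ids, ids[1:]))


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merges, starting from UTF-8 bytes (ids 0..255)."""
    ids = list(text.encode("utf-8"))
    merges = {}
    for step in range(num_merges):
        pairs = count_pairs(ids)
        if not pairs:
            break  # fewer than two tokens left, nothing to merge
        pair = max(pairs, key=pairs.get)  # most frequent adjacent pair
        new_id = 256 + step  # new ids start right after the byte range
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges


# Toy input, chosen only to make the merges easy to follow by hand.
ids, merges = train_bpe("aaabdaaabac", num_merges=3)
print(ids)
print(merges)
```

Each merge shortens the sequence and grows the vocabulary by one token, which is the trade-off a BPE tokenizer tunes through the number of merges.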

Setup

To get started, install Python, then install the required dependencies by running:

pip install -r requirements.txt

What will you learn?

This course covers:

  1. Extracting data from WhatsApp.
  2. Tokenizing text using the BPE algorithm.
  3. Understanding Transformer models.
  4. Pre-training the model.
  5. Creating a fine-tuning dataset.
  6. Fine-tuning the model (instruction tuning and LoRA fine-tuning).

Each topic has a video in the YouTube playlist and a Jupyter notebook in the notebooks/ folder.
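
To give a feel for step 1, here is a rough sketch of turning an exported WhatsApp chat into structured messages. It is not the code from the notebooks. The line format it assumes (`12/31/23, 9:41 PM - Alice: hello`) is only one of several that WhatsApp produces, depending on device and locale, and the names `Message`, `STAMP_RE`, and `parse_chat` as well as the sample lines are made up for this example.

```python
import re
from typing import NamedTuple


class Message(NamedTuple):
    timestamp: str
    sender: str
    text: str


# Assumed export format: "<date>, <time> - <rest of line>".
# Adjust this pattern to match your own export.
STAMP_RE = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}(?:\s?[AP]M)?) - (.*)$"
)


def parse_chat(lines):
    """Parse exported chat lines into a list of Message tuples."""
    messages = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = STAMP_RE.match(line)
        if match is None:
            # No timestamp: treat the line as a continuation of the
            # previous message (multi-line messages).
            if messages:
                last = messages[-1]
                messages[-1] = last._replace(text=last.text + "\n" + line)
            continue
        timestamp, rest = match.groups()
        sender, sep, text = rest.partition(": ")
        if not sep:
            continue  # system notice without a sender, skip it
        messages.append(Message(timestamp, sender, text))
    return messages


# Made-up sample lines in the assumed format.
sample = [
    "12/31/23, 9:41 PM - Messages and calls are end-to-end encrypted.",
    "12/31/23, 9:42 PM - Alice: Salam!",
    "12/31/23, 9:43 PM - Bob: Labas?",
    "second line of the same message",
]
messages = parse_chat(sample)
for message in messages:
    print(message.sender, "->", repr(message.text))
```

Keeping the sender alongside the text matters later in the pipeline, since the fine-tuning dataset needs to know which messages are yours and which belong to the other person.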

Contributions

We welcome contributions! If you find any issues or have suggestions for improvements, please open an issue or submit a pull request.

Need help?

You can reach me through:
