# PoorMansDeepSeek

License: MIT | HF Model

A lightweight GPT implementation with rotary embeddings, MoE layers, and sparse attention, built on karpathy/nanoGPT. Designed for efficiency on low-resource hardware while keeping performance competitive with GPT-2-124M.

## Key Additions

- ✅ Rotary positional embeddings (RoPE) for better long-context modeling (a minimal sketch follows this list)
- ✅ 4-bit quantized inference support (a rough sketch appears after the Quick Start)
- ✅ Mixture-of-Experts (MoE) layer implementation (second sketch after this list)
- 🚀 40% fewer parameters than GPT-2-124M with similar perplexity
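
For orientation, here is a minimal pure-PyTorch sketch of RoPE. The function names, tensor layout, and `base` value are illustrative assumptions, not the repository's actual code.

```python
import torch

def rope_angles(head_dim: int, seq_len: int, base: float = 10000.0) -> torch.Tensor:
    # One rotation angle per (position, dim-pair): pos * base^(-2i / head_dim)
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(seq_len).float()
    return torch.outer(positions, inv_freq)  # (seq_len, head_dim // 2)

def apply_rope(x: torch.Tensor, angles: torch.Tensor) -> torch.Tensor:
    # x: (batch, n_heads, seq_len, head_dim); rotate each even/odd channel pair
    x1, x2 = x[..., 0::2], x[..., 1::2]
    cos, sin = angles.cos(), angles.sin()
    # Applying the same rotation layout to queries and keys makes attention
    # scores depend only on relative position, which is the point of RoPE.
    return torch.cat((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)

# Example: rotate queries for a 2-head attention layer with head_dim=16
q = torch.randn(1, 2, 10, 16)
q_rot = apply_rope(q, rope_angles(head_dim=16, seq_len=10))
```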

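And a compact sketch of a top-k-gated MoE feed-forward layer. The expert count, gate design, and class name are assumptions for illustration; the repository's layer may route and combine differently.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Hypothetical top-k MoE FFN: each token is sent to its top_k experts
    and the expert outputs are mixed by softmaxed gate scores."""

    def __init__(self, d_model: int, d_hidden: int, n_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = x.view(-1, x.size(-1))                # (n_tokens, d_model)
        weights, chosen = self.gate(tokens).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)           # per-token mix weights
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # Gather the tokens that routed to expert e (and the slot they used)
            token_idx, slot = (chosen == e).nonzero(as_tuple=True)
            if token_idx.numel():
                out[token_idx] += weights[token_idx, slot, None] * expert(tokens[token_idx])
        return out.view_as(x)
```
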
## Quick Start

### Installation

```bash
git clone https://github.com/grad-ient/PoorMansNanoGPT
cd PoorMansNanoGPT
pip install -r requirements.txt
```
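
The 4-bit inference path itself is not shown here; as a rough illustration of the idea, the sketch below quantizes a weight matrix to 16 signed levels with per-row scales and dequantizes on the fly. The per-channel symmetric scheme is an assumption, and a real kernel would pack two 4-bit values per byte rather than storing them in `int8`.

```python
import torch

def quantize_4bit(w: torch.Tensor):
    # Per-row symmetric scale so the largest |weight| maps to level 7
    scale = (w.abs().amax(dim=1, keepdim=True) / 7.0).clamp_min(1e-8)
    q = torch.clamp(torch.round(w / scale), -8, 7).to(torch.int8)
    return q, scale

def dequantize_4bit(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(64, 128)
q, scale = quantize_4bit(w)
print((w - dequantize_4bit(q, scale)).abs().max())  # worst-case rounding error
```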
