TinyGPT is a minimal C++11 implementation of GPT-2 inference, built from scratch and mainly inspired by the picoGPT project.
For more details, check out the accompanying blog post: Write a GPT from scratch (TinyGPT)
- Fast BPE tokenizer, inspired by tiktoken.
- CPU and CUDA inference.
- KV cache enabled (see the sketch after this list).
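
The KV cache is what keeps autoregressive decoding cheap: the attention keys and values of already-processed tokens are stored and reused, so each new token only requires computing projections for a single position. Below is a minimal sketch of the idea; the type and method names are hypothetical and are not TinyGPT's actual classes:

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical per-layer KV cache (illustrative names, not TinyGPT's API).
// During decoding, only the new token's key/value vectors are computed
// and appended; attention then reads the whole cached history.
struct LayerKVCache {
  std::vector<std::vector<float>> keys;    // keys[t]   = key vector of token t
  std::vector<std::vector<float>> values;  // values[t] = value vector of token t

  void append(std::vector<float> k, std::vector<float> v) {
    keys.push_back(std::move(k));
    values.push_back(std::move(v));
  }

  std::size_t seqLen() const { return keys.size(); }
};
```

Without such a cache, step `t` would recompute key/value projections for all `t` earlier tokens, making each generated token progressively more expensive.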
`tinygpt::tokenizer` is faster than both HuggingFace Tokenizers and OpenAI tiktoken. Encoding speed was measured with the `~/benches/tokenizer.py` script on a machine with an Intel(R) Xeon(R) Platinum 8255C CPU @ 2.50GHz.
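
At the core of a tiktoken-style BPE encoder is a greedy merge loop: a piece of text is split into single bytes, then the adjacent pair with the lowest merge rank is merged repeatedly until no mergeable pair remains. The sketch below shows only this algorithm; the names are illustrative, and a fast encoder (presumably including TinyGPT's) would replace the naive `std::map` scan with better data structures:

```cpp
#include <climits>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Greedy BPE merge loop (illustrative only). `ranks` maps an adjacent
// pair of pieces to its merge priority; a lower rank merges earlier.
std::vector<std::string> bpeMerge(
    const std::string& piece,
    const std::map<std::pair<std::string, std::string>, int>& ranks) {
  std::vector<std::string> parts;
  for (char c : piece) parts.emplace_back(1, c);  // start from single bytes

  while (parts.size() > 1) {
    int bestRank = INT_MAX;
    std::size_t bestIdx = 0;
    // Find the adjacent pair with the lowest merge rank.
    for (std::size_t i = 0; i + 1 < parts.size(); ++i) {
      auto it = ranks.find(std::make_pair(parts[i], parts[i + 1]));
      if (it != ranks.end() && it->second < bestRank) {
        bestRank = it->second;
        bestIdx = i;
      }
    }
    if (bestRank == INT_MAX) break;  // no pair is mergeable
    parts[bestIdx] += parts[bestIdx + 1];
    parts.erase(parts.begin() + bestIdx + 1);
  }
  return parts;  // each part then maps to a token id via the vocabulary
}
```

The quadratic rescan in this sketch is exactly what production tokenizers optimize away, for example with cached pair ranks and flat hash maps.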
```bash
git clone --recurse-submodules https://github.com/keith2018/TinyGPT.git
```
```bash
python3 tools/download_gpt2_model.py
```

If successful, you'll see the file `model_file.data` in the directory `assets/gpt2`.
```bash
mkdir build
cmake -B ./build -DCMAKE_BUILD_TYPE=Release
cmake --build ./build --config Release
```
This will generate the executable and copy the assets to the directory `app/bin`, where you can run the demo:
```bash
cd app/bin
./TinyGPT_demo
```
```
[DEBUG] TIMER TinyGPT::Model::loadModelGPT2: cost: 800 ms
[DEBUG] TIMER TinyGPT::Encoder::getEncoder: cost: 191 ms
INPUT:Alan Turing theorized that computers would one day become
GPT:the most powerful machines on the planet.
INPUT:exit
```
- Tensor: [TinyTorch](https://github.com/keith2018/TinyTorch)
- JsonParser: [RapidJSON](https://github.com/Tencent/rapidjson)
- Regex
- HashMap: [ankerl::unordered_dense](https://github.com/martinus/unordered_dense)
- ConcurrentQueue: [moodycamel::ConcurrentQueue](https://github.com/cameron314/concurrentqueue)
This code is licensed under the MIT License (see LICENSE).