- `bpe-from-scratch-simple.ipynb` contains optional (bonus) code that explains and shows how the BPE tokenizer works under the hood; it is geared toward simplicity and readability.
- `bpe-from-scratch.ipynb` implements a more sophisticated (and much more complicated) BPE tokenizer that behaves similarly to tiktoken with respect to all the edge cases; it also has additional functionality for loading the official GPT-2 vocab.
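
For orientation, the core idea behind BPE (repeatedly merging the most frequent adjacent pair of tokens) can be sketched in a few lines. This is a minimal illustration and not the code from either notebook: the names `train_bpe`, `merge_pair`, and `encode` are hypothetical, and the sketch starts from characters rather than bytes and skips the pre-tokenization and special-token handling that GPT-2 and tiktoken use.

```python
from collections import Counter


def merge_pair(tokens, pair):
    """Replace every non-overlapping occurrence of `pair` with one merged token."""
    merged = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2  # skip both halves of the merged pair
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merge rules, starting from single characters."""
    tokens = list(text)
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter(zip(tokens, tokens[1:]))
        if not pair_counts:
            break  # fewer than two tokens left, nothing to merge
        best_pair, count = pair_counts.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so further merges would not compress
        merges.append(best_pair)
        tokens = merge_pair(tokens, best_pair)
    return merges, tokens


def encode(text, merges):
    """Tokenize new text by replaying the learned merges in training order."""
    tokens = list(text)
    for pair in merges:
        tokens = merge_pair(tokens, pair)
    return tokens


merges, tokens = train_bpe("aaabdaaabac", num_merges=3)
print(merges)
print(tokens)
```

Replaying the merges in the order they were learned is what makes `encode` reproduce the training-time segmentation; a production tokenizer would additionally map each token string to an integer ID.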