Pull requests: huggingface/tokenizers
DOCS: add add_prefix_space to processors.ByteLevel
#1878
opened Oct 21, 2025 by CloseChoice
feat: allow BPETrainer to be seeded with a set of initial tokens
#1862
opened Sep 6, 2025 by henrycharlesworth
Fix unsigned integer underflow issue with truncation
#1859
opened Sep 1, 2025 by maxdebayser
Adding multiprocessing for sentencepiece_extractor
#1804
opened Jun 19, 2025 by AamodThakur
Expose Encoding attributes via the buffer protocol interface
#1789
opened Jun 4, 2025 by mariosasko
Add benchmark for deserializing large added vocab + optimizations
#1782
opened May 27, 2025 by ArthurZucker • Draft
Pre-tokenizers that support multi-word/non-whitespace BPE in single pass
#1753
opened Mar 22, 2025 by mjbommar