Pull requests: huggingface/nanotron

Pull requests list

[WIP] Distillation
#290 opened Mar 6, 2025 by Stillerman
2 of 14 tasks
Fix unpacking issue caused by newer Flash Attention
#289 opened Mar 5, 2025 by Stillerman
3 of 6 tasks
Recommend the use of Spack on supercomputers
#282 opened Feb 19, 2025 by thomas-bouvier
Add MLA
#278 opened Feb 5, 2025 by zzhhjjj
Add nanotron performance
#274 opened Jan 23, 2025 by xrsrke
fp8
#266 opened Dec 18, 2024 by xrsrke
Fix wrong initialization of lr scheduler
#256 opened Nov 29, 2024 by kylematoba
[NEW] Llama3.2 weight converters 🦙
#255 opened Nov 28, 2024 by TJ-Solergibert
6 tasks
Fix initial_lr when resuming training
#243 opened Nov 17, 2024 by Lauler
Load random states from checkpoint
#238 opened Nov 2, 2024 by gritukan
lighteval support after checkpoint, UX refactor
#222 opened Aug 24, 2024 by eliebak
Refactor pre tokenization tool
#219 opened Aug 21, 2024 by eliebak
Created interconnect benchmark before the training
#200 opened Jun 22, 2024 by RamenBuddha
Move MoE Implementation into src/, add Load Balancing Losses
#192 opened Jun 6, 2024 by haeggee
1 task done
[Feature] Monitor model states during training
#183 opened May 25, 2024 by xrsrke
Fix overflow in nanosets with big datasets
#182 opened May 23, 2024 by jquesnelle