Pull requests: NVIDIA/TransformerEngine

Pull requests list

[JAX] WAR for CuDNN MXFP8 norm incorrect result
#1700 opened Apr 18, 2025 by jberchtold-nvidia
8 of 13 tasks
[JAX] Distributed Current Scaling
#1699 opened Apr 18, 2025 by jberchtold-nvidia
8 of 13 tasks
Dummy PR to test docs
#1698 opened Apr 18, 2025 by ksivaman
13 tasks
Cpu reload double buffer
#1695 opened Apr 17, 2025 by sanandaraj5597
Added attention offloading
#1691 opened Apr 15, 2025 by sanandaraj5597
[PyTorch] Bunch of memory management fixes
#1686 opened Apr 15, 2025 by pggPL
8 of 13 tasks
[JAX] Add collective GEMM without compute/communication overlap
#1675 opened Apr 11, 2025 by philipphack
1 of 6 tasks
[QA] Encapsulate functions in test_utils.sh
#1667 opened Apr 10, 2025 by linxiddd
11 tasks
[JAX] GroupedQuantizer and GroupedScaledTensor
#1666 opened Apr 10, 2025 by phu0ngng
7 of 13 tasks
[PyTorch] Draft of new weight offloading
#1663 opened Apr 9, 2025 by pggPL Draft
7 of 13 tasks
add view/reshape to blockwise tensor
#1662 opened Apr 9, 2025 by Autumn1998 Draft
13 tasks
rtx5090 arch fix support 2.3.0
#1659 opened Apr 9, 2025 by sudhakarsingh27
1 of 13 tasks
[JAX] JAX Current Scaling
#1647 opened Apr 5, 2025 by jberchtold-nvidia
8 of 13 tasks
Use internal quantizer in Linear module
#1638 opened Apr 3, 2025 by ptrendx
1 of 13 tasks
Improved performance of mxfp8 cast kernels (label: performance)
#1628 opened Mar 31, 2025 by Oleg-Goncharov
5 of 13 tasks
[Pytorch] NVIDIA-DL-Framework-Inspect support – part 2 – features
#1613 opened Mar 25, 2025 by pggPL
7 tasks done
[Pytorch] NVIDIA-DL-Framework-Inspect support – part 3 – tests
#1612 opened Mar 25, 2025 by pggPL
7 of 13 tasks
[Pytorch] NVIDIA-DL-Framework-Inspect support – part 4 – documentation
#1611 opened Mar 25, 2025 by pggPL
7 tasks done
[PyTorch] Tutorial for the ONNX export
#1586 opened Mar 18, 2025 by pggPL
8 of 13 tasks
[JAX] Unbalanced Context Parallelism with THD format
#1565 opened Mar 12, 2025 by zlsh80826
8 of 13 tasks
Split wgrad&dgrad from backward() to support a2a overlap
#1564 opened Mar 12, 2025 by lhb8125
1 of 13 tasks
[CI] Add isort
#1563 opened Mar 12, 2025 by yaox12
1 of 13 tasks
Enable AttnFuncWithCPAndKVP2P to support mla
#1561 opened Mar 12, 2025 by SuperCB
3 of 13 tasks
change softmax_lse correction of CP to FP32
#1546 opened Mar 7, 2025 by xrennvidia
6 of 13 tasks