Enabled configurable auto Tensor Parallelism (TP) for the inference of diverse models #6553
Open
gyou2021 wants to merge 51 commits into deepspeedai:master from gyou2021:configurable_autoTP

Commits
7c90529 Support pure meta model lm_head tp (#6812) (Yejing-Lai)
8c45e8c Remove op compilation flags due to perf issue (#6944) (NirSonnenschein)
049ed93 Pin nv-a6000 workflow (#6938) (loadams)
dee5ca0 [inf] Add config var to enable keeping module on host (#6846) (oelayan7)
251e324 `warn` to `warning` (#6952) (qgallouedec)
66b08c7 Add extra_repr to Linear classes for debugging purpose (#6954) (Xia-Weiwen)
a7e5290 Update import for torchvision.transformers (#6958) (loadams)
2090fa2 Remove Duplicate Declaration of pandas in `Dockerfile` (#6959) (Zerohertz)
7868339 Enabled configurable autoTP to run out-of-box and remain compatible w… (gyou2021)
02303c9 Enabled Qwen2-MoE Tensor Parallism (TP) inference (gyou2021)
2bfce64 Enabled configurable auto Tensor Parallelism (TP) for inference of di… (gyou2021)
f200d1e Added input examples and fixed bugs when input is None. (gyou2021)
9651150 Added the explanation of DS_REMOVED_COMMON_REDUCE_LINEAR_KEYS (gyou2021)
90758a9 Fixed error names (gyou2021)
5c6eaa4 Add the missing view operations from sequence parallel(async). (#6750) (inkcherry)
9fead9f Update `torch.norm` to `torch.linalg.norm` and `torch.linalg.vector_n… (loadams)
989414c Using explicit GPU upcast for ZeRO-Offload (#6962) (xylian86)
f16f83e Update version.txt after 0.16.3 release (#6965) (loadams)
f2b4357 Precisely track nvme optimizer offload (#6963) (tjruwase)
f48565d Update build_win.bat script to exclue GDS op as it lacks Windows supp… (loadams)
e1c5c4d Add CUDA 12.8 support and comment on CUDA 12.7 (#6975) (loadams)
72a8c46 Update torch versions to support 2.6 (#6977) (loadams)
826772a generalize deepspeed linear and implement it for non cuda systems (#6… (oelayan7)
0d00669 Update recommended Windows whl building versions (#6983) (loadams)
d1c8d9d Title: Fix setup_env_ranks to Properly Set Environment Variables Inst… (fabiosanger)
5b38e34 Specify torchvision in nv-ds-chat workflow (prevents errors with torc… (loadams)
be98cf7 Remove assumption that padding only occurs on last rank (#6974) (xylian86)
c8a1664 Use ds-specific module id to avoid conflicts (#6847) (tjruwase)
5ebe271 Update A6000 workflows to use newer docker container - 24.09 vs 24.03… (loadams)
b8ba88b Allow NVIDIA Blackwell (#6991) (fabiendupont)
7b24b47 Update GH org references (#6998) (tjruwase)
68d924e Update CNAME (loadams)
1c08a94 Update CNAME (loadams)
a531625 [XPU] max1100 workflow update for docker and softwares (#7003) (Liangliang-Ma)
f583d30 autotp training(fix dco) (#7004) (inkcherry)
a493b22 import triton files when triton is supported and installed (#6989) (oelayan7)
8a36b54 Update A6000 tests transformers version (#7016) (loadams)
0f8687f Fix ds-chat CI regression (#7015) (tjruwase)
b3b7e79 [Ulysses tutorial] typos (#7024) (stas00)
4d6b2ab fix hostname -I for macOS #6497 (#6990) (fitzjalen)
f7e6f9b Update workflows to cuda 12.4 (#7000) (loadams)
2ce885b [ROCm] Enable fp_quantizer on ROCm (#7027) (rraminen)
40eca62 add gds chinese blog (#7034) (GuanhuaWang)
0a05120 Add chinese blog for deepspeed windows, and fix format (#7035) (hwchen2017)
dbb3b09 AIO on ROCM (#7023) (jomayeri)
d32af71 Merge branch 'master' into configurable_autoTP (gyou2021)
9bac81a Merge branch 'master' into configurable_autoTP (delock)
9bea3f9 Merge branch 'master' into configurable_autoTP (gyou2021)
d2214d0 Merge branch 'master' into configurable_autoTP (loadams)
3584c77 Merge branch 'master' into configurable_autoTP (gyou2021)
b2adb33 Merge branch 'master' into configurable_autoTP (gyou2021)