Merged

756 commits
662a421
Safe usage of popen (#6490)
tjruwase Sep 4, 2024
10ba3dd
Handle an edge case where `CUDA_HOME` is not defined on ROCm systems …
amorehead Sep 4, 2024
c210e60
Update version.txt after 0.15.1 release (#6493)
loadams Sep 5, 2024
857780a
HPU: add required ENV vars to acccelerator init (#6495)
nelyahu Sep 5, 2024
4f80385
Op_builder->is_compatible quite warning (#6093)
terry-for-github Sep 5, 2024
3b09d94
fix pipeline eval_batch micro_batches argument for schedule (#6484)
nelyahu Sep 5, 2024
2a647c5
Fix the broken url link (#6500)
rogerxfeng8 Sep 6, 2024
fc22d96
fix environment variable export bug for MultiNodeRunner (#5878)
TideDra Sep 7, 2024
8fa6b50
Revert "BF16 optimizer: Clear lp grads after updating hp grads in hoo…
nelyahu Sep 9, 2024
c274839
wrap include cuda_bf16.h with ifdef BF16_AVAILABLE (#6520)
oelayan7 Sep 10, 2024
659f6be
Avoid security issues of subprocess shell (#6498)
tjruwase Sep 11, 2024
170b46e
Add conditional on torch version for scaled_dot_product_attention (#6…
loadams Sep 11, 2024
2a56f53
Added Intel Gaudi to Accelerator Setup Guide (#6543)
ShifaAbu Sep 16, 2024
61de017
Skip failing newly added tests in accelerate (#6574)
loadams Sep 25, 2024
7622cd9
Use msgpack for p2p comm (#6547)
tohtana Sep 26, 2024
a540097
DeepNVMe perf tuning (#6560)
tjruwase Sep 26, 2024
0fbe96a
[Accelerator] Cambricon MLU support (#6472)
Andy666G Sep 26, 2024
c85c870
Fix gradient accumulation for Z2+offload (#6550)
tohtana Sep 26, 2024
ba58682
fix errors when setting zero3 leaf modules with torch.compile (#6564)
NirSonnenschein Sep 26, 2024
d45cfd3
[XPU] Support DeepNVMe new code structure (#6532)
Liangliang-Ma Sep 26, 2024
047bcf6
Add APIs to offload states of model, optimizer, and engine (#6011)
tohtana Sep 27, 2024
1caf6e8
add bfloat16 to inference support dtypes (#6528)
nelyahu Sep 27, 2024
d4e1895
[COMPILE] workflow for deepspeed + torch.compile (#6570)
YizhouZ Sep 27, 2024
828ddfb
Fixes on the accelerate side mean we do not need to skip this test (#…
loadams Sep 27, 2024
8cded57
Fix torch include in `op_builder/mlu/fused_adam.py` and update no-tor…
loadams Sep 27, 2024
b93c7a2
[ROCm] Fix subprocess error (#6587)
jagadish-amd Oct 4, 2024
239b83a
Cleanup CODEOWNERS file to be valid (#6603)
loadams Oct 7, 2024
940887d
Add SSF Best practices badge (#6604)
loadams Oct 7, 2024
20695b3
Move V100 workflows from cuda 11.1/11.7 to 12.1 (#6607)
loadams Oct 8, 2024
00c4b98
Fix SD workflow (#6609)
loadams Oct 8, 2024
745dd48
Pin accelerate to fix CI failures/issues (#6610)
loadams Oct 8, 2024
e97b453
Add llama3.2 vision autotp (#6577)
Yejing-Lai Oct 8, 2024
f74ea69
Improve DS logging control (#6602)
tjruwase Oct 8, 2024
5cbbff4
Fix device selection using CUDA_VISIBLE_DEVICES (#6530)
tohtana Oct 8, 2024
ca8b1fe
Handle when `backend` is also in compile_kwargs (#6502)
oraluben Oct 8, 2024
645639b
Rearrange inference OPS and stop using builder.load (#5490)
oelayan7 Oct 9, 2024
1062a0c
Unpin accelerate tests, update lightning with node16 removal. (#6611)
loadams Oct 9, 2024
474a328
Enabled Qwen2-MoE Tensor Parallelism (TP) inference (#6551)
gyou2021 Oct 9, 2024
55f7f37
Update version.txt after 0.15.2 release (#6615)
loadams Oct 9, 2024
7d751ee
Clean up prefetched parameters (#6557)
tohtana Oct 9, 2024
a1f98bd
AIO CPU Locked Tensor (#6592)
jomayeri Oct 9, 2024
d7ca3d8
reduce setting global variables to reduce torch compile graph breaks …
NirSonnenschein Oct 10, 2024
adec991
Add API to get devices of offload states (#6586)
tohtana Oct 10, 2024
5c4b97f
apply fp16 autocast only to floating point values
Oct 11, 2024
7a5bc4f
Ignore reuse_dist_env (#6623)
tohtana Oct 14, 2024
cf41e8c
[compile] Show breakdown of graph break (#6601)
delock Oct 14, 2024
65ab644
Add API for updating ZeRO gradients (#6590)
tjruwase Oct 14, 2024
13c16c9
Accept btl_tcp_if_include option through launcher_args (#6613)
diskkid Oct 14, 2024
85b7469
Add first Step in LR Schedulers (#6597)
jomayeri Oct 14, 2024
bf60fc0
Support safetensors export (#6579)
xu-song Oct 15, 2024
ce468c3
add option to disable logger while compiling to avoid graph breaks (#…
ShellyNR Oct 15, 2024
1a45bd8
Lock cache file of HF model list (#6628)
tohtana Oct 15, 2024
c9899dc
Add README Pipeline Status for Huawei Ascend NPU (#6588)
xuedinge233 Oct 15, 2024
a36db9c
Update torch version in workflows (#6631)
tohtana Oct 17, 2024
c9fc34a
Use file store for tests (#6632)
tohtana Oct 17, 2024
6eefc3d
Fix Memory Leak In AIO (#6630)
jomayeri Oct 18, 2024
40bde52
[XPU] upgrade xpu max1100 CI workflow to pytorch2.3 (#6646)
Liangliang-Ma Oct 21, 2024
11bbf45
[XPU] host timer check version from Torch 2.5 to Torch 2.6 (#6633)
YizhouZ Oct 22, 2024
a24cdd6
[XPU] [DeepNVMe] use same cpu_op_desc_t with cuda (#6645)
Liangliang-Ma Oct 22, 2024
bf03f48
Update version.txt after 0.15.3 release (#6652)
loadams Oct 22, 2024
b647fb2
Fix expert grad scaling problem with ZeRO optimizer (#6546)
wyooyw Oct 23, 2024
e06bb51
Add attribute check for language_model when replace last linear modul…
Yejing-Lai Oct 23, 2024
6e6563d
fix init_device_mesh for torch 2.4 (#6614)
Lzhang-hub Oct 23, 2024
3d5cf73
Fix dynamo issue (#6527)
oraluben Oct 25, 2024
5fb71c0
sequence parallel for uneven heads (#6392)
inkcherry Oct 25, 2024
24285d6
Add fallback for is_compiling (#6663)
tohtana Oct 25, 2024
54903e0
Update profiler registration check (#6668)
loadams Oct 25, 2024
229960a
Add support for H100/sm_90 arch compilation (#6669)
loadams Oct 28, 2024
b3e9594
Update Gaudi2 docker image (#6677)
loadams Oct 28, 2024
e6357c2
Update gaudi2 docker version to latest release (1.18) (#6648)
raza-sikander Oct 28, 2024
0e11b08
Update base docker image for A6000 GPU tests (#6681)
loadams Oct 28, 2024
07cac9e
Remove packages that no longer need to be updated in the latest conta…
loadams Oct 29, 2024
e4a247e
Fix training of pipeline based peft's lora model (#5477)
xuanhua Oct 29, 2024
9b54731
Update checkout action to latest version (#5021)
loadams Oct 30, 2024
c7f58c8
Add attribute check to support git-base autotp (#6688)
Yejing-Lai Oct 31, 2024
ff1c543
fix memcpy issue on backward for zero-infinity (#6670)
xylian86 Oct 31, 2024
95ea95f
Free memory in universal checkpointing tests (#6693)
tohtana Oct 31, 2024
b24dfa9
Explictly set device when reusing dist env (#6696)
tohtana Nov 1, 2024
9068acb
Update URL in README Pipeline Status for Huawei Ascend NPU (#6706)
xuedinge233 Nov 4, 2024
6c08b7f
Pin transformers to 4.45.2 in nv-ds-chat workflow (#6710)
loadams Nov 4, 2024
2b41d62
[Bug Fix] Support threads_per_head < 64 for wavefront size of 64 (#6622)
jagadish-amd Nov 4, 2024
351569d
Use one param coordinator for both train/inference scenarios (#6662)
tohtana Nov 5, 2024
d2a4718
Update yapf version (#6721)
loadams Nov 6, 2024
3beda32
Update flake8 version (#6722)
loadams Nov 6, 2024
a1b0c35
Switch what versions of python are supported (#5676)
loadams Nov 7, 2024
057d25b
Update version.txt after 0.15.4 release (#6731)
loadams Nov 8, 2024
0855566
Update GH hosted workflows to 24.04 (#6717)
loadams Nov 11, 2024
b7e2ff5
Add COMMITTER file (#6741)
tjruwase Nov 11, 2024
b45ca26
Update AMD apex version (#6739)
loadams Nov 11, 2024
99e9cbe
Fix Type Name Inconsistency & Typo in cpu_adam (#6732)
xylian86 Nov 11, 2024
fabab19
Add Domino code (#6733)
zhangsmallshark Nov 11, 2024
73d974e
Add data type check for bf16 (#6742)
hwchen2017 Nov 12, 2024
7af3a4b
add zero3 ```module_granularity_threshold ``` to zero optimization. (…
inkcherry Nov 12, 2024
b692cde
AIO File Offsets (#6641)
jomayeri Nov 12, 2024
877aa0d
Update path for BingBertSquad from DeepSpeedExamples (#6746)
loadams Nov 12, 2024
9a2c209
Sanitize inputs to eval() (#6745)
loadams Nov 13, 2024
d702eb5
Adding the governance doc (#6748)
minjiazhang Nov 14, 2024
fc4e733
Add no_sync context manager (#6675)
tjruwase Nov 14, 2024
e3b5a4b
Gaudi2 Nightly job for daily check (#6753)
raza-sikander Nov 15, 2024
f594dbe
Disable failing python tests (#6758)
loadams Nov 18, 2024
dd40269
A faster and more memory-efficient implementation of `zero_to_fp32` (…
xu-song Nov 18, 2024
8488bee
Pin transformers version to work around latest torch requirements (#6…
loadams Nov 19, 2024
1fdad1f
make xpu ops compatible with oneapi 2025.0 (#6760)
baodii Nov 19, 2024
2e0c39b
Add explicit parameters for torch.load (#6751)
loadams Nov 19, 2024
065398d
Fix setup.py bash cmd generation to correctly extract git info (#6762)
nelyahu Nov 19, 2024
83e4364
Use `json_schema_extra` instead of extra keyword in `Field` (#6764)
qgallouedec Nov 20, 2024
b5709cc
Enable torch compile on _allgather_params (#6769)
deepcharm Nov 21, 2024
f515104
Removes unnecessary cloning (#6761)
swigls Nov 21, 2024
cd20a3b
Fix potential memory issues when use deepspeed Z3 (#6726)
wenbinc-Bin Nov 21, 2024
f57b1ef
Unpin with latest transformers fixes (#6763)
loadams Nov 22, 2024
5e16f25
docs: fix HF links (#6780)
imba-tjd Nov 25, 2024
d6410f9
Fix Doc Error: ZeRO Stage 2 gradient partitioning (#6775)
yewentao256 Nov 25, 2024
fabcf40
Cleanup code docs warnings (#6783)
loadams Nov 25, 2024
ec6cc49
Domino Blog (#6776)
GuanhuaWang Nov 25, 2024
03845db
Update version.txt before release (#6784)
loadams Nov 25, 2024
e5570b1
Revert release workflow (#6785)
loadams Nov 25, 2024
f743fec
Update version.txt after 0.16.0 release (#6786)
loadams Nov 25, 2024
0c6c981
Domino news update on readme.md (#6815)
GuanhuaWang Dec 3, 2024
fc23007
Fix zero checkpoint (#6792)
xu-song Dec 4, 2024
ed7d183
Update python version but now we need to include setuptools on our ow…
loadams Dec 4, 2024
60a1b57
Adding the new feature of FPDT (#6462)
YJHMITWEB Dec 4, 2024
b966e1f
Pin transformers to avoid errors with latest version (#6820)
loadams Dec 5, 2024
0b0fef3
Ulyssess offload blog (#6814)
samadejacobs Dec 5, 2024
7b9fc8c
add FPDT tutorial (#6813)
samadejacobs Dec 5, 2024
0e92f9b
Update README.md (#6824)
samadejacobs Dec 5, 2024
2ea181f
Update README.md (#6825)
samadejacobs Dec 5, 2024
95ead2a
Pin transformers version in cpu-torch-latest due to multiprocessing e…
loadams Dec 5, 2024
177832e
Update pre-commit version (#6821)
loadams Dec 5, 2024
a449966
Update version.txt after 0.16.1 release (#6826)
loadams Dec 5, 2024
9ca6016
Pin HPU tests (#6831)
loadams Dec 6, 2024
9a41cca
Flops profiler support einops.einsum (#6755)
lvhoaa Dec 9, 2024
08b907a
Pin pytest-subtests version for accelerate tests (#6842)
loadams Dec 9, 2024
0c92c39
Inference UTs check for trition support from accelerator (#6782)
raza-sikander Dec 10, 2024
06f1d36
Unpin pytest-subtests now that 0.14.1 is released (#6844)
loadams Dec 10, 2024
1b58ba5
Merge LoCo with Zero++ (#6730)
XingyuXie Dec 10, 2024
9e31252
Fix type error in `ZeROOrderedDict` (#6794)
oraluben Dec 10, 2024
ecb4bf3
Fix uneven head sequence parallelism bug (#6774) (#6797)
Eugene29 Dec 10, 2024
074d5c6
Fix nv-torch-nightly test by pinning transformers (#6849)
loadams Dec 11, 2024
bd6fd50
Remove broken links to non-active site (#6854)
kaiksi-bb Dec 12, 2024
9182947
Avoid poisoning process with CUDA calls as soon as importing (#6810)
HollowMan6 Dec 12, 2024
853a976
Fix xpu tests workflow failure by changing pip index url (#6864)
Liangliang-Ma Dec 13, 2024
d7750c3
Domino updates (#6861)
GuanhuaWang Dec 13, 2024
b5e3fac
add domino navigation (#6866)
GuanhuaWang Dec 13, 2024
8efbcc4
Update TSC (#6867)
tjruwase Dec 13, 2024
6e3e13c
Remove warnings from autodoc and sphinx (#6788)
loadams Dec 13, 2024
fc7c070
Update real_accelerator.py (#6845)
keiwoo Dec 14, 2024
db98cc3
Fix assertion for offloading states (#6855)
tohtana Dec 16, 2024
87c6506
Remove pin from transformers version and fix Processing/Threading iss…
loadams Dec 16, 2024
da771ed
Add MLP/lm_head tp grain size setting. (#6828)
Yejing-Lai Dec 16, 2024
a964e43
Fix --enable_each_rank_log when used with PDSH multi-node runner (#6863)
akeshet Dec 17, 2024
2f32966
Update transformers ops unit tests to use `requried_torch_version` (#…
loadams Dec 17, 2024
4cd1d97
Don't error out when cpu accelerator doesn't have torch (as default f…
loadams Dec 18, 2024
0b25630
Add arctic model support by adding w2 to all_reduce (#6856)
pi314ever Dec 18, 2024
b344c04
Update code owners (#6890)
tjruwase Dec 18, 2024
f9e158a
Update version.txt after 0.16.2 release (#6893)
loadams Dec 18, 2024
4fd7920
Allow to compile collective for PT>2.3 (#6899)
NirSonnenschein Dec 19, 2024
00ea0c4
Zero2: avoid graph breaks in torch.compile by using param_idx (#6803)
nelyahu Dec 20, 2024
eea5304
hpu_accelerator: use torch.use_deterministic_algorithms (#6897)
nelyahu Dec 20, 2024
85cc5f9
Fix error caused by all_reduce call in domino (#6880)
hwchen2017 Dec 26, 2024
cc03c76
Update Gaudi2 jobs to latest 1.19 build (#6905)
raza-sikander Dec 26, 2024
3573858
Change compile for pipeline module torch.compile (#6478)
NirSonnenschein Dec 30, 2024
456c9ac
Stage3: Use new torch grad accumulation hooks API (#6773)
deepcharm Jan 3, 2025
a8ede3a
Cleanup ops/transformer/inference tests (#6830)
loadams Jan 3, 2025
0dbbb70
Fix `checkpointable_layers` Logic (#6881)
Quentin-Anthony Jan 4, 2025
f8c9f31
[BUG FIX]:fix get torch.version.cuda error when cuda is None in rocm …
kairos-yu Jan 6, 2025
c5e48f4
Add fp8_gemm fallback for non-triton systems (#6916)
oelayan7 Jan 6, 2025
b0040b6
Reduce the device bubble introduced by heavy loop synchronization in …
inkcherry Jan 6, 2025
c348c5b
Cleanup ops/transformer/inference tests (#6925)
loadams Jan 6, 2025
f2cc809
Check transformers version in BLOOM for inference v1 (#6766)
lekurile Jan 7, 2025
c7f3032
inference: remove unused _validate_args function (#5505)
nelyahu Jan 7, 2025
c41b0c2
Use `torch.log1p` (#6930)
kit1980 Jan 8, 2025
6628127
Update python version classifiers (#6933)
loadams Jan 8, 2025
b62c84d
Fix building on Windows with presence of Triton (#6749)
woct0rdho Jan 8, 2025
53fb579
Fix windows blog examples (#6934)
loadams Jan 8, 2025
45fce45
Add deepseek autotp (#6937)
Yejing-Lai Jan 9, 2025
0fc3daa
Add position_ids arg to OPTEmbedding forward function (#6939)
lekurile Jan 9, 2025
1d15ef0
Add information on security expectations with this software (#6941)
loadams Jan 9, 2025
fa8db5c
Support pure meta model lm_head tp (#6812)
Yejing-Lai Jan 10, 2025
396f8db
Remove op compilation flags due to perf issue (#6944)
NirSonnenschein Jan 13, 2025
66d3d3e
Pin nv-a6000 workflow (#6938)
loadams Jan 13, 2025
fae714d
[inf] Add config var to enable keeping module on host (#6846)
oelayan7 Jan 15, 2025
05eaf3d
`warn` to `warning` (#6952)
qgallouedec Jan 15, 2025
018ece5
Add extra_repr to Linear classes for debugging purpose (#6954)
Xia-Weiwen Jan 16, 2025
f97f088
Update import for torchvision.transformers (#6958)
loadams Jan 17, 2025
7f3d669
Remove Duplicate Declaration of pandas in `Dockerfile` (#6959)
Zerohertz Jan 17, 2025
bc76b04
Add the missing view operations from sequence parallel(async). (#6750)
inkcherry Jan 21, 2025
8d1bc0a
Update `torch.norm` to `torch.linalg.norm` and `torch.linalg.vector_n…
loadams Jan 21, 2025
c17dc33
Using explicit GPU upcast for ZeRO-Offload (#6962)
xylian86 Jan 21, 2025
de4596b
Update version.txt after 0.16.3 release (#6965)
loadams Jan 21, 2025
470dd6d
Precisely track nvme optimizer offload (#6963)
tjruwase Jan 23, 2025
1640f6d
Update build_win.bat script to exclue GDS op as it lacks Windows supp…
loadams Jan 24, 2025
46c6c9e
Add CUDA 12.8 support and comment on CUDA 12.7 (#6975)
loadams Jan 28, 2025
8ad4872
Update torch versions to support 2.6 (#6977)
loadams Jan 29, 2025
593de92
generalize deepspeed linear and implement it for non cuda systems (#6…
oelayan7 Jan 29, 2025
8bb4d44
Update recommended Windows whl building versions (#6983)
loadams Jan 30, 2025
065ca8a
Title: Fix setup_env_ranks to Properly Set Environment Variables Inst…
fabiosanger Jan 30, 2025
c963c21
Specify torchvision in nv-ds-chat workflow (prevents errors with torc…
loadams Jan 30, 2025
4fea41f
Remove assumption that padding only occurs on last rank (#6974)
xylian86 Jan 31, 2025
029e0a3
Use ds-specific module id to avoid conflicts (#6847)
tjruwase Jan 31, 2025
241bffd
Update A6000 workflows to use newer docker container - 24.09 vs 24.03…
loadams Jan 31, 2025
f4caed6
Allow NVIDIA Blackwell (#6991)
fabiendupont Feb 4, 2025
fd40516
Update GH org references (#6998)
tjruwase Feb 5, 2025
a1df4b4
Update CNAME
loadams Feb 5, 2025
bee641d
Update CNAME
loadams Feb 5, 2025
e7fc598
[XPU] max1100 workflow update for docker and softwares (#7003)
Liangliang-Ma Feb 5, 2025
f04649d
autotp training(fix dco) (#7004)
inkcherry Feb 5, 2025
301bc28
import triton files when triton is supported and installed (#6989)
oelayan7 Feb 6, 2025
a83ab17
Update A6000 tests transformers version (#7016)
loadams Feb 8, 2025
22d7fdc
Fix ds-chat CI regression (#7015)
tjruwase Feb 10, 2025
a5b6395
[Ulysses tutorial] typos (#7024)
stas00 Feb 11, 2025
549e11d
fix hostname -I for macOS #6497 (#6990)
fitzjalen Feb 12, 2025
079de6b
Update workflows to cuda 12.4 (#7000)
loadams Feb 12, 2025
5a361e1
[ROCm] Enable fp_quantizer on ROCm (#7027)
rraminen Feb 13, 2025
83f5dee
add gds chinese blog (#7034)
GuanhuaWang Feb 13, 2025
e637677
Add chinese blog for deepspeed windows, and fix format (#7035)
hwchen2017 Feb 14, 2025
14b3cce
AIO on ROCM (#7023)
jomayeri Feb 14, 2025
ee3f19b
Control trace cache warnings (#7039)
tjruwase Feb 18, 2025
2735cf4
Update CUDA compute capability to support Blackwell (#7047)
hwchen2017 Feb 18, 2025
7288e61
Update setup.py handling of ROCm cupy (#7051)
loadams Feb 19, 2025
33dd2e2
nv-ds-chat breaks with latest transformers (#7052)
loadams Feb 19, 2025
c9da489
Rename aio_thread_count to intra_op_parallelism (#7056)
tjruwase Feb 19, 2025
d98204b
add autoTP training zero2 tests (#7049)
inkcherry Feb 19, 2025
e2dc3ee
Fix, bf16 optimizer remove dup loop (#7054)
wukong1992 Feb 20, 2025
fa8967e
Update version.txt after 0.16.4 release (#7063)
loadams Feb 20, 2025
461d641
fix an outdated doc wrt CUDA_VISIBLE_DEVICES (#7058)
stas00 Feb 20, 2025
cb20d44
Tecorigin sdaa accelerator (#6903)
siqi654321 Feb 20, 2025
8577bd2
Handle special case of libuv for Windows (#7064)
loadams Feb 20, 2025
9f20148
Update README with info on newest accelerator (#7065)
loadams Feb 21, 2025
38327e0
Bug Fix for offload_states API (#7050)
U-rara Feb 21, 2025
9d820e4
Fix TOCTOU issues, switch to fstat (#7067)
loadams Feb 24, 2025
e1903f0
config torch to avoid graph breaks caused by logger (#6999)
ShellyNR Feb 24, 2025
4b7e2c9
Fix meta load tensor imcompatible issue (#7073)
Yejing-Lai Feb 24, 2025
1d30b58
Replace calls to `python setup.py sdist` with `python -m build --sdis…
loadams Feb 24, 2025
729dfaf
Revert "Handle special case of libuv for Windows (#7064)" (#7076)
loadams Feb 25, 2025
f0401ad
Add DeepseekV3 AutoTP. (#7045)
Yejing-Lai Feb 26, 2025
c07b635
Improve inference tutorial docs (#7083)
loadams Feb 26, 2025
f8d3429
Pin transformers version on tests that use latest. (#7085)
loadams Feb 27, 2025
5320d4c
Update README.md with ICS '23 MoE paper link (#7087)
siddharth9820 Feb 27, 2025
f2ed253
Update parallelism for nv-torch-latest/nightly tests due to more GPUs…
loadams Feb 27, 2025
02bbf50
Remove workflows for very old torch versions (#7090)
loadams Feb 28, 2025
b4177e4
Use new dlpack api; Formatting fixes (#7101)
tjruwase Mar 3, 2025
a88f56a
Avoid graph breaks by disabling sourceless calls in instrument_w_nvtx…
deepcharm Mar 3, 2025
776822f
Avoid graph breaks in torch.compile caused by inner classes in the ba…
deepcharm Mar 4, 2025
e4c7931
Only run pre-commit on the changes (#7106)
hwchen2017 Mar 4, 2025
17c6595
Avoid graph break due to unsupported frozenset (#7105)
deepcharm Mar 4, 2025
71807bc
Fix fused_qkv print model ValueError (#7109)
Yejing-Lai Mar 4, 2025
c2c8199
Update references to new X/Twitter handle (#7110)
loadams Mar 4, 2025
7694346
Merge branch 'master' of https://github.com/deepspeedai/DeepSpeed int…
Quentin-Anthony Mar 6, 2025
15cd540
Merge branch 'deepspeedai-master' into stage_upstream
Quentin-Anthony Mar 6, 2025
c3b558b
revert steps_per_print_default change
Quentin-Anthony Mar 6, 2025
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/deepspeed_chat_bug_report.md
@@ -32,7 +32,7 @@ If applicable, add screenshots to help explain your problem.
 **System info (please complete the following information):**
 - OS: [e.g. Ubuntu 18.04]
 - GPU count and types [e.g. two machines with x8 A100s each]
-- (if applicable) what [DeepSpeed-MII](https://github.com/microsoft/deepspeed-mii) version are you using
+- (if applicable) what [DeepSpeed-MII](https://github.com/deepspeedai/deepspeed-mii) version are you using
 - (if applicable) Hugging Face Transformers/Accelerate/etc. versions
 - Python version
 - Any other relevant info about your setup
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/inference_bug_report.md
@@ -29,7 +29,7 @@ If applicable, add screenshots to help explain your problem.
 **System info (please complete the following information):**
 - OS: [e.g. Ubuntu 18.04]
 - GPU count and types [e.g. two machines with x8 A100s each]
-- (if applicable) what [DeepSpeed-MII](https://github.com/microsoft/deepspeed-mii) version are you using
+- (if applicable) what [DeepSpeed-MII](https://github.com/deepspeedai/deepspeed-mii) version are you using
 - (if applicable) Hugging Face Transformers/Accelerate/etc. versions
 - Python version
 - Any other relevant info about your setup
56 changes: 0 additions & 56 deletions .github/workflows/amd-mi100.yml

This file was deleted.

12 changes: 7 additions & 5 deletions .github/workflows/amd-mi200.yml
@@ -1,9 +1,13 @@
 name: amd-mi200

 on:
-  workflow_dispatch:
+  pull_request:
+    paths:
+      - '.github/workflows/amd-mi200.yml'
+      - 'requirements/**'
   schedule:
     - cron: "0 0 * * *"
+  workflow_dispatch:

 concurrency:
   group: ${{ github.workflow }}-${{ github.ref }}
@@ -21,14 +25,14 @@ jobs:
     # Steps represent a sequence of tasks that will be executed as part of the job
     steps:
       # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4

       - id: setup-venv
         uses: ./.github/workflows/setup-venv

       - name: Install pytorch
         run: |
-          pip install -U --cache-dir $TORCH_CACHE torch torchvision --index-url https://download.pytorch.org/whl/rocm5.6
+          pip install -U --cache-dir $TORCH_CACHE torch torchvision --index-url https://download.pytorch.org/whl/rocm6.0
           python -c "import torch; print('torch:', torch.__version__, torch)"
           python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
@@ -44,8 +48,6 @@ jobs:
       - name: Install (ROCm) apex
         run: |
           git clone https://github.com/ROCmSoftwarePlatform/apex.git
-          cd apex
-          git checkout torch_2.1_higher
           CURRENT_VER=$(git rev-parse HEAD)
           INSTALLED_VER=$(cat /blob/amd-apex/.venv_installed_version)
           if [[ "$CURRENT_VER" != "$INSTALLED_VER" ]]; then
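Taken together, the trigger hunk above leaves amd-mi200 watching its own definition and the dependency pins on every PR, while keeping the nightly cron and the manual trigger. A sketch of the assembled `on:` block, reconstructed from the hunk (exact indentation is an assumption, since the diff view strips it):

```yaml
on:
  pull_request:
    paths:
      - '.github/workflows/amd-mi200.yml'   # re-run CI when the workflow itself changes
      - 'requirements/**'                   # ...or when dependency pins change
  schedule:
    - cron: "0 0 * * *"                     # nightly at 00:00 UTC
  workflow_dispatch:                        # manual runs from the Actions tab
```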
59 changes: 0 additions & 59 deletions .github/workflows/auto-sync.yml

This file was deleted.

68 changes: 48 additions & 20 deletions .github/workflows/cpu-inference.yml
@@ -2,52 +2,73 @@ name: cpu-inference

 on:
   workflow_dispatch:
+  pull_request:
+    paths:
+      - '.github/workflows/cpu-inference.yml'
+      - 'requirements/**'
+      - 'deepspeed/__init__.py'
+      - 'deepspeed/inference/**'
+      - '!deepspeed/inference/v2/**' # exclude v2 dir
+      - 'tests/unit/inference/**'
+      - '!tests/unit/inference/v2/**' # exclude v2 tests dir
+  merge_group:
+    branches: [ master ]
   schedule:
     - cron: "0 0 * * 0"

 concurrency:
   group: ${{ github.workflow }}-${{ github.ref }}
   cancel-in-progress: true

 jobs:
   unit-tests:
-    runs-on: ubuntu-20.04
+    runs-on: [self-hosted, cpu]
+
+    env: {ACTIONS_ALLOW_USE_UNSECURE_NODE_VERSION: true} # Allow using Node16 actions

     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4

       - id: setup-venv
         uses: ./.github/workflows/setup-venv

+      - name: Install gcc-9
+        run: |
+          sudo add-apt-repository -u ppa:ubuntu-toolchain-r/test
+          sudo apt install -y gcc-9 g++-9
+          # set gcc-9 and g++9 to default
+          sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-9 99
+          sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-9 99
+
+      - name: Check gcc version
+        run: |
+          # Get gcc version
+          gcc --version
+          g++ --version
+
       - name: Detect instruction sets on instance
         run: |
           lscpu
           pip install cmake
           git clone https://github.com/intel/intel-extension-for-pytorch
           cd intel-extension-for-pytorch/tests/cpu/isa
           cmake .
           make
           ./cpu_features

       - name: Install numactl
         run: |
           sudo apt-get install -y numactl

-      - name: Install oneCCL Bindings for PyTorch
+      - name: Install dependencies
         run: |
-          python -m pip install intel_extension_for_pytorch
-          python -m pip install oneccl_bind_pt==2.0 -f https://developer.intel.com/ipex-whl-stable-cpu
+          pip install torch
+          # check installed version
+          pip list |grep \\\<torch\\\>

       - name: Install oneCCL
         run: |
+          pip install cmake
           git clone https://github.com/oneapi-src/oneCCL
           cd oneCCL
           mkdir build
           cd build
           cmake ..
-          make
-          make install
-          #source ./_install/env/setvars.sh
-          # test whether oneCCL is correctly installed
-          #mpirun -n 2 ./examples/benchmark/benchmark
+          make -j install

       - name: Install transformers
         run: |
@@ -62,14 +83,21 @@ jobs:
           pip install .[dev,1bit,autotuning,inf]
           ds_report

-      - name: Python environment
+      - name: Python environment check
         run: |
           pip list
+          source oneCCL/build/_install/env/setvars.sh
+          export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6
+          # check whether the environment is properly setup
+          python -c "import deepspeed;from deepspeed.accelerator import get_accelerator;print(get_accelerator().device_name());print(get_accelerator().is_available())"

       - name: Unit tests
         run: |
+          # prep oneCCL for CCLBackend comm ops building
+          source oneCCL/build/_install/env/setvars.sh
+          export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6
           unset TORCH_CUDA_ARCH_LIST # only jit compile for current arch
-          cd tests
-          TRANSFORMERS_CACHE=~/tmp/transformers_cache/ TORCH_EXTENSIONS_DIR=./torch-extensions pytest -m 'seq_inference' unit/
-          TRANSFORMERS_CACHE=~/tmp/transformers_cache/ TORCH_EXTENSIONS_DIR=./torch-extensions pytest -m 'inference_ops' -m 'inference' unit/
+          cd tests
+          # LOCAL_SIZE=2 enforce CPU to report 2 devices, this helps run the test on github default runner
+          LOCAL_SIZE=2 COLUMNS=240 HF_HOME=~/tmp/hf_home/ TORCH_EXTENSIONS_DIR=./torch-extensions pytest -m 'seq_inference' unit/
+          LOCAL_SIZE=2 COLUMNS=240 HF_HOME=~/tmp/hf_home/ TORCH_EXTENSIONS_DIR=./torch-extensions pytest -m 'inference_ops' -m 'inference' unit/
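Note that the oneCCL environment is sourced in both the check step and the test step: each `run:` block starts a fresh shell, so exports do not carry across steps. A minimal sketch of a combined step, assuming oneCCL was built into `oneCCL/build/_install` as in the install step above (the step name is hypothetical):

```yaml
- name: Run CPU inference tests (sketch)
  run: |
    # every run: block starts a new shell, so the oneCCL env must be re-sourced
    source oneCCL/build/_install/env/setvars.sh
    export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libstdc++.so.6
    unset TORCH_CUDA_ARCH_LIST   # JIT-compile ops for the current arch only
    cd tests
    # LOCAL_SIZE=2 makes the CPU accelerator report two devices,
    # so multi-device tests can run on a single stock runner
    LOCAL_SIZE=2 COLUMNS=240 HF_HOME=~/tmp/hf_home/ TORCH_EXTENSIONS_DIR=./torch-extensions \
      pytest -m 'seq_inference' unit/
```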
.github/workflows/nv-torch110-p40.yml → .github/workflows/cpu-torch-latest.yml (file renamed)
@@ -1,31 +1,39 @@
-name: nv-torch110-p40
+name: cpu-torch-latest

 on:
-  workflow_dispatch:
+  pull_request:
+    paths-ignore:
+      - 'docs/**'
+      - 'blogs/**'
+      - 'deepspeed/inference/v2/**'
+      - 'tests/unit/inference/v2/**'
+  merge_group:
+    branches: [ master ]
   schedule:
     - cron: "0 0 * * *"
+  workflow_dispatch:

 concurrency:
   group: ${{ github.workflow }}-${{ github.ref }}
   cancel-in-progress: true

-permissions:
-  contents: read
-  issues: write
-
 jobs:
   unit-tests:
-    runs-on: [self-hosted, nvidia, cu111, p40]
+    runs-on: ubuntu-24.04

     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4

       - id: setup-venv
         uses: ./.github/workflows/setup-venv

+      - name: Install system packages
+        run: |
+          sudo apt-get install -y numactl pdsh
+
       - name: Install pytorch
         run: |
-          pip install -U --cache-dir $TORCH_CACHE torch==1.10.0+cu111 torchvision==0.11.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
+          pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
           python -c "import torch; print('torch:', torch.__version__, torch)"
           python -c "import torch; print('CUDA available:', torch.cuda.is_available())"
@@ -34,13 +42,13 @@ jobs:
           git clone https://github.com/huggingface/transformers
           cd transformers
           # if needed switch to the last known good SHA until transformers@master is fixed
-          # git checkout 1cc453d33
+          git checkout 981c276
           git rev-parse --short HEAD
           pip install .

       - name: Install deepspeed
         run: |
-          pip install .[dev,1bit,autotuning] --no-build-isolation
+          pip install .[dev,autotuning]
           ds_report

       - name: Python environment
@@ -51,13 +59,5 @@ jobs:
         run: |
           unset TORCH_CUDA_ARCH_LIST # only jit compile for current arch
           cd tests
-          pytest $PYTEST_OPTS --forked -n 4 unit/ --torch_ver="1.10" --cuda_ver="11.1"
-
-      - name: Open GitHub issue if nightly CI fails
-        if: ${{ failure() && (github.event_name == 'schedule') }}
-        uses: JasonEtco/create-an-issue@v2
-        env:
-          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-        with:
-          filename: .github/ISSUE_TEMPLATE/ci_failure_report.md
-          update_existing: true
+          HF_HOME=/tmp/hf_home/ pytest $PYTEST_OPTS -n 4 unit/ --torch_ver="2.6"
+          HF_HOME=/tmp/hf_home/ pytest $PYTEST_OPTS -m 'sequential' unit/ --torch_ver="2.6"
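The pytorch step in this renamed workflow installs CPU-only wheels but merely prints CUDA availability. A hedged variant could fail fast if a CUDA build sneaks in; the `assert` line below is an assumption of this sketch, not part of the PR:

```yaml
- name: Install pytorch (sketch with a hard check)
  run: |
    pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
    python -c "import torch; print('torch:', torch.__version__, torch)"
    # assumption: on this runner a CPU-only wheel must report no CUDA
    python -c "import torch; assert not torch.cuda.is_available(), 'expected a CPU-only wheel'"
```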
7 changes: 4 additions & 3 deletions .github/workflows/formatting.yml
@@ -1,6 +1,7 @@
 name: Formatting

 on:
+  workflow_dispatch:
   pull_request:
     branches:
       '**'
@@ -16,11 +17,11 @@ concurrency:
 jobs:

   # formatting and basic install on cpu-only machine
-  formatting:
-    runs-on: ubuntu-20.04
+  unit-tests:
+    runs-on: ubuntu-22.04

     steps:
-      - uses: actions/checkout@v3
+      - uses: actions/checkout@v4

       - name: environment
         run: |
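All of the workflows touched here share the same concurrency stanza (visible in the amd-mi200, cpu-inference, and cpu-torch-latest hunks): runs are grouped per workflow and ref, and a newer run cancels the in-progress one instead of queueing behind it.

```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}  # one group per workflow + branch/PR ref
  cancel-in-progress: true                         # a new push cancels the superseded run
```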