Update workflows to cuda 12.4 #7000

Merged: loadams merged 16 commits into master from loadams/update-runners-124 on Feb 12, 2025

@loadams (Collaborator) commented Feb 4, 2025

  • Update existing workflows that use cu121 to cu124. Note that wherever we download the latest torch, we will now get torch 2.6 rather than torch 2.5, the latest version provided with CUDA 12.1.
  • nv-nightly is currently failing on master due to unrelated errors, so its failure can be ignored in this PR. Tested locally, nv-nightly passes with both 12.1 and 12.4.
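
As a rough illustration of the cu121 to cu124 switch, the sketch below rewrites the CUDA tag in an install command. It assumes the workflows install torch from the PyTorch wheel index with a command like the one in `step`; that command and the helper name `retarget_cuda_tag` are hypothetical and not taken from this PR.

```python
import re

# Assumption: workflow steps install torch from the PyTorch wheel index,
# where the CUDA build is selected by a tag such as "cu121" in the URL.
CUDA_TAG = re.compile(r"\bcu121\b")


def retarget_cuda_tag(workflow_text: str, new_tag: str = "cu124") -> str:
    """Replace every standalone cu121 tag with new_tag (hypothetical helper)."""
    return CUDA_TAG.sub(new_tag, workflow_text)


# Illustrative install step, not copied from the repository's workflows.
step = "pip install torch --index-url https://download.pytorch.org/whl/cu121"
print(retarget_cuda_tag(step))
```

The word boundaries keep the pattern from touching longer tokens that merely start with `cu121`.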

fabiendupont and others added 8 commits February 7, 2025 14:56
The NVIDIA Blackwell GPU generation is number 10. The SM code and
architecture should be `100`, but the current code generates `1.`
because it expects a two-character string.

This change treats the value as a string that contains a `.`, splits it
on that character, and uses the resulting list of strings.
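
A minimal sketch of the difference, assuming the capability arrives as a `major.minor` string such as `10.0`. The function names are hypothetical, and the fixed-position version is only one plausible way the `1.` result could arise; it is not presented as the project's actual code.

```python
def sm_code_fixed_positions(compute_capability: str) -> str:
    """Assumed failure mode: pick characters 0 and 2, which only works
    when the major version is a single digit (e.g. '8.6')."""
    return compute_capability[0] + compute_capability[2]


def sm_code_split(compute_capability: str) -> str:
    """Split on '.' and join the parts, so multi-digit majors work."""
    major, minor = compute_capability.split(".")
    return major + minor


for cc in ("8.6", "10.0"):
    print(cc, sm_code_fixed_positions(cc), sm_code_split(cc))
```

Both versions agree for single-digit major versions, which is why the problem only shows up once the major version reaches 10.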

Signed-off-by: Fabien Dupont <[email protected]>
Signed-off-by: Logan Adams <[email protected]>
Signed-off-by: Olatunji Ruwase <[email protected]>
Signed-off-by: Logan Adams <[email protected]>
Signed-off-by: Fabien Dupont <[email protected]>
Co-authored-by: Fabien Dupont <[email protected]>
Signed-off-by: Logan Adams <[email protected]>
Signed-off-by: Logan Adams <[email protected]>
Signed-off-by: Logan Adams <[email protected]>
1. update intel oneAPI basekit to 2025.0
2. update torch/ipex/oneccl to 2.5

Signed-off-by: Logan Adams <[email protected]>
Same as [this PR](#6922).
[affeb88](affeb88)
I noticed the CI updated the DCO check recently. Using the suggested
rebase method for sign-off would reintroduce many conflicts, so I opted
for a squash merge with sign-off instead. Thanks :)

Signed-off-by: inkcherry <[email protected]>
Signed-off-by: Logan Adams <[email protected]>
Those files contain code that runs on import, so on systems that do not
support triton but have it installed, importing them causes issues.

In general, I think it is better to import triton only when it is both
installed and supported.
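
The idea can be sketched as a guarded import. This is an illustration only: `import_if_supported` is a hypothetical helper, and the `supported` flag stands in for whatever platform check the project actually performs.

```python
import importlib
import importlib.util


def import_if_supported(name: str, supported: bool):
    """Import module `name` only if the platform supports it and the
    package can be found; otherwise return None without importing."""
    if not supported:
        return None  # never touch the module on unsupported systems
    if importlib.util.find_spec(name) is None:
        return None  # package is not installed
    return importlib.import_module(name)


# Assumption: the caller supplies the platform check result.
triton = import_if_supported("triton", supported=False)
print(triton)
```

Checking support first means an installed but unusable package is never imported, so its import-time code never runs.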

Signed-off-by: Omar Elayan <[email protected]>
Signed-off-by: Logan Adams <[email protected]>
@loadams changed the title from "Update workflows that use cuda 12.1 to use runners with 12.4" to "[Test] Update workflows that use cuda 12.1 to use runners with 12.4" on Feb 7, 2025
@loadams changed the title from "[Test] Update workflows that use cuda 12.1 to use runners with 12.4" to "Update workflows that use cuda 12.1 to use runners with 12.4" on Feb 12, 2025
@loadams changed the title from "Update workflows that use cuda 12.1 to use runners with 12.4" to "Update workflows to cuda 12.4" on Feb 12, 2025
@loadams merged commit 079de6b into master on Feb 12, 2025
12 of 13 checks passed
@loadams deleted the loadams/update-runners-124 branch on February 12, 2025 23:25
gyou2021 pushed a commit to gyou2021/DeepSpeed that referenced this pull request Feb 18, 2025
- Update existing workflows that use cu121 to cu124. Note, this means
that where we download torch latest, we will now be getting torch 2.6
rather than the torch latest 2.5 provided with cuda 12.1.
- Note, nv-nightly is failing in master currently due to unrelated
errors, so this could be ignored in this PR (nv-nightly tested locally,
where it passes with 12.1 and it also passes with 12.4).

---------

Signed-off-by: Fabien Dupont <[email protected]>
Signed-off-by: Logan Adams <[email protected]>
Signed-off-by: Olatunji Ruwase <[email protected]>
Signed-off-by: inkcherry <[email protected]>
Signed-off-by: Omar Elayan <[email protected]>
Co-authored-by: Fabien Dupont <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Liangliang Ma <[email protected]>
Co-authored-by: inkcherry <[email protected]>
Co-authored-by: Omar Elayan <[email protected]>
Signed-off-by: gyou2021 <[email protected]>
gyou2021 pushed a commit to gyou2021/DeepSpeed that referenced this pull request Feb 18, 2025
gyou2021 pushed a commit to gyou2021/DeepSpeed that referenced this pull request Feb 28, 2025
tohtana pushed a commit that referenced this pull request Feb 28, 2025
Signed-off-by: Masahiro Tanaka <[email protected]>
ys950902 pushed a commit to ys950902/DeepSpeed that referenced this pull request Mar 6, 2025
Signed-off-by: yisheng <[email protected]>
6 participants