GH-48582: [CI][GPU][C++][Python] Add new CUDA jobs using the new self-hosted runners #48583
raulcd merged 19 commits into apache:main from
@pitrou @kou this PR was originally created to test the
From an organization standpoint, would you prefer to have the C++ jobs added to the
The Python CUDA 13.0.2 errors are not related to this PR per se (it only adds the new runners), but there seems to be an issue initializing CUDA:
Perhaps the driver on the machine is older than the CUDA 13.0.2 used in the container. Do you have a way to check the driver version installed on the machine that's hosting the Docker image? (e.g. what is its
From the I think that the machine will need at least driver version 580 for a CUDA 13 container. Are you able to change the underlying machine?
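A minimal Python sketch of the check suggested above: it asks `nvidia-smi` for the installed driver version on the host and compares the major component against 580, the minimum quoted above for a CUDA 13 container. The helper names (`parse_driver_major`, `supports_cuda13`, `query_driver_versions`) are hypothetical, and any driver strings used with them are illustrative examples, not readings taken from the runners.

```python
import shutil
import subprocess

# Minimum driver major version for a CUDA 13 container, as quoted in the comment above.
CUDA13_MIN_DRIVER_MAJOR = 580


def parse_driver_major(version: str) -> int:
    """Return the major component of a driver version string such as '575.57.08'."""
    return int(version.strip().split(".")[0])


def supports_cuda13(version: str) -> bool:
    """True if the driver's major version is at least the CUDA 13 minimum."""
    return parse_driver_major(version) >= CUDA13_MIN_DRIVER_MAJOR


def query_driver_versions() -> list[str]:
    """Ask nvidia-smi for the driver version; it prints one line per GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found; run this on the GPU host, not in the container")
    else:
        try:
            for v in query_driver_versions():
                status = "OK" if supports_cuda13(v) else "too old"
                print(f"driver {v}: CUDA 13 container {status}")
        except subprocess.CalledProcessError as exc:
            print(f"nvidia-smi failed: {exc}")
```

If the host reports a driver major version below 580, a CUDA 13 container would likely hit a CUDA initialization failure like the one reported above.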
Alternatively, it may be possible to make this configuration work by adding the relevant
We are using the default images built here (yes, they point out they have CUDA 12):
Thanks @gmarkall for your help. Unfortunately, it seems to fail with a different error if and
I prefer
…w self-hosted runners
This reverts commit f5766b7.
…t reusing the CPP workflow
I've simplified the job by adding CUDA and Ubuntu to the matrix. Thanks @kou! I'll merge this into 23.0.0 so that we have CUDA validation in case we do a patch release in the future.
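A rough Python model of what adding CUDA and Ubuntu to the matrix produces: each Ubuntu release is paired with a single CUDA version, and each pair runs once for C++ and once for Python, which yields the four job names listed in the merged commit message. This is a hypothetical illustration of the expansion, not the actual `cuda_extra.yml` contents; `Job`, `PLATFORMS` and `expand_matrix` are invented names.

```python
from dataclasses import dataclass
from itertools import product

# (Ubuntu release, CUDA version) pairs, as listed in the merged commit message.
PLATFORMS = [("22", "11.7.1"), ("24", "12.9.0")]


@dataclass(frozen=True)
class Job:
    ubuntu: str
    cuda: str
    python: bool

    @property
    def name(self) -> str:
        # C++ jobs have no suffix; Python jobs append " Python".
        suffix = " Python" if self.python else ""
        return f"AMD64 Ubuntu {self.ubuntu} CUDA {self.cuda}{suffix}"


def expand_matrix(platforms, python_flags=(False, True)):
    """Cross each (ubuntu, cuda) pair with the C++/Python flag, C++ jobs first."""
    return [Job(u, c, p) for p, (u, c) in product(python_flags, platforms)]


if __name__ == "__main__":
    for job in expand_matrix(PLATFORMS):
        print(job.name)
```

Keeping Ubuntu and CUDA as paired entries, rather than a full cross product, avoids generating combinations such as Ubuntu 22 with CUDA 12.9.0 that the PR does not list.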
…-hosted runners (#48583)

### Rationale for this change

The CUDA jobs stopped working when Voltron Data infrastructure went down. Together with ASF Infra, we have set up a [runs-on](https://runs-on.com/runners/gpu/) solution to provide CUDA runners.

### What changes are included in this PR?

Add the new workflow `cuda_extra.yml` with CI jobs that use the runs-on CUDA runners. Because the underlying instances have CUDA 12.9, the jobs to be run are:

- AMD64 Ubuntu 22 CUDA 11.7.1
- AMD64 Ubuntu 24 CUDA 12.9.0
- AMD64 Ubuntu 22 CUDA 11.7.1 Python
- AMD64 Ubuntu 24 CUDA 12.9.0 Python

A follow-up issue has been created to add jobs for CUDA 13, see: #48783

A new label `CI: Extra: CUDA` has also been created.

### Are these changes tested?

Yes, via CI.

### Are there any user-facing changes?

No.

* GitHub Issue: #48582

Authored-by: Raúl Cumplido <raulcumplido@gmail.com>
Signed-off-by: Raúl Cumplido <raulcumplido@gmail.com>
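Why only CUDA 11.7.1 and 12.9.0 were selected can be sketched with a simplified rule: a container's CUDA major.minor must not exceed the CUDA 12.9 available on the runners, so 13.0.2 is deferred to #48783. This rule is an assumption made for illustration (in practice compatibility depends on the host driver version, as discussed earlier in the thread), and `HOST_CUDA`, `version_tuple` and `runnable` are hypothetical names.

```python
# CUDA version available on the runs-on instances, per the commit message.
HOST_CUDA = (12, 9)


def version_tuple(v: str) -> tuple[int, ...]:
    """Turn '12.9.0' into (12, 9, 0) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def runnable(container_cuda: str, host_cuda: tuple[int, int] = HOST_CUDA) -> bool:
    # Simplified assumption: the container's major.minor must be <= the host's.
    return version_tuple(container_cuda)[:2] <= host_cuda


# Candidate container CUDA versions mentioned in this PR.
CANDIDATES = ["11.7.1", "12.9.0", "13.0.2"]
selected = [v for v in CANDIDATES if runnable(v)]
deferred = [v for v in CANDIDATES if not runnable(v)]

if __name__ == "__main__":
    print("run now:", selected)
    print("follow-up:", deferred)
```

Comparing integer tuples rather than raw strings keeps, for example, "12.10" from sorting before "12.9".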
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit 985b16e. There weren't enough matching historic benchmark results to make a call on whether there were regressions. The full Conbench report has more details.