cuda: Verify Cuda Toolkit is Supported by the NVIDIA Architecture #567
Open
Treece-Burgess wants to merge 2 commits into icl-utk-edu:master from
…vices on the machine
… and dynamically allocate the copy for LD_LIBRARY_PATH
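The commit note about dynamically allocating the copy of LD_LIBRARY_PATH matches a common C pattern: getenv() returns a string the caller must not modify, and strtok_r() writes '\0' into the string it splits, so the value is copied into a heap buffer sized from strlen() (rather than, say, a fixed-size array that a long path could overflow). The sketch below assumes that is the motivation; count_search_dirs() is a hypothetical helper for illustration, not a function from the PR.

```c
#define _POSIX_C_SOURCE 200809L  /* for strtok_r() */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not the PR's code): count the non-empty entries of a
 * colon-separated search path such as LD_LIBRARY_PATH.
 *
 * getenv() storage must not be modified and strtok_r() overwrites each ':'
 * with '\0', so we split a heap copy sized with strlen() instead.
 * Returns -1 if the copy cannot be allocated. */
int count_search_dirs(const char *path)
{
    if (path == NULL || path[0] == '\0')
        return 0;

    size_t len = strlen(path);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, path, len + 1);   /* include the terminating '\0' */

    int count = 0;
    char *saveptr = NULL;
    /* strtok_r() skips empty fields, e.g. the middle of "a::b". */
    for (char *dir = strtok_r(copy, ":", &saveptr); dir != NULL;
         dir = strtok_r(NULL, ":", &saveptr))
        count++;

    free(copy);                    /* the copy is ours to release */
    return count;
}
```

Typical use would be count_search_dirs(getenv("LD_LIBRARY_PATH")); the environment string itself is never touched.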
Pull Request Description
This PR adds functionality to check whether the user's CUDA Toolkit supports the NVIDIA architecture(s) on the machine and, if it does not, to disable the cuda component with a helpful message explaining why. This is important because:
Testing
Setup
Testing was done on Illyad and Voltar at Oregon.
Illyad:
OS: RHEL 8.10
CPU: AMD EPYC 7402
GPU: 1 * H100
CUDA Toolkit: 11.5.2 and 12.9.0
Voltar:
OS: RHEL 8.10
CPU: Intel Xeon Gold 6226R
GPU: 1 * A100, 1 * V100, and 1 * P100
CUDA Toolkit: 13.0.0
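As a rough illustration of the kind of check this PR describes, the sketch below compares each device's compute capability against the oldest architecture a given toolkit supports. It is not the PR's implementation: the function names are hypothetical, the version encoding follows cudaRuntimeGetVersion() (1000 * major + 10 * minor), and the per-toolkit minimums (3.5 for CUDA 11.x, 5.0 for 12.x, 7.5 for 13.x) are our reading of NVIDIA's release notes and should be verified there. Voltar's A100, V100, and P100 have compute capabilities 8.0, 7.0, and 6.0, so under these assumptions a 13.0 toolkit would reject the V100 and P100.

```c
#include <stddef.h>
#include <stdio.h>

/* Minimal sketch, not the PR's code. Encodings (assumed for illustration):
 *   toolkit version    = 1000 * major + 10 * minor  (13000 == 13.0)
 *   compute capability = 10 * major + minor         (80 == 8.0)
 * Minimums below are assumptions; check NVIDIA's release notes.
 * Upper limits (e.g. newer GPUs needing a newer toolkit) are not modeled. */
static int min_compute_capability(int cuda_version)
{
    switch (cuda_version / 1000) {
    case 11: return 35;   /* assumed: Kepler sm_35 and newer */
    case 12: return 50;   /* assumed: Maxwell sm_50 and newer */
    case 13: return 75;   /* assumed: Turing sm_75 and newer */
    default: return -1;   /* toolkit not covered by this sketch */
    }
}

/* Returns the index of the first unsupported device, or -1 if every device
 * is supported. An unknown toolkit version is treated as unsupported. */
int first_unsupported_device(int cuda_version, const int *cc, int ndevices)
{
    int min_cc = min_compute_capability(cuda_version);
    for (int i = 0; i < ndevices; i++) {
        if (min_cc < 0 || cc[i] < min_cc)
            return i;
    }
    return -1;
}

/* Hypothetical: format a reason string a component could report when it
 * disables itself. Returns the snprintf() result. */
int format_disabled_reason(char *buf, size_t len, int cuda_version,
                           int device, int cc)
{
    return snprintf(buf, len,
                    "CUDA Toolkit %d.%d does not support device %d "
                    "(compute capability %d.%d)",
                    cuda_version / 1000, (cuda_version % 1000) / 10,
                    device, cc / 10, cc % 10);
}
```

A real check would obtain these values at runtime, for example from cudaRuntimeGetVersion() and the major/minor fields filled in by cudaGetDeviceProperties(); the arrays passed in here stand in for those queries so the logic can be exercised without a GPU. Because the CUDA runtime only enumerates devices listed in CUDA_VISIBLE_DEVICES, hiding devices changes which capabilities such a check sees.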
Results
Illyad:
Voltar:
The V100 and P100 cause the cuda component to be disabled with CUDA Toolkit 13.0, but setting
CUDA_VISIBLE_DEVICES=1,2 allows the component to be active, as this would only "show" the A100.* -
papi_component_avail, papi_native_avail, and papi_command_line
Author Checklist
Why this PR exists. Reference all relevant information, including background, issues, test failures, etc.
Commits are self contained and only do one thing
Commits have a header of the form:
module: short description
Commits have a body (whenever relevant) containing a detailed description of the addressed problem and its solution
The PR needs to pass all the tests