diff --git a/docs/site_specific_config/gpu.md b/docs/site_specific_config/gpu.md
index 523e987341..1c25b25355 100644
--- a/docs/site_specific_config/gpu.md
+++ b/docs/site_specific_config/gpu.md
@@ -29,27 +29,38 @@ For CUDA-enabled software to run, it needs to be able to find the **NVIDIA GPU d
 The challenge here is that the NVIDIA GPU drivers are not _always_ in a standard system location, and that we can not install the GPU drivers in EESSI (since they are too closely tied to the client OS and GPU hardware).
 
-### Compiling CUDA software {: #cuda_sdk }
+### Compiling software on top of CUDA, cuDNN and other SDKs provided by NVIDIA {: #cuda_sdk }
 
-An additional requirement is necessary if you want to be able to compile CUDA-enabled software using a CUDA installation included in EESSI. This requires a *full* CUDA SDK, but the [CUDA SDK End User License Agreement (EULA)](https://docs.nvidia.com/cuda/eula/index.html) does not allow for full redistribution. In EESSI, we are (currently) only allowed to redistribute the files needed to *run* CUDA software.
+An additional requirement is necessary if you want to be able to compile software
+that makes use of a CUDA installation or cu\* SDKs (e.g., cuDNN) included in
+EESSI. This requires a *full* installation of the CUDA SDK, cuDNN, etc. However,
+the [CUDA SDK End User License Agreement (EULA)](https://docs.nvidia.com/cuda/eula/index.html)
+and the [Software License Agreement (SLA) for NVIDIA cuDNN](https://docs.nvidia.com/deeplearning/cudnn/latest/reference/eula.html)
+do not allow for full redistribution. In EESSI, we are (currently) only allowed to
+redistribute the files needed to *run* CUDA and cuDNN software.
 
-!!! note "Full CUDA SDK only needed to *compile* CUDA software"
-    Without a full CUDA SDK on the host system, you will still be able to *run* CUDA-enabled software from the EESSI stack,
-    you just won't be able to *compile* additional CUDA software.
+!!! note "A full CUDA SDK or cuDNN SDK is only needed to *compile* CUDA or cuDNN software"
+    Without a full CUDA SDK or cuDNN SDK on the host system, you will still
+    be able to *run* CUDA-enabled or cuDNN-enabled software from the EESSI stack,
+    you just won't be able to *compile* additional CUDA or cuDNN software.
 
-Below, we describe how to make sure that the EESSI software stack can find your NVIDIA GPU drivers and (optionally) full installations of the CUDA SDK.
+Below, we describe how to make sure that the EESSI software stack can find your
+NVIDIA GPU drivers and (optionally) full installations of the CUDA SDK and the
+cuDNN SDK.
 
 ### Configuring CUDA driver location {: #driver_location }
 
 All CUDA-enabled software in EESSI expects the CUDA drivers to be available in a specific subdirectory of this `host_injections` directory.
-In addition, installations of the CUDA SDK included EESSI are stripped down to the files that we are allowed to redistribute;
+In addition, installations of the CUDA SDK and cuDNN SDK included in EESSI are stripped down to the files that we are allowed to redistribute;
 all other files are replaced by symbolic links that point to another specific subdirectory of `host_injections`.
 For example:
 ```
 $ ls -l /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/CUDA/12.1.1/bin/nvcc
 lrwxrwxrwx 1 cvmfs cvmfs 109 Dec 21 14:49 /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen3/software/CUDA/12.1.1/bin/nvcc -> /cvmfs/software.eessi.io/host_injections/2023.06/software/linux/x86_64/amd/zen3/software/CUDA/12.1.1/bin/nvcc
 ```
 
-If the corresponding full installation of the CUDA SDK is available there, the CUDA installation included in EESSI can be used to build CUDA software.
+If the corresponding full installation of the CUDA SDK is available there, the
+CUDA installation included in EESSI can be used to build CUDA software. The same
+applies to the cuDNN SDK.
 
 ### Using NVIDIA GPUs via a native EESSI installation {: #nvidia_eessi_native }
 
@@ -74,37 +85,39 @@ This script uses `ldconfig` on your host system to locate your GPU drivers, and
 Note that it is safe to re-run the script even if no driver updates were done: the script should detect that the current version of the drivers were already symlinked.
 
-#### Installing full CUDA SDK (optional)
+#### Installing full CUDA SDK and cuDNN SDK (optional)
 
-To install a full CUDA SDK under `host_injections`, use the `install_cuda_host_injections.sh` script that is included in EESSI:
+To install a full CUDA SDK and cuDNN SDK under `host_injections`, use the `install_cuda_and_libraries.sh` script that is included in EESSI:
 
 ```{ .bash .copy }
-/cvmfs/software.eessi.io/versions/${EESSI_VERSION}/scripts/gpu_support/nvidia/install_cuda_host_injections.sh
+/cvmfs/software.eessi.io/versions/${EESSI_VERSION}/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh
 ```
 
-For example, to install CUDA 12.1.1 in the directory that the [`host_injections` variant symlink](host_injections.md) points to,
+For example, to install CUDA 12.1.1 and cuDNN 8.9.2.26 in the directory that the [`host_injections` variant symlink](host_injections.md) points to,
 using `/tmp/$USER/EESSI` as directory to store temporary files:
 ```
-/cvmfs/software.eessi.io/versions/${EESSI_VERSION}/scripts/gpu_support/nvidia/install_cuda_host_injections.sh --cuda-version 12.1.1 --temp-dir /tmp/$USER/EESSI --accept-cuda-eula
+/cvmfs/software.eessi.io/versions/${EESSI_VERSION}/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh --temp-dir /tmp/$USER/EESSI --accept-cuda-eula --accept-cudnn-eula
 ```
 
-You should choose the CUDA version you wish to install according to what CUDA versions are included in EESSI;
-see the output of `module avail CUDA/` after [setting up your environment for using
-EESSI](../using_eessi/setting_up_environment.md).
+The versions 12.1.1 for CUDA and 8.9.2.26 for cuDNN are defined in an easystack
+file that is also included in EESSI:
+```
+/cvmfs/software.eessi.io/versions/${EESSI_VERSION}/scripts/gpu_support/nvidia/easystacks/eessi-2023.06-eb-4.9.4-2023a-CUDA-host-injections.yml
+```
+By default, the install script processes all files matching `eessi-*CUDA*.yml` in
+the above `/cvmfs/software.eessi.io/versions/${EESSI_VERSION}/scripts/gpu_support/nvidia/easystacks` directory.
 
-You can run `/cvmfs/software.eessi.io/versions/${EESSI_VERSION}/scripts/gpu_support/nvidia/install_cuda_host_injections.sh --help` to check all of the options.
+You can run `/cvmfs/software.eessi.io/versions/${EESSI_VERSION}/scripts/gpu_support/nvidia/install_cuda_and_libraries.sh --help` to check all of the options.
 
 !!! tip
-    This script uses EasyBuild to install the CUDA SDK. For this to work, two requirements need to be satisfied:
-
-    * `module load EasyBuild` should work (or the `eb` command is already available in the environment);
-    * The version of EasyBuild being used should provide the requested version of the CUDA easyconfig file
-      (in the example case above, that's `CUDA-12.1.1.eb`).
+    This script uses EasyBuild to install the CUDA SDK and the cuDNN SDK. For this to work, two requirements need to be satisfied:
 
-    You can rely on the EasyBuild installation that is included in EESSI for this.
+    * `module load EasyBuild/${EB_VERSION}` must work (`EB_VERSION` is extracted
+      from the name of the easystack file; e.g., from `eb-4.9.4`, `EB_VERSION` is
+      derived as 4.9.4);
+    * `module load EESSI-extend/${EESSI_VERSION}-easybuild` must work.
 
-    Alternatively, you may load an EasyBuild module manually _before_ running the `install_cuda_host_injections.sh`
-    script to make an `eb` command available.
+    Both modules are included in EESSI.
 
 ### Using NVIDIA GPUs via EESSI in a container {: #nvidia_eessi_container }