Commit

Merge branch 'saga-new-gpus' into 'main'
Included information about module switch for saga AMD nodes

See merge request documentation/public!834
bast committed Dec 6, 2023
2 parents c5a1613 + 23e82af commit 225743a
Showing 3 changed files with 58 additions and 2 deletions.
34 changes: 34 additions & 0 deletions code_development/building_gpu.md
@@ -0,0 +1,34 @@
# Building GPU software

The login nodes on Betzy and Saga currently do not allow compiling software for the GPUs,
as the CUDA driver is not installed there.
In order to compile GPU software, one needs an interactive session on a GPU node.
If no GPU is needed during compilation, one can ask for a CPU-only allocation, e.g.:

```
salloc --nodes=1 --time=00:30:00 --partition=<accel|a100> --mem-per-cpu=8G --account=<...>
```

or, if GPUs are required, e.g., for testing purposes:

```
salloc --nodes=1 --time=00:30:00 --partition=<accel|a100> --mem-per-cpu=8G --account=<...> --gpus=1
```
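
Once the allocation starts, a quick sanity check (assuming the NVIDIA driver is present on the GPU node, unlike on the login nodes) is to confirm the GPU is visible before compiling:

```
srun nvidia-smi
```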

## Saga

There are two types of GPU nodes on Saga, located in two distinct SLURM partitions:

* Intel CPUs with 4x Tesla P100 16GB, `--partition=accel`
* AMD CPUs with 4x A100 80GB, `--partition=a100`

These are different architectures. By default, Saga loads the Intel software environment.
If you want to run/compile software for the nodes with the AMD CPUs and A100 GPUs,
you need to get an allocation on the `a100` partition. Then, inside the allocation,
or inside your job script, switch the module environment:

```
module --force swap StdEnv Zen2Env
```
Note that installed modules can vary between the two node types.
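
For reference, a minimal job-script sketch for the `a100` partition could look like the following; the `CUDA` module name here is only an assumption, check `module avail` for what is actually installed:

```
#!/bin/bash
#SBATCH --account=<...>
#SBATCH --partition=a100
#SBATCH --gpus=1
#SBATCH --time=00:30:00
#SBATCH --mem-per-cpu=8G

# Switch from the default (Intel) environment to the AMD/Zen2 environment
module --force swap StdEnv Zen2Env

# Load the toolchain you need (module name is an assumption)
module load CUDA

# Build or run your GPU code here
nvcc --version
```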
3 changes: 2 additions & 1 deletion code_development/overview.rst
@@ -13,6 +13,7 @@ Code development
:maxdepth: 1

building.md
building_gpu.md
betzy.md
compilers.md
debugging.md
@@ -32,4 +33,4 @@ In this section we present a list of tutorials covering different topics in heterogeneous computing
guides_ml.md
guides_containers_gpu.md
guides_monitor_gpu.md
guides_python.md
guides_python.md
23 changes: 22 additions & 1 deletion software/modulescheme.md
@@ -8,7 +8,7 @@ The main command for using this system is the module command. You can find a list of available options with:

module --help

We use the lmod module system; for more info see <https://lmod.readthedocs.io/en/latest/> in NRIS currently. Below we listed the most commonly used options, but also feel free to ivestigate options in this toolset more thoroughly on developers site.
We currently use the Lmod module system in NRIS; for more info see <https://lmod.readthedocs.io/en/latest/>. Below we list the most commonly used options, but also feel free to investigate this toolset more thoroughly on the developer's site.


## Which modules are currently loaded?
@@ -113,5 +113,26 @@ This feature is particularly convenient if you spend a lot of time compiling/debugging
in interactive sessions. For production calculations using job scripts it is still
recommended to load each module explicitly for clarity.


## GPU modules

### Saga
There are two types of GPU nodes on Saga, located in two distinct SLURM partitions:

* Intel CPUs with 4x Tesla P100 16GB, `--partition=accel`
* AMD CPUs with 4x A100 80GB, `--partition=a100`

These are different architectures. By default, Saga loads the Intel software environment.
If you want to run/compile software for the nodes with the AMD CPUs and A100 GPUs,
you need to get an allocation on the `a100` partition. Then, inside the allocation,
or inside your job script, switch the module environment:

```
module --force swap StdEnv Zen2Env
```
Note that installed modules can vary between the two node types.
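
As a quick interactive sketch (the account is a placeholder), you can request an `a100` allocation, switch environments, and then inspect what is installed:

```
# Get an interactive session on an A100 node
salloc --nodes=1 --time=00:30:00 --partition=a100 --mem-per-cpu=8G --account=<...> --gpus=1

# Inside the allocation: switch to the AMD environment and list available modules
module --force swap StdEnv Zen2Env
module avail
```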


## Tutorial on module system for software
[Introduction to HPC - Accessing software](https://training.pages.sigma2.no/tutorials/hpc-intro/episodes/14-modules.html)
