{2023.06}[NVHPC/25.1-CUDA-12.6] add hook for nvhpc #1043

Open: adammccartney wants to merge 3 commits into base: 2023.06-software.eessi.io

adammccartney

This adds a pre_configure_hook for NVHPC. It performs search-and-replace operations on the "localrc" file that NVHPC uses to detect information about the system. In particular, it points the sysroot flag at the EESSI EPREFIX variable and appends two variable definitions that tell the compiler where to look for system libraries.

The content of the hook is extracted from:
https://github.com/ComputeCanada/easybuild-computecanada-config/blob/main/2023/cc_hooks.py#L544-L547
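
For readers unfamiliar with the mechanism, here is a minimal Python sketch of the text transformation described above, applied to the content of a localrc file held in memory. The function name `patch_localrc`, the sample `LDSO` line, and the prefix path are illustrative assumptions, not taken from the PR; the actual hook performs the edit with `sed` and `echo` at install time.

```python
import re


def patch_localrc(text, eprefix):
    """Return localrc content with --sysroot added to LDSO and lib dirs appended.

    Hypothetical helper: it mirrors the sed/echo steps of the hook on a string
    instead of editing the file on disk.
    """
    # Equivalent of: sed 's@\(set LDSO=.*\);@\1 --sysroot=$EPREFIX;@'
    patched = re.sub(
        r"(set LDSO=.*);",
        lambda m: f"{m.group(1)} --sysroot={eprefix};",
        text,
    )
    # Make sure the appended definitions start on their own line.
    if patched and not patched.endswith("\n"):
        patched += "\n"
    libdir = f"{eprefix}/usr/lib64"
    patched += f"set DEFLIBDIR={libdir};\n"
    patched += f"set DEFSTDOBJDIR={libdir};\n"
    return patched


if __name__ == "__main__":
    # Illustrative input only; a real localrc contains many more settings.
    sample = "set LDSO=/lib64/ld-linux-x86-64.so.2;\n"
    print(patch_localrc(sample, "/cvmfs/example/compat"), end="")
```

Working on a string keeps the sketch easy to test; the shell-based approach in the PR has the advantage that it runs inside the install step without extra file handling in the hook.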

@eessi-bot-deucalion

Instance eessi-bot-deucalion is configured to build for:

  • architectures: aarch64/a64fx
  • repositories: eessi.io-2023.06-software


eessi-bot bot commented Apr 24, 2025

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-compat, eessi.io-2023.06-software

@gpu-bot-ugent

gpu-bot-ugent bot commented Apr 24, 2025

Instance eessi-bot-vsc-ugent is configured to build for:

  • architectures: x86_64/amd/zen3
  • repositories: eessi-hpc.org-2023.06-software, eessi.io-2023.06-compat, eessi-hpc.org-2023.06-compat, eessi.io-2023.06-software

@eessi-bot-toprichard

Instance rt-Grace-jr is configured to build for:

  • architectures: aarch64/nvidia/grace
  • repositories: eessi.io-2023.06-software

@eessi-bot-surf

Instance eessi-bot-surf is configured to build for:

  • architectures: x86_64/amd/zen4, x86_64/amd/zen2
  • repositories: eessi-hpc.org-2023.06-software, eessi.io-2023.06-software, eessi.io-2023.06-compat, eessi-hpc.org-2023.06-compat


eessi-bot bot commented Apr 24, 2025

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/sapphirerapids, x86_64/intel/skylake_avx512, x86_64/intel/cascadelake, x86_64/intel/icelake, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-compat, eessi.io-2023.06-software

@hvelab
Contributor

hvelab commented Apr 29, 2025

Hi @adammccartney ,

Since the PR has no generated build with corresponding tests, could you kindly share the steps you used to test this so we can also test and reproduce it?

Thank you!

pre-configure hook for nvhpc
- search and replace operations in the ec dict
"""
if self.name == "NVHPC":
Member

@ocaisa ocaisa Apr 29, 2025

This is good for now, but there is quite a bit of discussion currently about changing the naming in the EasyBuild context. We are currently naming the compilers only NVHPC but we should perhaps be defining a toolchain hierarchy for NVHPC since they also contain MPI and some math libraries. This may lead to the compilers being called something like nvidia_compilers or NVHPC becoming a fatter toolchain.

Member

@ocaisa ocaisa Apr 29, 2025

You're also going to need some versioning clauses here for things you have actually tested.

This is definitely a corner case at present, as EESSI itself will not ship this toolchain (currently), and I wonder if we should not allow an environment variable to force the use of the hook, something like

Suggested change
-    if self.name == "NVHPC":
+    if self.name == "NVHPC":
+        force_nvhpc_hook = 'EESSI_FORCE_NVHPC_HOOK'
+        if self.version in [...] or os.getenv(force_nvhpc_hook, False):
+            ...
+        else:
+            print_msg(f"Not using existing hook for {self.name}/{self.version}, if you wish to force this please set the envvar {force_nvhpc_hook}")
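
The gating idea in the suggestion can be sketched as a small standalone function. This is only an illustration: the function name `should_apply_nvhpc_hook`, the `TESTED_VERSIONS` tuple (filled with the one version named in this PR's title), and the set of accepted truthy strings are assumptions, not part of the suggested change.

```python
import os

FORCE_ENVVAR = "EESSI_FORCE_NVHPC_HOOK"
# Assumption: only the version from this PR's title is treated as tested.
TESTED_VERSIONS = ("25.1",)


def should_apply_nvhpc_hook(version, tested_versions=TESTED_VERSIONS, environ=None):
    """Return True if the hook should run for this NVHPC version.

    The hook runs for explicitly tested versions, or for any version when the
    force variable is set to a recognisably "true" value.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get(FORCE_ENVVAR, "")
    forced = raw.strip().lower() in ("1", "true", "yes")
    return version in tested_versions or forced


if __name__ == "__main__":
    print(should_apply_nvhpc_hook("25.1", environ={}))
    print(should_apply_nvhpc_hook("24.9", environ={FORCE_ENVVAR: "1"}))
```

One design point worth weighing: with `os.getenv(name, False)` any non-empty string counts as true, so `EESSI_FORCE_NVHPC_HOOK=0` would also force the hook. Parsing the value explicitly, as above, avoids that surprise at the cost of a few extra lines.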

@adammccartney
Author

> Hi @adammccartney,
>
> Since the PR has no generated build with corresponding tests, could you kindly share the steps you used to test this so we can also test and reproduce it?
>
> Thank you!

Sure, would be happy to. Would you mind giving a few points of guidance? Let me know what would be useful to see.
I haven't had the time to look into ReFrame at all yet, so there are no tests (yet) apart from the sanity checks in EasyBuild. The build last week was done in the eessi-container from the "software layer" repo, which was slightly adapted to suit our own build environment. The build command looks like this:

#!/bin/bash

project_root="$(realpath $(dirname $(dirname $(dirname $(dirname $BASH_SOURCE)))))"

eb "${project_root}/easyconfigs/2025/NVHPC-25.1-CUDA-12.6.0.eb" \
    -r --cuda-compute-capabilities=9.0 \
    --configfiles="${project_root}/easybuild-asc-config/2025/config.cfg" \
    --hooks="${project_root}/easybuild-asc-config/2025/eb_hooks.py"

As you can see, we are referencing an explicit config for EasyBuild, and I think the easyconfig for NVHPC is slightly adapted to include the "accept-eula" variable or whatever. I'll backport this to a "vanilla" eessi-extend environment today that can be used to install stuff on host-injections. I guess it would be useful to have a command that can be run in the standard container started by eessi_container.sh?

Replaces EBROOTGENTOO with EPREFIX/usr
eb_hooks.py Outdated
Comment on lines 754 to 758
new_opts = f'''installdir=%(installdir)s/Linux_x86_64/%(version)s
EPREFIX={eprefix}
sed -i "s@\(set LDSO=.*\);@\\1 --sysroot=$EPREFIX;@" $installdir/compilers/bin/localrc
echo "set DEFLIBDIR=$EPREFIX/usr/lib64;" >> $installdir/compilers/bin/localrc
echo "set DEFSTDOBJDIR=$EPREFIX/usr/lib64;" >> $installdir/compilers/bin/localrc'''
Member

@ocaisa ocaisa Apr 29, 2025

Thinking about this a bit more, the logic here could be added directly to the relevant section of the NVHPC easyblock (and used conditionally based on whether the EB build option --sysroot is set).
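
A rough sketch of what that conditional logic might look like, written as a standalone helper rather than real easyblock code. The helper name `localrc_patch_commands` and its parameters are hypothetical; in an easyblock the `sysroot` value would presumably come from EasyBuild's build options, which is an assumption here. The commands it returns mirror the `sed`/`echo` lines under review.

```python
import shlex


def localrc_patch_commands(localrc_path, sysroot=None):
    """Build shell commands that patch localrc, or nothing if no sysroot is set.

    Hypothetical helper: returns a list of command strings instead of running them.
    """
    if not sysroot:
        # No sysroot configured: leave the generated localrc untouched.
        return []
    rc = shlex.quote(localrc_path)
    libdir = f"{sysroot}/usr/lib64"
    # Raw f-string so the sed backslashes reach the shell unchanged.
    # Note: '@' is the sed delimiter, so a sysroot containing '@' would break this.
    sed_expr = rf"s@\(set LDSO=.*\);@\1 --sysroot={sysroot};@"
    deflibdir = f"set DEFLIBDIR={libdir};"
    defstdobjdir = f"set DEFSTDOBJDIR={libdir};"
    return [
        f"sed -i {shlex.quote(sed_expr)} {rc}",
        f"echo {shlex.quote(deflibdir)} >> {rc}",
        f"echo {shlex.quote(defstdobjdir)} >> {rc}",
    ]


if __name__ == "__main__":
    for cmd in localrc_patch_commands("/opt/nvhpc/localrc", "/cvmfs/example/compat"):
        print(cmd)
```

Returning an empty list when no sysroot is given means a caller can append the result unconditionally, which keeps the non-sysroot build path identical to the current behaviour.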

Member

@ocaisa ocaisa Apr 29, 2025


@adammccartney
Author

adammccartney commented Apr 29, 2025

So, interestingly, the sanity check now fails if I try to build this directly on a compute node (x86_64/amd/zen4 is the architecture, by the way). The initial build was done in the eessi container as I mentioned, set up to use the EESSI_PROJECT_INSTALL variable pointing at a writeable /cvmfs/software.asc.ac.at directory.
The build now fails when I try to use EESSI_SITE_INSTALL. Maybe something is leaking in via the ld cache on the host, as was previously observed. It makes me wonder how usable the compiler is if we load it from the custom cvmfs repo...

@adammccartney
Author

bot: build inst:eessi-bot-mc-azure arch:x86_64/amd/zen4 repo:eessi.io-2023.06-software accelerator:nvidia/cc90


eessi-bot bot commented Apr 29, 2025

Updates by the bot instance eessi-bot-mc-aws
  • account adammccartney has NO permission to send commands to the bot


eessi-bot bot commented Apr 29, 2025

Updates by the bot instance eessi-bot-mc-azure
  • account adammccartney has NO permission to send commands to the bot

@eessi-bot-surf

Updates by the bot instance eessi-bot-surf
  • account adammccartney has NO permission to send commands to the bot

@gpu-bot-ugent

gpu-bot-ugent bot commented Apr 29, 2025

Updates by the bot instance eessi-bot-vsc-ugent
  • account adammccartney has NO permission to send commands to the bot

So I finally got this working (building to a custom cvmfs repo, then loading on a compute node and reproducing
the sanity checks).

There were a number of issues that needed to be worked out.

1. A potential issue arises if the linker happens to find a script called "libc.so" first.
The script is located in the compat layer and looks like it may possibly(?) redirect the linker to
the host /lib64/libc.so.6 if it gets picked up.

> cat $EPREFIX/usr/lib64/libc.so
/* GNU ld script
   Use the shared library, but some functions are only in
   the static library, so try that secondarily.  */
OUTPUT_FORMAT(elf64-x86-64)
GROUP ( /lib64/libc.so.6 /usr/lib64/libc_nonshared.a  AS_NEEDED ( /lib64/ld-linux-x86-64.so.2 ) )

2. Another issue is that nvc++ scans a number of directories looking for localrc files. If there
are any old localrc files lying around that point to the wrong place, this will cause problems.
The case below shows a situation where a localrc was pointing to a (removed) host_injections path.

>nvc++ -dryrun -std=c++20 minimal.cpp minimal
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/.nvc++rc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/nativerc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/fnativerc
Skipping rcfiles/internalrc (not found)
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/ccrc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/ccirc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/cpprc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/cppcurc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/paralgorc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/x86rc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/x8664rc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/lin86rc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/lincommonrc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/lin8664rc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/llvmcomprc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/llvmrc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/llvmx86rc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/llvmx8664rc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/omprc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/iparc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/acc1rc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/cudaselectrc
Skipping rcfiles/persnvflangrc (not found)
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/acclin8664rc
Skipping rcfiles/acctoolsrc (not found)
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/targetrc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/deprecatedrc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/c++llvmrc
Skipping rcfiles/llvmxrc (not found)
Skipping rcfiles/tunexrc (not found)
Skipping rcfiles/clangxrc (not found)
Skipping rcfiles/gccxrc (not found)
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/persnvirc
Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/localrc
Skipping localrc.n3001-003 (not found)
Reading rcfile /home/fs60000/admccartney/.config/NVIDIA/nvhpc/25.1/localrc.n3001-003
Skipping siterc (not found)
Skipping siterc.n3001-003 (not found)
Skipping $GCCLOCALRC (not found)
Skipping .mynvrc (not found)
Skipping .mynvc++rc (not found)
Skipping .mynvcpprc (not found)
Skipping .mynvx86rc (not found)
Skipping $MYLOCALRC (not found)
Skipping cudarc (not found)
Action(realpath(/opt/acceptance-tests/eessi/2023.06/software/linux/x86_64/amd/zen4/software/GCCcore/13.3.0/bin/../lib/gcc/x86_64-pc-linux-gnu/13.3.0//../../../..))
Error in path /opt/acceptance-tests/eessi/2023.06/software/linux/x86_64/amd/zen4/software/GCCcore/13.3.0/bin/../lib/gcc/x86_64-pc-linux-gnu/13.3.0//../../../..

It should __not__ be finding the localrc files in my home directory

{EESSI 2023.06} admccartney@n3001-003 ~/tests/nvhpc
>rm -rf ~/.config/NVIDIA/nvhpc/25.1/

> nvc++ -std=c++20 minimal.cpp -o minimal
>./minimal
Hello world
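
The second issue lends itself to a quick automated check. Below is a minimal sketch, assuming the `-dryrun` output keeps the "Reading rcfile <path>" line format shown in the log above; the function name `rcfiles_outside_prefix` is hypothetical. It reports every rcfile the compiler read from outside the installation prefix.

```python
def rcfiles_outside_prefix(dryrun_output, install_prefix):
    """List rcfiles that were read from outside the given installation prefix.

    Assumes lines of the form 'Reading rcfile <path>' as seen in the log above.
    """
    marker = "Reading rcfile "
    root = install_prefix.rstrip("/") + "/"
    stray = []
    for line in dryrun_output.splitlines():
        line = line.strip()
        if not line.startswith(marker):
            continue  # "Skipping ..." and other lines are not of interest
        path = line[len(marker):]
        if not path.startswith(root):
            stray.append(path)
    return stray


if __name__ == "__main__":
    prefix = "/cvmfs/software.asc.ac.at/versions/2023.06"
    log = "\n".join([
        "Reading rcfile " + prefix + "/software/linux/x86_64/amd/zen4/software/"
        "NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/localrc",
        "Skipping localrc.n3001-003 (not found)",
        "Reading rcfile /home/fs60000/admccartney/.config/NVIDIA/nvhpc/25.1/localrc.n3001-003",
    ])
    print(rcfiles_outside_prefix(log, prefix))
    # prints ['/home/fs60000/admccartney/.config/NVIDIA/nvhpc/25.1/localrc.n3001-003']
```

In practice the text would come from capturing the output of `nvc++ -dryrun ...`; whether that output goes to stdout or stderr is not something this sketch assumes, so capture both.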
@adammccartney
Author

Okay, so I got this working with some careful attention to what the linker was up to.
See the commit message of a2fe8be for a bit more info.
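
The linker concern raised earlier (the `libc.so` linker script in the compat layer) can also be inspected mechanically. The following is a minimal sketch, assuming the script has the simple `GROUP ( ... )` form shown in the `cat` output; the function names are hypothetical and the parsing is deliberately naive rather than a full GNU ld script parser. It lists the absolute paths in the script that do not sit under a given prefix, which are the entries that may resolve to host files when the linker is not applying a sysroot.

```python
import re


def absolute_paths_in_ld_script(script_text):
    """Return the absolute paths named in a simple GNU ld script."""
    # Remove /* ... */ comments so words inside them are not mistaken for paths.
    body = re.sub(r"/\*.*?\*/", "", script_text, flags=re.DOTALL)
    # Treat parentheses as separators, then keep tokens that start with '/'.
    tokens = body.replace("(", " ").replace(")", " ").split()
    return [tok for tok in tokens if tok.startswith("/")]


def paths_outside_prefix(paths, prefix):
    """Return the paths that are not located under the given prefix."""
    root = prefix.rstrip("/") + "/"
    return [p for p in paths if not p.startswith(root)]


if __name__ == "__main__":
    script = """/* GNU ld script
   Use the shared library, but some functions are only in
   the static library, so try that secondarily.  */
OUTPUT_FORMAT(elf64-x86-64)
GROUP ( /lib64/libc.so.6 /usr/lib64/libc_nonshared.a  AS_NEEDED ( /lib64/ld-linux-x86-64.so.2 ) )
"""
    found = absolute_paths_in_ld_script(script)
    # The prefix below is a placeholder, not the real compat layer path.
    print(paths_outside_prefix(found, "/cvmfs/example/compat"))
```

A non-empty result is not proof of a problem by itself, since it only flags paths that depend on sysroot handling to stay inside the compat layer.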

3 participants