{2023.06}[NVHPC/25.1-CUDA-12.6] add hook for nvhpc #1043
base: 2023.06-software.eessi.io
This adds a pre_configure_hook for NVHPC. It performs search-and-replace operations on the "localrc" file that NVHPC uses to detect information about the system. In particular, it points the sysroot flag at the EESSI EPREFIX variable and appends two variable definitions that tell the compiler where to look for system libraries. The content of the hook is extracted from: https://github.com/ComputeCanada/easybuild-computecanada-config/blob/main/2023/cc_hooks.py#L544-L547
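For readers who want to see the effect of the edits without running the installer, here is a minimal Python sketch of the same three changes applied to localrc text. The function name `patch_localrc`, the sample localrc content, and the `/example/compat` prefix are assumptions made for illustration; the hook itself does this with `sed` and `echo` inside the install options.

```python
import re


def patch_localrc(text: str, eprefix: str) -> str:
    """Return localrc content with the sysroot flag and library dirs added."""
    # Append --sysroot=<eprefix> to the existing "set LDSO=...;" line,
    # mirroring the sed expression used in the hook.
    patched = re.sub(
        r"(set LDSO=.*);",
        lambda m: f"{m.group(1)} --sysroot={eprefix};",
        text,
    )
    if not patched.endswith("\n"):
        patched += "\n"
    # Append the two library-directory definitions, as the echo commands do.
    patched += f"set DEFLIBDIR={eprefix}/usr/lib64;\n"
    patched += f"set DEFSTDOBJDIR={eprefix}/usr/lib64;\n"
    return patched


# Hypothetical localrc content and prefix, for illustration only.
sample = "set LDSO=/lib64/ld-linux-x86-64.so.2;\nset GCCDIR=/usr/lib/gcc;\n"
result = patch_localrc(sample, "/example/compat")
print(result)
```

Lines that do not define LDSO are left untouched, so the sketch can be run against a copy of a real localrc to preview the change.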
Hi @adammccartney, since this PR has no generated build with corresponding tests, could you kindly share the steps you used to test it so that we can also test and reproduce it? Thank you!
pre-configure hook for nvhpc
- search and replace operations in the ec dict
"""
if self.name == "NVHPC":
This is good for now, but there is quite a bit of discussion currently about changing the naming in the EasyBuild context. We are currently naming the compilers only NVHPC but we should perhaps be defining a toolchain hierarchy for NVHPC since they also contain MPI and some math libraries. This may lead to the compilers being called something like nvidia_compilers or NVHPC becoming a fatter toolchain.
You're also going to need some versioning clauses here for the versions you have actually tested.
This is definitely a corner case at present, as EESSI itself will not ship this toolchain (currently), and I wonder whether we should allow an environment variable to force the use of the hook, something like:
if self.name == "NVHPC":
    force_nvhpc_hook = 'EESSI_FORCE_NVHPC_HOOK'
    if self.version in [...] or os.getenv(force_nvhpc_hook, False):
        ...
    else:
        print_msg(f"Not using existing hook for {self.name}/{self.version}, if you wish to force this please set the envvar {force_nvhpc_hook}")
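The gating logic in this suggestion can be tried outside EasyBuild with a small standalone sketch. Everything here is illustrative: the helper name `should_apply_nvhpc_hook` is hypothetical, and `TESTED_VERSIONS` is an assumption that lists only the version from this PR's title, since the suggestion leaves the list open. One deliberate difference from the suggestion is that the sketch accepts only a few explicit "on" values, because `os.getenv(name, False)` treats any non-empty string, including "0", as true.

```python
import os

FORCE_ENVVAR = "EESSI_FORCE_NVHPC_HOOK"  # name taken from the suggestion above
TESTED_VERSIONS = ("25.1",)  # assumption: only the version in this PR's title


def should_apply_nvhpc_hook(name, version, environ=None):
    """Decide whether the NVHPC hook should run for this name and version."""
    environ = os.environ if environ is None else environ
    if name != "NVHPC":
        return False
    # Only explicit "on" values force the hook; "0" or "" do not.
    value = environ.get(FORCE_ENVVAR, "").strip().lower()
    forced = value in ("1", "true", "yes")
    return version in TESTED_VERSIONS or forced


# A tested version runs the hook; an untested one needs the override.
print(should_apply_nvhpc_hook("NVHPC", "25.1", {}))
print(should_apply_nvhpc_hook("NVHPC", "24.9", {}))
print(should_apply_nvhpc_hook("NVHPC", "24.9", {FORCE_ENVVAR: "1"}))
```

Passing the environment as a plain dictionary keeps the function easy to test without touching the real process environment.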
Sure, would be happy to. Would you mind giving a few points of guidance? Let me know what would be useful to see.
As you can see, we are referencing an explicit config for EasyBuild, and I think the easyconfig for NVHPC is slightly adapted to include the "accept-eula" variable or whatever. I'll backport this to a "vanilla" eessi-extend environment today that can be used to install stuff on host-injections. I guess it would be useful to have a command that can be run in the standard container started by
Replaces EBROOTGENTOO with EPREFIX/usr
eb_hooks.py
Outdated
new_opts = f'''installdir=%(installdir)s/Linux_x86_64/%(version)s
EPREFIX={eprefix}
sed -i "s@\(set LDSO=.*\);@\\1 --sysroot=$EPREFIX;@" $installdir/compilers/bin/localrc
echo "set DEFLIBDIR=$EPREFIX/usr/lib64;" >> $installdir/compilers/bin/localrc
echo "set DEFSTDOBJDIR=$EPREFIX/usr/lib64;" >> $installdir/compilers/bin/localrc'''
Thinking about this a bit more, the logic here could be added directly to the relevant section of the NVHPC easyblock (and used conditionally based on whether the EB build option --sysroot is set).
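As a rough illustration of what "conditionally" could look like, here is a minimal sketch that emits the extra localrc definitions only when a sysroot is configured. The function name `localrc_sysroot_lines` is hypothetical, and the sysroot is passed in as a plain argument; how an easyblock would actually obtain the --sysroot value is left out on purpose.

```python
from typing import List, Optional


def localrc_sysroot_lines(sysroot: Optional[str]) -> List[str]:
    """Return extra localrc lines, or nothing when no sysroot is set."""
    if not sysroot:
        # Without a sysroot, the compiler's own system detection is kept.
        return []
    root = sysroot.rstrip("/")
    return [
        f"set DEFLIBDIR={root}/usr/lib64;",
        f"set DEFSTDOBJDIR={root}/usr/lib64;",
    ]


# Hypothetical prefix, for illustration only.
print(localrc_sysroot_lines("/example/compat/"))
print(localrc_sysroot_lines(None))
```

With this shape, installations that do not use a sysroot would get an unmodified localrc.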
So, interestingly the sanity check now fails if I try to build this directly on a compute node ( |
bot: build inst:eessi-bot-mc-azure arch:x86_64/amd/zen4 repo:eessi.io-2023.06-software accelerator:nvidia/cc90
So I finally got this working (building to a custom cvmfs repo, then loading on a compute node and reproducing the sanity checks). There were a number of issues that needed to be worked out.

1. A potential issue that might appear if the linker happens to first find a script called "libc.so". The script is located in the compat layer and looks like it may possibly(?) redirect the linker to the host /lib64/libc.so.6 if it gets picked up.

    > cat $EPREFIX/usr/lib64/libc.so
    /* GNU ld script
       Use the shared library, but some functions are only in
       the static library, so try that secondarily.  */
    OUTPUT_FORMAT(elf64-x86-64)
    GROUP ( /lib64/libc.so.6 /usr/lib64/libc_nonshared.a AS_NEEDED ( /lib64/ld-linux-x86-64.so.2 ) )

2. Another issue is that nvc++ will scan a number of directories looking for localrc files, if there are any old localrc files lying around that point to the wrong place, this will cause problems. The case below shows a situation where the localrc was pointing to a (removed) host_injections path

    >nvc++ -dryrun -std=c++20 minimal.cpp minimal
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/.nvc++rc
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/nativerc
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/fnativerc
    Skipping rcfiles/internalrc (not found)
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/ccrc
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/ccirc
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/cpprc
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/cppcurc
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/paralgorc
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/x86rc
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/x8664rc
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/lin86rc
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/lincommonrc
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/lin8664rc
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/llvmcomprc
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/llvmrc
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/llvmx86rc
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/llvmx8664rc
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/omprc
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/iparc
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/acc1rc
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/cudaselectrc
    Skipping rcfiles/persnvflangrc (not found)
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/acclin8664rc
    Skipping rcfiles/acctoolsrc (not found)
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/targetrc
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/deprecatedrc
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/c++llvmrc
    Skipping rcfiles/llvmxrc (not found)
    Skipping rcfiles/tunexrc (not found)
    Skipping rcfiles/clangxrc (not found)
    Skipping rcfiles/gccxrc (not found)
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/rcfiles/persnvirc
    Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/compilers/bin/localrc
    Skipping localrc.n3001-003 (not found)
    Reading rcfile /home/fs60000/admccartney/.config/NVIDIA/nvhpc/25.1/localrc.n3001-003
    Skipping siterc (not found)
    Skipping siterc.n3001-003 (not found)
    Skipping $GCCLOCALRC (not found)
    Skipping .mynvrc (not found)
    Skipping .mynvc++rc (not found)
    Skipping .mynvcpprc (not found)
    Skipping .mynvx86rc (not found)
    Skipping $MYLOCALRC (not found)
    Skipping cudarc (not found)
    Action(realpath(/opt/acceptance-tests/eessi/2023.06/software/linux/x86_64/amd/zen4/software/GCCcore/13.3.0/bin/../lib/gcc/x86_64-pc-linux-gnu/13.3.0//../../../..))
    Error in path /opt/acceptance-tests/eessi/2023.06/software/linux/x86_64/amd/zen4/software/GCCcore/13.3.0/bin/../lib/gcc/x86_64-pc-linux-gnu/13.3.0//../../../..

It should __not__ be finding the localrc files in my home directory

    {EESSI 2023.06} admccartney@n3001-003 ~/tests/nvhpc >rm -rf ~/.config/NVIDIA/nvhpc/25.1/
    > nvc++ -std=c++20 minimal.cpp -o minimal
    >./minimal
    Hello world
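To spot stray rcfiles like the one in the home directory without reading the whole -dryrun log by eye, a small filter along these lines may help. This is only a sketch: the function name `rcfiles_outside_prefix` is hypothetical, and it assumes the "Reading rcfile <path>" line format shown in the output above.

```python
import os
import re


def rcfiles_outside_prefix(dryrun_output: str, install_prefix: str):
    """List rcfiles that were read from outside the given install prefix."""
    prefix = os.path.normpath(install_prefix) + os.sep
    # Collect every path that follows a "Reading rcfile" marker.
    read = re.findall(r"Reading rcfile\s+(\S+)", dryrun_output)
    return [p for p in read if not os.path.normpath(p).startswith(prefix)]


# Two entries copied from the log above; the second one is the stray file.
log = (
    "Reading rcfile /cvmfs/software.asc.ac.at/versions/2023.06/software/linux/"
    "x86_64/amd/zen4/software/NVHPC/25.1-CUDA-12.6.0/Linux_x86_64/25.1/"
    "compilers/bin/localrc\n"
    "Skipping localrc.n3001-003 (not found)\n"
    "Reading rcfile /home/fs60000/admccartney/.config/NVIDIA/nvhpc/25.1/"
    "localrc.n3001-003\n"
)
stray = rcfiles_outside_prefix(log, "/cvmfs/software.asc.ac.at/versions/2023.06")
print(stray)
```

Any path in the result is a candidate for the kind of leftover per-user localrc that caused the failure here.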
Okay, so I got this working with some careful attention to what the linker was up to.
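One way to check what a linker script such as the compat layer's libc.so refers to is to list the absolute paths it names and compare them against the prefix. The sketch below does only that; the function names are hypothetical, the `/example/compat` prefix is made up, and it does not model whether the linker rewrites these paths relative to a sysroot.

```python
import re


def absolute_inputs(ld_script: str):
    """Return the absolute paths named in a GNU ld script."""
    # Drop /* ... */ comments, then split on whitespace and parentheses.
    body = re.sub(r"/\*.*?\*/", " ", ld_script, flags=re.S)
    return [tok for tok in re.split(r"[\s()]+", body) if tok.startswith("/")]


def inputs_outside(ld_script: str, eprefix: str):
    """Return the absolute paths that do not live under eprefix."""
    root = eprefix.rstrip("/") + "/"
    return [p for p in absolute_inputs(ld_script) if not p.startswith(root)]


# Script text as quoted in the comment above.
script = """/* GNU ld script
   Use the shared library, but some functions are only in
   the static library, so try that secondarily.  */
OUTPUT_FORMAT(elf64-x86-64)
GROUP ( /lib64/libc.so.6 /usr/lib64/libc_nonshared.a AS_NEEDED ( /lib64/ld-linux-x86-64.so.2 ) )
"""
print(inputs_outside(script, "/example/compat"))
```

If the paths are reported as outside the prefix, that is a hint to check with the linker's own trace output where libc actually gets resolved.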