Fixing mpi.lua to get and set env. variables correctly after unloading or reloading the module #1465
base: develop
setenv("I_MPI_FC", os.getenv("FC"))
local i_mpi_cc = os.getenv("CC")
local i_mpi_cxx = os.getenv("CXX")
local i_mpi_f77 = os.getenv("F77") or ""
why only for f77 and not for all?
I am not sure F77 is defined in the first place; it may not be needed and is possibly kept for legacy reasons.
i_mpi_f77 is therefore not used in the conditional statement further below in line 41.
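As a minimal illustration of the `or ""` fallback discussed here: in plain Lua, `os.getenv` returns nil for an unset variable, and `or` then substitutes the default. The helper `getenv_or` and the variable name are hypothetical and are not part of mpi.lua.

```lua
-- Hypothetical helper, not part of mpi.lua: read an environment
-- variable and fall back to a default when it is unset.
local function getenv_or(name, default)
  -- os.getenv returns nil for an unset variable; `or` then picks the default.
  return os.getenv(name) or default
end

-- The variable name below is an assumption, chosen because it should be unset.
local value = getenv_or("SPACK_STACK_EXAMPLE_UNSET_VAR", "")
print(type(value), #value)
```

With this idiom the result is always a string, so it can be passed on without a nil check.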
local i_mpi_f90 = os.getenv("FC")
local i_mpi_fc = os.getenv("FC")

if i_mpi_cc and i_mpi_cxx and i_mpi_f90 and i_mpi_fc then
Is there an instance where these aren't all getting set? Isn't it all or nothing based on compilers.lua? I might not understand the problem exactly, so if you could add steps to reproduce in the issue description that would be appreciated.
I can confirm Natalie's findings on Orion and Hera when using the GNU stack for spack-stack > 1.7.0
I am not sure about this PR. If you look at https://github.com/JCSDA/spack-stack/blob/develop/spack-ext/lib/jcsda-emc/spack-stack/stack/templates/compiler.lua, you will see that all four environment variables are defined: CC, CXX, FC, F77. Since the compiler meta-module is always loaded before the MPI meta-module, and therefore all four variables are set, I am wondering what the effect of the changes proposed here is.
I am also wondering why this is only a problem with OpenMPI?
The modules are loaded inside another meta-module, i.e. ./stack-openmpi/4.1.6.lua contains the following:
Unlike in a bash shell, where commands run consecutively and the environment variables are available immediately after "load module/A" executes, Lmod modulefiles work differently. The entire modulefile (*.lua) is evaluated first and executed on a second pass. During the first pass, the environment variables from the gnu/9.2.0 module are not yet available, so os.getenv returns nil for them, and the following lines produce the error because setenv cannot be called with an empty (nil) value.
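The failure mode and the guard can be sketched in plain Lua, outside Lmod. The `setenv` below is a stand-in written only for this sketch: it rejects nil to mimic the behavior described above, and the real Lmod function and its error text may differ. The `configure` function, the fake lookup table, and the choice to set I_MPI_F77 to an empty string are likewise assumptions.

```lua
-- Stand-in for Lmod's setenv, written only for this sketch. It rejects a
-- nil value to mimic the failure described above; the real Lmod
-- implementation and its error text may differ.
local env = {}
local function setenv(name, value)
  if value == nil then
    error("setenv: no value given for " .. name)
  end
  env[name] = value
end

-- Hypothetical function: `getenv` stands in for os.getenv so that both the
-- "compiler module not yet loaded" and "loaded" cases can be shown.
local function configure(getenv)
  local cc, cxx, fc = getenv("CC"), getenv("CXX"), getenv("FC")
  local f77 = getenv("F77") or ""
  -- Only call setenv once every required variable is defined.
  if cc and cxx and fc then
    setenv("I_MPI_CC", cc)
    setenv("I_MPI_CXX", cxx)
    setenv("I_MPI_F77", f77)
    setenv("I_MPI_F90", fc)
    setenv("I_MPI_FC", fc)
    return true
  end
  return false
end

-- Case 1: compiler variables missing; the guard skips setenv entirely.
local ok_unset = configure(function(_) return nil end)

-- Case 2: compiler variables present (values are illustrative).
local fake = { CC = "gcc", CXX = "g++", FC = "gfortran" }
local ok_set = configure(function(name) return fake[name] end)

-- Unguarded call with a missing value: the stub raises an error.
local unguarded_ok = pcall(setenv, "I_MPI_FC", nil)

print(ok_unset, ok_set, unguarded_ok, env.I_MPI_FC)
```

Reading each variable into a local first means a single conditional covers every setenv call, so no call is made until the compiler variables are defined.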
Ok, slowly I am beginning to understand. But why is this a problem only when you load or unload the module twice?
This is not yet clear... It is likely that this issue affects not only gnu-based openmpi modulefiles, but any meta-module that uses the same template ./mpi.lua
Summary
The environment variables I_MPI_CC, I_MPI_CXX, I_MPI_F77, I_MPI_F90, and I_MPI_FC are set correctly and no longer produce an error when the module is unloaded or reloaded.
Error reporting when reloading or unloading the module:
Steps to reproduce the error:
This error is seen in spack-stack-1.8.0 and later when openmpi is used. spack-stack-1.6.0, which is currently used by UFS-WM and UFS-SRW, does not have this issue because the module templates used different syntax at that time.
After loading stack-openmpi for the second time, the following error appears, repeated 5 times:
A similar error appears when the module is unloaded using `module unload stack-openmpi` or `module unload stack-openmpi/4.1.4` in the last line. The module is still unloaded, however. The error message:
Testing
Tested a modified modulefile as part of a spack-stack-1.8.0 installation on macOS; no errors are produced when unloading or reloading the module during development work. No errors are reported when the module is loaded a single time during actual production runs or workflows.
Applications affected
All UFS applications that use openmpi and may do development work that involves loading and unloading the openmpi modulefile multiple times
Systems affected
Any system that uses openmpi. Example to reproduce errors on Hera:
After loading the stack-openmpi for the second time, the following error appears:
(this message is repeated 5 times)
An attempt to unload the module
produces the same error messages. The modules, however, are successfully unloaded.
Dependencies
Checklist
Contributors
@DavidHuber-NOAA - suggesting a modulefile fix for a similar issue in PR ufs-community/ufs-weather-model#2551