Fixing mpi.lua to get and set env. variables correctly after unloading or reloading the module #1465

Open. natalie-perlin wants to merge 1 commit into develop.

@natalie-perlin (Collaborator) commented Jan 21, 2025

Summary

The environment variables I_MPI_CC, I_MPI_CXX, I_MPI_F77, I_MPI_F90, and I_MPI_FC are now set correctly and no longer produce an error when the module is unloaded or reloaded.

Error reported when reloading or unloading the module:

Lmod Warning:  Syntax error in file:
/Users/Natalie/spack-stack/spack-stack-1.8.0/envs/ufs-srw-env/install/modulefiles/apple-clang/15.0.0/stack-openmpi/5.0.3.lua
 with command: setenv, one or more arguments are not strings.

While processing the following module(s):
    Module fullname           Module Filename
    ---------------           ---------------
    stack-openmpi/5.0.3       /Users/Natalie/spack-stack/spack-stack-1.8.0/envs/ufs-srw-env/install/modulefiles/apple-clang/15.0.0/stack-openmpi/5.0.3.lua
    ufs_macosx.gnu  /Users/Natalie/UFS-WM/ufs-weather-model/modulefiles/ufs_macosx.gnu.lua

Steps to reproduce the error:

This error is seen in spack-stack-1.8.0 and later when openmpi is used. The spack-stack-1.6.0 that is currently used by UFS-WM and UFS-SRW does not have this issue, as module templates had different syntax at that time.

  1. Reproduce the error on Hera with the GNU compiler:
module use /contrib/spack-stack/spack-stack-1.8.0/envs/mapl-2.40.3-gcc-9.2.0/install/modulefiles/Core
module load stack-gcc
module load stack-openmpi
module load stack-openmpi

After stack-openmpi is loaded a second time, the following error appears, repeated 5 times:

Lmod Warning:  Syntax error in file:
/contrib/spack-stack/spack-stack-1.8.0/envs/mapl-2.40.3-gcc-9.2.0/install/modulefiles/gcc/9.2.0/stack-openmpi/4.1.6.lua
 with command: setenv, one or more arguments are not strings.

While processing the following module(s):
    Module fullname      Module Filename
    ---------------      ---------------
    stack-openmpi/4.1.6  /contrib/spack-stack/spack-stack-1.8.0/envs/mapl-2.40.3-gcc-9.2.0/install/modulefiles/gcc/9.2.0/stack-openmpi/4.1.6.lua

A similar error appears when the module is unloaded with module unload stack-openmpi. The module is still unloaded, however.

  2. Reproduce this error on Orion with the GNU compiler:
module use /apps/contrib/spack-stack/spack-stack-1.8.0/envs/ue-gcc-12.2.0/install/modulefiles/Core
module load stack-gcc/12.2.0
module load stack-openmpi/4.1.4
module load stack-openmpi/4.1.4

Alternatively, replace the last line with module unload stack-openmpi/4.1.4. The error message:

Lmod Warning:  Syntax error in file:
/apps/contrib/spack-stack/spack-stack-1.8.0/envs/ue-gcc-12.2.0/install/modulefiles/gcc/12.2.0/stack-openmpi/4.1.4.lua
 with command: setenv, one or more arguments are not strings.

While processing the following module(s):
    Module fullname      Module Filename
    ---------------      ---------------
    stack-openmpi/4.1.4  /apps/contrib/spack-stack/spack-stack-1.8.0/envs/ue-gcc-12.2.0/install/modulefiles/gcc/12.2.0/stack-openmpi/4.1.4.lua

Testing

Tested a modified modulefile as part of a spack-stack-1.8.0 installation on macOS: no errors are produced when unloading or reloading the module during development work. No errors are reported when the module is loaded a single time, as in actual production runs or workflows.

Applications affected

All UFS applications that use openmpi and whose development work may involve loading or unloading the openmpi module multiple times.

Systems affected

Any system that uses openmpi. Example to reproduce errors on Hera:

module use /contrib/spack-stack/spack-stack-1.8.0/envs/mapl-2.40.3-gcc-9.2.0/install/modulefiles/Core
module load stack-gcc
module load stack-openmpi
module load stack-openmpi

After stack-openmpi is loaded a second time, the following error appears:

Lmod Warning:  Syntax error in file:
/contrib/spack-stack/spack-stack-1.8.0/envs/mapl-2.40.3-gcc-9.2.0/install/modulefiles/gcc/9.2.0/stack-openmpi/4.1.6.lua
 with command: setenv, one or more arguments are not strings.

While processing the following module(s):
    Module fullname      Module Filename
    ---------------      ---------------
    stack-openmpi/4.1.6  /contrib/spack-stack/spack-stack-1.8.0/envs/mapl-2.40.3-gcc-9.2.0/install/modulefiles/gcc/9.2.0/stack-openmpi/4.1.6.lua

(this message is repeated 5 times)

An attempt to unload the module

module unload stack-openmpi

produces the same error messages. The modules are, however, unloaded successfully.

Dependencies

Checklist

  • This PR addresses one issue/problem/enhancement, or has a very good reason for not doing so.
  • These changes have been tested on the affected systems and applications.
  • All dependency PRs/issues have been resolved and this PR can be merged.

Contributors

@DavidHuber-NOAA - suggested a modulefile fix for a similar issue in PR ufs-community/ufs-weather-model#2551

setenv("I_MPI_FC", os.getenv("FC"))
local i_mpi_cc = os.getenv("CC")
local i_mpi_cxx = os.getenv("CXX")
local i_mpi_f77 = os.getenv("F77") or ""

Collaborator (review comment):

Why only for F77 and not for all of them?

Collaborator Author (reply):

I am not sure whether F77 is defined in the first place; it may not be needed and is possibly left in for legacy reasons. i_mpi_f77 is therefore not used in the conditional statement further below, on line 41.

local i_mpi_f90 = os.getenv("FC")
local i_mpi_fc = os.getenv("FC")

if i_mpi_cc and i_mpi_cxx and i_mpi_f90 and i_mpi_fc then

Collaborator (review comment):

Is there an instance where these aren't all getting set? Isn't it all or nothing based on compilers.lua? I might not understand the problem exactly, so if you could add steps to reproduce in the issue description, that would be appreciated.

@RatkoVasic-NOAA (Collaborator):

I can confirm Natalie's findings on Orion and Hera when using the GNU stack with spack-stack > 1.7.0.

@climbfuji (Collaborator) left a comment:

I am not sure about this PR. If you look at https://github.com/JCSDA/spack-stack/blob/develop/spack-ext/lib/jcsda-emc/spack-stack/stack/templates/compiler.lua, you see that all four environment variables are defined: CC, CXX, FC, F77. Since the compiler meta-module is always loaded before the MPI meta-module, and therefore all four variables are set, I am wondering what the effect of the changes proposed here is.

@climbfuji (Collaborator):

> I am not sure about this PR. If you look at https://github.com/JCSDA/spack-stack/blob/develop/spack-ext/lib/jcsda-emc/spack-stack/stack/templates/compiler.lua, you see that all four environment variables are defined: CC, CXX, FC, F77. Since the compiler meta-module is always loaded before the MPI meta-module, and therefore all four variables are set, I am wondering what the effect of the changes proposed here is.

I am also wondering why this is only a problem with OpenMPI?

@natalie-perlin (Collaborator Author):

> I am not sure about this PR. If you look at https://github.com/JCSDA/spack-stack/blob/develop/spack-ext/lib/jcsda-emc/spack-stack/stack/templates/compiler.lua, you see that all four environment variables are defined: CC, CXX, FC, F77. Since the compiler meta-module is always loaded before the MPI meta-module, and therefore all four variables are set, I am wondering what the effect of the changes proposed here is.
>
> I am also wondering why this is only a problem with OpenMPI?

The modules are loaded inside another meta-module; for example, ./stack-openmpi/4.1.6.lua contains the following:

-- prerequisite modules
load("gnu/9.2.0")
load("openmpi/4.1.6_gnu9.2.0")

Unlike in a bash shell, where commands execute one after another and the environment variables are available immediately after "load module/A" runs, Lmod modulefiles work differently. The entire modulefile (*.lua) is evaluated first and executed on a second pass. During the first pass, the environment variables from the gnu/9.2.0 module are not yet available, so os.getenv returns nil for them. The following lines then produce the error, because setenv does not accept a nil argument:

-- intel specific mpi wrapper environment variables
setenv("I_MPI_CC",  os.getenv("CC"))
setenv("I_MPI_CXX", os.getenv("CXX"))
setenv("I_MPI_F77", os.getenv("F77"))
setenv("I_MPI_F90", os.getenv("FC"))
setenv("I_MPI_FC",  os.getenv("FC"))
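
One way to keep setenv from ever receiving nil is sketched below. It extends the `or ""` fallback that the diff applies to F77 to all five variables. This is an assumption made for illustration and not necessarily what this PR does, since the diff guards the other variables with a conditional instead. The helper name env_or_empty and the setenv stand-in are hypothetical; the stand-in only lets the snippet run under a plain Lua interpreter, because setenv is normally supplied by Lmod.

```lua
-- Sketch only, not the PR's actual change.
-- Inside a modulefile Lmod supplies setenv; this stand-in (hypothetical)
-- records values so the snippet also runs under a plain Lua interpreter.
local exported = {}
local setenv = setenv or function(name, value)
  if type(name) ~= "string" or type(value) ~= "string" then
    error("setenv: one or more arguments are not strings")
  end
  exported[name] = value
end

-- Hypothetical helper: os.getenv returns nil for an unset variable,
-- so fall back to an empty string, which is a valid setenv argument.
local function env_or_empty(name)
  return os.getenv(name) or ""
end

-- intel specific mpi wrapper environment variables
setenv("I_MPI_CC",  env_or_empty("CC"))
setenv("I_MPI_CXX", env_or_empty("CXX"))
setenv("I_MPI_F77", env_or_empty("F77"))
setenv("I_MPI_F90", env_or_empty("FC"))
setenv("I_MPI_FC",  env_or_empty("FC"))
```

The trade-off is that an unset compiler variable yields an empty I_MPI_* value rather than leaving the variable undefined; whether that is acceptable depends on how the MPI wrappers treat empty values.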

@climbfuji (Collaborator):

OK, I am slowly beginning to understand. But why is this a problem only when you load or unload the module twice?

@natalie-perlin (Collaborator Author):

> Ok, slowly I am beginning to understand. But why is this a problem only when you load or unload the module twice?

This is not yet clear. The issue is likely not limited to GNU-based OpenMPI modulefiles, but affects any meta-module that uses the same template, ./mpi.lua.

Labels: None yet
Projects: Status: In Progress
4 participants