Feature request: support for exporting environment variables with parallel launchers #3207
I think that `self.job.launcher.env_vars = {'MASTER_PORT': '1234'}` or `self.job.launcher.env_vars['MASTER_PORT'] = '1234'` is the best option, and it matches the test's …
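The two spellings behave differently once more than one pipeline hook sets variables. The sketch below uses a hypothetical stand-in class (`LauncherSketch` is not a ReFrame class; it only mimics the proposed `env_vars` attribute), and `MASTER_ADDR`/`node001` are illustrative values. Assigning a new dict discards anything set earlier, whereas item assignment adds to what is already there.

```python
class LauncherSketch:
    """Hypothetical stand-in mimicking the proposed launcher.env_vars attribute."""

    def __init__(self):
        # Variables to export to the processes started by the parallel launcher
        self.env_vars = {}


def set_port_by_replacing(launcher):
    # First proposed spelling: assign a whole new dict (discards earlier entries)
    launcher.env_vars = {'MASTER_PORT': '1234'}


def set_addr_by_item(launcher):
    # Second proposed spelling: add one entry, keeping earlier ones
    launcher.env_vars['MASTER_ADDR'] = 'node001'


a = LauncherSketch()
set_addr_by_item(a)
set_port_by_replacing(a)
print(a.env_vars)  # {'MASTER_PORT': '1234'}  (MASTER_ADDR was lost)

b = LauncherSketch()
set_port_by_replacing(b)
set_addr_by_item(b)
print(b.env_vars)  # {'MASTER_PORT': '1234', 'MASTER_ADDR': 'node001'}
```

This is one argument for the item-assignment spelling: independent hooks can each add a variable without knowing about the others.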
@casparvl I guess your suggestion is for cases where the scheduler does not forward the environment to the compute nodes, right? If that's the case, maybe we would just need a configuration parameter to tell ReFrame to export the test's …
Hmm, that was actually not my problem. What you're talking about (I think) is exporting variables from the submission environment to the batch job. What I'm talking about is exporting the environment from the head node of the job's allocation to the processes launched by a parallel launcher like …
And that, when configured with e.g.
But, when configured with
I.e. this way, we can write tests that tell the parallel launcher to export an environment variable to the parallel processes being launched, without the test developer having to know which parallel launcher will be used.
Thanks @casparvl for the clarification; it makes sense. I was confused at first.
Some software requires environment variables to run; e.g. PyTorch's distributed framework requires `MASTER_PORT` (among others) to be set. As discussed on Slack, this is currently challenging if the test developer doesn't know the configured launcher in advance. I.e. if we know that OpenMPI's `mpirun` will be the launcher, we can do …

But if we are writing a test with the purpose of it being reused (e.g. a test for the `hpctestlib`), it would be nice to have a way of specifying this in a launcher-agnostic way. E.g. … or … (the 2nd is probably more convenient, but I'm not sure which API is easiest to support from the ReFrame side).
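Whichever spelling is chosen, each launcher backend would have to turn the `env_vars` mapping into its own command-line options. A minimal sketch of that translation, assuming plain functions keyed by launcher name (`mpirun_env_options`, `srun_env_options`, `launch_command` and `train.py` are all hypothetical, not ReFrame's API). The `srun` variant prefixes `ALL,` because `srun --export` with only named variables propagates just those (plus `SLURM_*`) and drops the rest of the environment.

```python
import shlex


def mpirun_env_options(env_vars):
    # OpenMPI's mpirun exports one variable per -x flag
    opts = []
    for name, value in env_vars.items():
        opts += ['-x', f'{name}={value}']
    return opts


def srun_env_options(env_vars):
    # srun takes one comma-separated --export list; ALL keeps the rest of the environment
    if not env_vars:
        return []
    pairs = ','.join(f'{name}={value}' for name, value in env_vars.items())
    return [f'--export=ALL,{pairs}']


# Hypothetical registry: launcher name -> function building its env options
ENV_OPTION_BUILDERS = {
    'mpirun': mpirun_env_options,
    'srun': srun_env_options,
}


def launch_command(launcher_name, env_vars, command):
    """Build a shell-quoted launch line for the given launcher and command list."""
    opts = ENV_OPTION_BUILDERS[launcher_name](env_vars)
    return shlex.join([launcher_name, *opts, *command])


print(launch_command('mpirun', {'MASTER_PORT': '1234'}, ['python', 'train.py']))
# mpirun -x MASTER_PORT=1234 python train.py
print(launch_command('srun', {'MASTER_PORT': '1234'}, ['python', 'train.py']))
# srun --export=ALL,MASTER_PORT=1234 python train.py
```

One limitation of this sketch: a value containing a comma cannot be passed through the `srun --export` list as written.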
ReFrame would then abstract how each particular launcher exports environment variables. E.g. for OpenMPI, the ReFrame backend would add `-x MASTER_PORT=1234` as an extra launcher argument, whereas for `srun` it would add `--export=ALL,MASTER_PORT=1234` (the `ALL` is needed because `srun --export=MASTER_PORT=1234` would propagate only that variable and the `SLURM_*` ones).

Note that, for now, I have worked around this issue with a wrapper shell script that sets the environment variables, similar to what is used here by CSCS in their PyTorch test.
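The wrapper-script workaround is launcher-agnostic because every launched process runs the wrapper, which sets the variables and then replaces itself with the real program. A minimal sketch of generating such a wrapper from Python, assuming a POSIX `sh` on the compute nodes; `write_env_wrapper`, `run_through_wrapper` and `env_wrapper.sh` are hypothetical names, not taken from the CSCS test.

```python
import os
import shlex
import stat
import subprocess
import tempfile


def write_env_wrapper(path, env_vars):
    """Write a POSIX shell script that exports env_vars, then execs its arguments."""
    lines = ['#!/bin/sh']
    for name, value in env_vars.items():
        # Quote values so spaces or special characters survive the shell
        lines.append(f'export {name}={shlex.quote(value)}')
    # Replace the wrapper process with the real command, forwarding all arguments
    lines.append('exec "$@"')
    with open(path, 'w') as f:
        f.write('\n'.join(lines) + '\n')
    # Make the script executable so a launcher can run it directly
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)


def run_through_wrapper(env_vars, command):
    """Demo: create the wrapper in a temporary directory and run command through it."""
    with tempfile.TemporaryDirectory() as tmpdir:
        wrapper = os.path.join(tmpdir, 'env_wrapper.sh')
        write_env_wrapper(wrapper, env_vars)
        # Call it via sh so the demo also works if the temp dir is mounted noexec
        result = subprocess.run(['sh', wrapper, *command],
                                capture_output=True, text=True, check=True)
        return result.stdout


print(run_through_wrapper({'MASTER_PORT': '1234'}, ['printenv', 'MASTER_PORT']))
```

In a real job the launcher would start the wrapper instead of the program, e.g. `mpirun ./env_wrapper.sh python train.py` or the `srun` equivalent, so the variables are set in each launched process regardless of how the launcher handles the environment.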