
March 31, 2020 - Jules Kouatchou #27

Open
JulesKouatchou wants to merge 7 commits into main

Conversation

JulesKouatchou
Contributor

ctm_setup:

  • New setting of environment variables (SETENV.commands) for the use of MPT.

ctm_run.j:

  • Moved the @SETENVS tag so that it is expanded before the MPI run command is issued.
  • Checked the exit status of the executable through the existence of the EGRESS file (see the sketch below).
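A minimal sketch of the EGRESS check (the echo text and the use of the run directory are illustrative; the actual ctm_run.j logic may differ):

    # The executable touches an EGRESS file in the run directory when it finishes cleanly,
    # so its absence after the MPI command returns is treated as a failed run.
    set rc = 0
    if ( ! -e EGRESS ) then
       echo "GEOSctm did not exit gracefully: no EGRESS file found"
       set rc = 1
    endif
    exit $rc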

@mmanyin
Contributor

mmanyin commented Mar 31, 2020

@mathomp4 There was a build error: "No configuration was found in your project. Please refer to https://circleci.com/docs/2.0/ to get started with your configuration." Is the CI stuff working properly for CTM? Thanks

@mmanyin
Contributor

mmanyin commented Mar 31, 2020

@JulesKouatchou Do I understand correctly, you have changed the run-time environment variables to be appropriate for using MPT? How do we compile w/ MPT? I thought that the default was to compile with Intel-MPI.

@mathomp4
Member

> @mathomp4 There was a build error: "No configuration was found in your project. Please refer to https://circleci.com/docs/2.0/ to get started with your configuration." Is the CI stuff working properly for CTM? Thanks

@mmanyin Until #23 is merged in, there is no way for CircleCI to find a configuration since it only exists on a branch, not master. I had set up CircleCI to follow GEOSctm thinking the config file would get it. Since it might be a while, would you like me to turn off CircleCI following GEOSctm?

@mmanyin
Contributor

mmanyin commented Apr 1, 2020

> > @mathomp4 There was a build error: "No configuration was found in your project. Please refer to https://circleci.com/docs/2.0/ to get started with your configuration." Is the CI stuff working properly for CTM? Thanks
>
> @mmanyin Until #23 is merged in, there is no way for CircleCI to find a configuration since it only exists on a branch, not master. I had set up CircleCI to follow GEOSctm thinking the config file would get it. Since it might be a while, would you like me to turn off CircleCI following GEOSctm?

Actually I will go ahead with #23 . Sorry for the confusion!

@mathomp4
Member

mathomp4 commented Apr 3, 2020

Well, that was unexpected.

@JulesKouatchou When you have a chance can you do a fresh clone of GEOSctm and then a fresh checkout of your branch, and then try running with MPT?

I just did a "resolve conflict" for your branch (so it could merge in) and, weirdly, Git seems to say now that the ctm_setup now isn't "new". I mean, it seems to have all the right bits for MAPL 2 on MPT, but...weird.

On the plus side, @mmanyin, it looks like that "resolve conflict" is letting CircleCI run!

@JulesKouatchou
Contributor Author

@mathomp4 I will and let you know.

@JulesKouatchou
Contributor Author

@mathomp4 When I do:

    git clone [email protected]:GEOS-ESM/GEOSctm.git
    cd GEOSctm/
    git checkout -b jkGEOSctm_on_SLESS12
    checkout_externals
    source @env/g5_modules

Intel MPI gets loaded. I need MPT.

@mathomp4
Member

mathomp4 commented Apr 3, 2020

> @mathomp4 When I do:
>
>     git clone [email protected]:GEOS-ESM/GEOSctm.git
>     cd GEOSctm/
>     git checkout -b jkGEOSctm_on_SLESS12
>     checkout_externals
>     source @env/g5_modules
>
> Intel MPI gets loaded. I need MPT.

Jules,

You'll need to:

cp /gpfsm/dhome/mathomp4/GitG5Modules/SLES12/6.0.4/g5_modules.intel1805.mpt217 @env/g5_modules

to get MPT as the MPI stack.

@JulesKouatchou
Contributor Author

@mathomp4 Here are my steps:

    git clone [email protected]:GEOS-ESM/GEOSctm.git
    cd GEOSctm
    git checkout jkGEOSctm_on_SLESS12
    checkout_externals
    cp /gpfsm/dhome/mathomp4/GitG5Modules/SLES12/6.0.4/g5_modules.intel1805.mpt217 @env/g5_modules
    source @env/g5_modules

Things appear to be fine. I am currently doing a long run to make sure that the code does not crash.

Thanks.

@mathomp4
Member

mathomp4 commented Apr 3, 2020

Sounds good! If all works, you can set the appropriate "required label". I'm guessing 0-diff is good enough since your changes can't change results, right?

@JulesKouatchou
Contributor Author

@mathomp4 This is the first step. I want the code to be able to compile and run. Ideally, I want the same code to compile and run on the SLES11 nodes too (though they will disappear soon). I will then be able to do the comparison.

@JulesKouatchou
Contributor Author

@mathomp4 My long run did not have any issues.
You asked me to copy the file g5_modules.intel1805.mpt217. Is it possible to make it part of the repository? I want the MPT module to be the default for the CTM.

@mathomp4
Member

mathomp4 commented Apr 4, 2020

Jules,

We can do that for sure, but then, when the hundreds of Skylake nodes go online for general users, CTM users will not be able to use them (MPT will not run there). Intel MPI allows users to use every node at NCCS.

Before we do that, ctm_run.j should be altered so that if anyone ever tries to run on the Skylakes at NCCS with MPT, the CTM immediately errors out with a non-zero status code, and perhaps a note saying what's happening so that the user doesn't try to contact NCCS or the SI Team. The job will crash anyway, but I think it would be an obscure-looking loader error.
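Something along these lines could serve as that guard; detecting Skylake via the CPU model string and MPT via the modules list are assumptions here, not the final implementation:

    # Hypothetical guard for ctm_run.j: abort early if MPT is loaded on a Skylake node.
    # Matching "6148" (assumed to identify the NCCS Skylake nodes) and checking
    # $LOADEDMODULES for an mpt module are both illustrative tests.
    set cpu = `grep -m1 'model name' /proc/cpuinfo`
    if ( "$cpu" =~ *6148* || "$cpu" =~ *Skylake* ) then
       if ( $?LOADEDMODULES ) then
          if ( "$LOADEDMODULES" =~ *mpt* ) then
             echo "ERROR: MPT is not supported on the Skylake nodes; build and run with Intel MPI."
             exit 1
          endif
       endif
    endif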

@JulesKouatchou
Contributor Author

@mathomp4 Sorry that I am only coming back to this now. I am wondering if there could be (for now) a flag that sets MPT as the first option and Intel MPI as the second option. I am willing to modify the ctm_run.j file if I know what options are available in g5_modules.

@mathomp4
Member

mathomp4 commented Apr 8, 2020

> @mathomp4 Sorry that I am only coming back to this now. I am wondering if there could be (for now) a flag that sets MPT as the first option and Intel MPI as the second option. I am willing to modify the ctm_run.j file if I know what options are available in g5_modules.

@JulesKouatchou I don't think so, not as long as GEOS uses g5_modules. The issue is that it is a script that is run and a file that is sourced. This severely limits its flexibility because you can break it very easily (for example, you can not do source g5_modules -option).

If you require MPT, I can create a special branch/tag of ESMA_env for you.

You should also contact NCCS and let them know that Intel MPI does not work for your code. They will be interested in this and would probably want to contact Intel regarding the fault.

@mmanyin
Contributor

mmanyin commented Apr 8, 2020

> > @mathomp4 Sorry that I am only coming back to this now. I am wondering if there could be (for now) a flag that sets MPT as the first option and Intel MPI as the second option. I am willing to modify the ctm_run.j file if I know what options are available in g5_modules.
>
> @JulesKouatchou I don't think so, not as long as GEOS uses g5_modules. The issue is that it is a script that is run and a file that is sourced. This severely limits its flexibility because you can break it very easily (for example, you can not do source g5_modules -option).
>
> If you require MPT, I can create a special branch/tag of ESMA_env for you.
>
> You should also contact NCCS and let them know that Intel MPI does not work for your code. They will be interested in this and would probably want to contact Intel regarding the fault.

I have seen Intel MPI crash during Finalize, when running the GCM under SLES12. @JulesKouatchou please CC me when you contact NCCS about this problem; I will open a case as well, and CC you.

@JulesKouatchou
Contributor Author

@mathomp4 @mmanyin I have tried to build the simplest test case possible (using Intel MPI on SLES12 nodes) where the code does not exit gracefully. So far I have not reproduced the problem with a pure MPI program or an ESMF program. I now want to try a code that uses MAPL.

@mathomp4
Member

@JulesKouatchou We might have a workaround for the MPI_Finalize issue. I found an MPI command which essentially "turns off error output" and @bena-nasa seemed to be able to show it helped.

We are looking at adding it into MAPL with some good protections so we don't turn off all MPI errors.

@JulesKouatchou
Contributor Author

@mathomp4 Great! Let me know when the workaround is ready so that I can test it.

@mathomp4
Member

> @mathomp4 Great! Let me know when the workaround is ready so that I can test it.

Jules, try out MAPL v2.0.6 (aka git checkout v2.0.6 in MAPL).

Note, you're behind on a lot of things in the CTM's mepo/externals bits, but v2.0.0 and v2.0.6 are still similar.

@JulesKouatchou
Contributor Author

@mathomp4 Here is a summary of what happened when I used MAPL v2.0.6.

  • I used the modules comp/intel/18.0.5.274 and mpi/impi/19.1.0.166.
  • GEOS CTM exited gracefully during short runs (a few days).
  • GEOS CTM crashed abruptly after about 15 days of integration. The error message is:

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= RANK 25 PID 3877 RUNNING AT borgo007
= KILLED BY SIGNAL: 9 (Killed)

It seems that MPT might be the option (for now) for the CTM.

@mathomp4
Member

@JulesKouatchou Well that's annoying. Can you point me to the output so I can look at the errors?

Also, if you can, can you try one more test? It would be interesting to see if MAPL 2.1 helps at all. Plus you can be the first to try the CTM with it.

For that, you'll want to clone a new CTM somewhere rather than re-use the current one. Then after cloning and doing the mepo/checkout_externals update:

@env to v3.0.0
@mapl to v2.1.0
@cmake to v2.1.0

@JulesKouatchou
Contributor Author

@mathomp4 Will do and let you know.

@JulesKouatchou
Contributor Author

@mathomp4 I could not checkout @env v3.0.0:

error: pathspec 'v3.0.0' did not match any file(s) known to git

I am currently in v2.0.2.

@mathomp4
Member

Sigh. I’m an idiot. @env is v2.1.0 and @cmake is v3.0.0.

Sorry about that. MAPL is, of course, v2.1.0

@JulesKouatchou
Contributor Author

@mathomp4 Here is another issue:

-- Found MKL: /usr/local/intel/2018/compilers_and_libraries_2018.5.274/linux/mkl/lib/intel64/libmkl_intel_lp64.so;/usr/local/intel/2018/compilers_and_libraries_2018.5.274/linux/mkl/lib/intel64/libmkl_sequential.so;/usr/local/intel/2018/compilers_and_libraries_2018.5.274/linux/mkl/lib/intel64/libmkl_core.so;-pthread
-- Found Python: /usr/bin/python3.4 (found version "3.4.6") found components: Interpreter
-- [GEOSctm] (1.0) [f278c74]
-- [MAPL] (2.1.0) [e23f20a]
-- Found Perl: /usr/bin/perl (found version "5.18.2")
CMake Error at src/Shared/@MAPL/GMAO_pFIO/tests/CMakeLists.txt:68 (string):
  string sub-command REPLACE requires at least four arguments.
-- Found PythonInterp: /usr/local/other/python/GEOSpyD/2019.10_py2.7/2020-01-15/bin/python (found version "2.7.16")
-- Configuring incomplete, errors occurred!
See also "/discover/nobackup/jkouatch/GEOS_CTM/GitRepos/MAPL2.1/GEOSctm/build/CMakeFiles/CMakeOutput.log".
See also "/discover/nobackup/jkouatch/GEOS_CTM/GitRepos/MAPL2.1/GEOSctm/build/CMakeFiles/CMakeError.log".

@mathomp4
Member

@JulesKouatchou I think your @cmake is at v2.1.0. That's the one that needs to be at v3.0.0.

@JulesKouatchou
Contributor Author

@mathomp4 I used the following:

@env v2.1.0
@cmake v3.0.0
@mapl v2.1.0

and got the same error message after about 15 days of integration.

My code is at:
/gpfsm/dnb32/jkouatch/GEOS_CTM/GitRepos/MAPL2.1/GEOSctm

and my experiment directory at:
/gpfsm/dnb32/jkouatch/GEOS_CTM/GitRepos/MAPL2.1/IdealPT

@mathomp4
Member

@JulesKouatchou Actually, I forgot you were splitting errors. The real error was in the .e file.

I might have a different thing for you to try. You seem to have hit an error that others sometimes see on the Haswells. Intel provided some other advice:

> Please try to tune the maximal virtual size of the "shm-heap" via I_MPI_SHM_HEAP_VSIZE (https://software.intel.com/en-us/mpi-developer-reference-linux-other-environment-variables).
>
> For example, try setting I_MPI_SHM_HEAP_VSIZE=4096 (this sets 4096 MB per rank for the virtual size of the "shm-heap"). If it works fine, please try decreasing the size, for example to I_MPI_SHM_HEAP_VSIZE=2048, and so on (1024, 512, 256, ...).
>
> Please find and tell us the minimum size of I_MPI_SHM_HEAP_VSIZE at which the program works fine. We can increase the default value of I_MPI_SHM_HEAP_VSIZE in a future Intel MPI release.

@mathomp4
Member

Note, if you don't have time to run these tests, let me know and I can work with Ben or someone, and we can quickly try them all out.

@JulesKouatchou
Contributor Author

@mathomp4 I will run the tests and let you know.

@mathomp4
Member

Thanks. Note I found a bug with MAPL and MPT today so even moving to MPT might take a fix. Go me!

@JulesKouatchou
Contributor Author

@mathomp4 I am conducting one 4-month run (I_MPI_SHM_HEAP_VSIZE=4096). So far it is at the end of the first month and still going. That is great news, as I was not able to get past 15 days of integration before.

@mathomp4
Member

> @mathomp4 I am conducting one 4-month run (I_MPI_SHM_HEAP_VSIZE=4096). So far it is at the end of the first month and still going. That is great news, as I was not able to get past 15 days of integration before.

Good to hear!

As Intel said, could you try lowering that in halves? The larger it is, the more memory Intel MPI reserves per process, so we want the smallest value that works for you.

@JulesKouatchou
Contributor Author

@mathomp4 So far the settings of I_MPI_SHM_HEAP_VSIZE with 4096, 2048, 1024, 512 and 256 are working. I will soon start testing with 128.

@mathomp4
Member

@JulesKouatchou Thanks for doing this. Now my fear is that it'll work even with I_MPI_SHM_HEAP_VSIZE=1, which would point to something a bit more fundamental.

But you've already lowered it a lot, which is nice.

@JulesKouatchou
Contributor Author

@mathomp4 Unfortunately, the lowest setting might be I_MPI_SHM_HEAP_VSIZE=512. The run with 256 crashed (same error message as before) after 2 months and 27 days of integration.

@mathomp4
Member

> @mathomp4 Unfortunately, the lowest setting might be I_MPI_SHM_HEAP_VSIZE=512. The run with 256 crashed (same error message as before) after 2 months and 27 days of integration.

Still, that is good to know. I'll pass it on to Scott to test and to Intel.

@mathomp4
Member

I suppose you could integrate that into ctm_setup or run or wherever. That way it's on by default for you. I might do the same in GCM.

- Modified the ctm_run.j file to allow the transition from the 2010-2019 decade into 2020-2029 when dealing with MERRA2 forcing data (a sketch of the idea follows).
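A minimal sketch of that idea, assuming the run script carries the current simulation date in a yyyymmdd variable (here called nymd; both the variable name and the decade token are illustrative, not the actual diff):

    # Derive the MERRA-2 decade token from the simulation year so that the forcing
    # templates resolve for 2020-2029 as well as 2010-2019.
    set yyyy = `echo $nymd | cut -c1-4`
    if ( $yyyy <= 2019 ) then
       set M2_DECADE = "2010-2019"
    else
       set M2_DECADE = "2020-2029"
    endif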

**Executable Exit Status**

-  Modified the ctm_run.j file to check the exit status of the code through the existence of the EGRESS file.

**Intel MPI Environment Setting**

- Modified the ctm_setup file to set I_MPI_SHM_HEAP_VSIZE to 512 when Intel MPI is used; it is then picked up by ctm_run.j (see the sketch after this list).
- The environment variable is required to prevent the code from crashing.
- The value of 512 might be increased and/or other environment variables might be set later.
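A minimal sketch of the intended setting, assuming the presence of I_MPI_ROOT is a reasonable way to tell that Intel MPI is the loaded stack (the actual ctm_setup/@SETENVS logic may test this differently):

    # Only set the shared-memory heap size when Intel MPI is in use.
    # 512 MB per rank was the smallest value found to keep long CTM runs from crashing;
    # it may need to be raised, and other I_MPI_* variables may be added later.
    if ( $?I_MPI_ROOT ) then
       setenv I_MPI_SHM_HEAP_VSIZE 512
    endif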

**Convection Refactoring**

Refactored the Convection component:

- RAS calculations are now done in the CTM CC, which now provides convective mass fluxes (read in or calculated) to any component that needs them.
- AK, BK, LONS and LATS are obtained in the CTM CC to carry out the RAS calculations.
- Convection is always turned on regardless of the Chemistry configuration. However, if no tracer is FRIENDLY to MOIST, then Convection will be automatically turned off.
- Removed the files CTM_ConvectionStubCompMod.F90 and GenericConvectionMethod_mod.F90 that are no longer needed.
- The file GmiConvectionMethod_mod.F90 will remain until we figure out how to handle GMI convective updrafts (feeding the calculation back to GMI Deposition).
- For now, Convection only does convective transport, for any Chemistry configuration.

**Refactoring of CTM CC**

- Introduced an option (flag read_advCoreFields) to import CX/CY/MX/MY instead of computing them. They are then passed to AdvCore.
- Removed the import of PLE.
- Removed references to SGH.
- Changed the settings so that we read PS and compute PLE from it, using AK and BK (see the note after this list).
- Renamed the output fields and made sure that they are the same as the ones used for the calculations.
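For reference, the relationship being used is the standard hybrid-coordinate definition of the edge pressures (stated here for clarity, not quoted from the PR): PLE(k) = AK(k) + BK(k) * PS, with PS the surface pressure and AK/BK the hybrid coefficients.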

**MERRA2 Template File**

- Removed references to the variables SGH, PLE, DELP.
- Added PS0 and PS1 (to compute PLE0 and PLE1 inside the code).
- PS now comes from files with instantaneous values (not time-averaged ones).
- Changed the climatology file (the old one did not have proper records).
@JulesKouatchou JulesKouatchou requested a review from a team as a code owner April 27, 2020 14:29
@tclune
Collaborator

tclune previously approved these changes Apr 27, 2020


CMake much improved.

@JulesKouatchou
Contributor Author

@mathomp4 I included the I_MPI_SHM_HEAP_VSIZE=512 setting on my CTM branch jkGEOSctm_on_SLESS12. I did several "long" tests with Intel MPI to confirm that the code no longer crashes and exits gracefully.

@mathomp4
Member

> @mathomp4 I included the I_MPI_SHM_HEAP_VSIZE=512 setting on my CTM branch jkGEOSctm_on_SLESS12. I did several "long" tests with Intel MPI to confirm that the code no longer crashes and exits gracefully.

@JulesKouatchou Thanks for moving that @SETENVS as it was in the wrong place. If you can, you might want to add two more that the GCM is now running with by default:

setenv I_MPI_ADJUST_ALLREDUCE 12
setenv I_MPI_ADJUST_GATHERV 3

I think these are more important on the Skylakes, but GCM will be running with them for Intel MPI everywhere. The first fixes an issue at high-resolution for Bill, so you might never see it in a CTM, but the second one fixes an issue Ben was able to trigger at C180 at 8x48 which isn't that enormous.

I know the GCM (for all our testing) is zero-diff with them. I have to imagine the CTM would be as well, but I don't know how to test.

But that can also be a second PR, if you like, that I can make after you get this in?

@JulesKouatchou
Contributor Author

@mathomp4 Thank you for the new settings. I want to have something that works on SLES12 first before doing internal CTM tests.

@weiyuan-jiang

weiyuan-jiang commented Apr 27, 2020 via email

@mathomp4
Member

@weiyuan-jiang I think the I_MPI_SHM_HEAP_VSIZE variable helps with the "unexpected failures" in the runs. The MPI_Finalize issues should be taken care of with newer MAPL with the workaround we did in MAPL_Cap.

Note that Scott is currently testing the GCM with I_MPI_SHM_HEAP_VSIZE. For him it's looking like anything other than zero is what's needed, but we might go with I_MPI_SHM_HEAP_VSIZE=512 since @JulesKouatchou found actual proof it's a useful number.

I've asked NCCS about their thoughts on it (note: this value is probably only needed on Haswell, so I'll probably code up the GCM's scripts to apply it only if Intel MPI + Haswell).

@mathomp4
Member

Also, Bill Putman has, I think, four other variables he uses for his nightly runs. I think three of them might be considered "generally useful", but I'm waiting for NCCS to respond before I add them to the GCM. If they are, I'll pass them along here as well.

    - Introduced the flag do_ctmAdvection that is by default set to true.
      When it is set to false, the Advection run method is not called.
    - Introduced an Internal State in the CTM parent gridded components.
      All the flags and other variables that were local module variables are
      now part of the internal state. This makes the code thread safe.
  - do_ctmAdvection is set to TRUE by default, even before it is read in.
  - Added the calls to A2D2C that were not initially captured.
  - Added a printout of the settings during initialization.
  - Reorganized the section where the Courant numbers and mass fluxes
    are computed so that variables are allocated only when necessary.
@mmanyin
Contributor

mmanyin commented May 27, 2020

@JulesKouatchou Could you please update the components.yaml and Externals.cfg to reflect the versions of the repos that you are satisfied with? (See your comment from April 18 above.) Also, do we still need to use MPT to prevent crashing?

@JulesKouatchou
Contributor Author

@mmanyin The last experiments that I did were about two weeks ago. I did several long runs and noticed that the code crashed after about 165 days of integration (in one job segment), even after increasing the value of I_MPI_SHM_HEAP_VSIZE. @mathomp4 mentioned that Bill is using other settings that we need to include too.
On another matter, the code is still not exiting gracefully when I use Intel MPI.

Do you want me to add the versions below as defaults for the CTM?

@env v2.1.0
@cmake v3.0.0
@mapl v2.1.0

@mathomp4
Member

> @mmanyin The last experiments that I did were about two weeks ago. I did several long runs and noticed that the code crashed after about 165 days of integration (in one job segment), even after increasing the value of I_MPI_SHM_HEAP_VSIZE. @mathomp4 mentioned that Bill is using other settings that we need to include too.
> On another matter, the code is still not exiting gracefully when I use Intel MPI.
>
> Do you want me to add the versions below as defaults for the CTM?
>
> @env v2.1.0
> @cmake v3.0.0
> @mapl v2.1.0

I think the lack of a graceful exit is probably due to not having a new enough MAPL. We think we fixed that in 2.1.3. The GCM is currently using (in master, not yet in a release):

  • ESMA_env v2.1.5
  • ESMA_cmake v3.0.3
  • MAPL v2.1.4

The other Bill flags probably won't help much. He has some that I think only affect high-res runs. The important ones are, we think, I_MPI_ADJUST_ALLREDUCE, I_MPI_ADJUST_GATHERV, and I_MPI_SHM_HEAP_VSIZE.

@JulesKouatchou
Contributor Author

@mathomp4 and @mmanyin I used:

  • ESMA_env v2.1.5
  • ESMA_cmake v3.0.3
  • MAPL v2.1.4

and also the settings I_MPI_ADJUST_ALLREDUCE, I_MPI_ADJUST_GATHERV, and I_MPI_SHM_HEAP_VSIZE (512, 1024, 2048). The code exited gracefully but still crashed at the same integration date regardless of the value of I_MPI_SHM_HEAP_VSIZE.

@mathomp4 mathomp4 changed the base branch from master to main June 22, 2020 18:18