This repository has been archived by the owner on Mar 20, 2023. It is now read-only.

Added hybrid MPI+OpenMP test in CI #299

Draft: wants to merge 11 commits into base: master

Conversation

iomaganaris
Contributor

@pramodk I still haven't managed to run NEURON with the threading option on nrntraub. If you think this should be tested, maybe we can look at it together at some point.
Also, let me know if I should create a PR from my fork of nrntraub.

@pramodk
Collaborator

pramodk commented Apr 30, 2020

I think we should do that. Can you post the instructions and the error message here, and tag Michael?

@iomaganaris
Contributor Author

Hello @nrnhines
We were trying to run the nrntraub test from https://github.com/pramodk/nrntraub/tree/icei with threading enabled in NEURON to launch CoreNEURON from NEURON and test OpenMP.
After cloning the repo I did the following:

nrnivmodl mod
srun -n 1 ./x86_64/special -c nthread=9 -mpi -c mytstop=100 -c use_coreneuron=0 init.hoc

Note that I am using 1 rank because pc.nthread gets set only if pc.nhost == 1, and I am setting use_coreneuron=0 for debugging in this case. The same issue occurs with use_coreneuron=1.
I get the following error:

...
SetupTime: 4.8000002
mytstop  100
/gpfs/bbp.cscs.ch/project/proj16/magkanar/spack/software/install/linux-rhel7-x86_64/intel-19.0.4.243/neuron-develop-3csnze/x86_64/bin/nrniv: usable mindelay is 0 (or less than dt for fixed step method)
 in init.hoc near line 65
 prun()
       ^
        finitialize(-70)
      init()
    stdinit()
  prun()

I figured out that the issue comes from calling stdinit() from prun() in hoc/parlib.hoc.
I am using NEURON master and the Intel compiler.
Could you help us with this issue?
Thank you very much in advance!

@nrnhines
Collaborator

If you are using threads, you cannot have any NetCon.delay = 0 (or less than dt). Of the 109982 NetCon objects, 265 have a delay of 0. Just to see if that is the problem, try again with:

diff --git a/hoc/parlib2.hoc b/hoc/parlib2.hoc
index d9eb164..1fbdee3 100755
--- a/hoc/parlib2.hoc
+++ b/hoc/parlib2.hoc
@@ -50,7 +50,7 @@ proc par_netstim_create() {local gid  localobj cell, syn, nc, ns, r
                netstims.append(ns)
                nc = new NetCon(ns.pp, syn)
                netstim_netcons.append(nc)
-               nc.delay = 0
+               nc.delay = 1
                r = new Random()
                r.negexp(1)
 //             r.Isaac64(netstim_random_seedoffset + netstim_base_)

For MPI and nthread=1 it is generally OK to have NetCon.delay=0, but only if they are not interprocessor NetCon (i.e. source and target must be on the same process).
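
The delay rule described above can be sketched as a small check. This is only an illustration in plain Python, not NEURON code: the `Connection` type and the function name are hypothetical, `dt=0.025` is an assumed step size, and the treatment of interprocessor connections with `0 < delay < dt` is an assumption, since only the `delay = 0` case is mentioned for that setup.

```python
from typing import List, NamedTuple


class Connection(NamedTuple):
    """Hypothetical stand-in for a NetCon; not a NEURON class."""
    delay: float          # connection delay in ms
    interprocessor: bool  # True if source and target are on different processes


def invalid_delay_indices(conns: List[Connection], dt: float, nthread: int) -> List[int]:
    """Return indices of connections whose delay breaks the rule for this setup."""
    bad = []
    for i, conn in enumerate(conns):
        too_short = conn.delay < dt  # covers delay == 0 as well
        if nthread > 1 and too_short:
            # With threads, no connection may have a delay below dt.
            bad.append(i)
        elif nthread == 1 and conn.interprocessor and too_short:
            # Assumption: the same "< dt" threshold applies to interprocessor
            # connections; the thread only mentions delay == 0 for this case.
            bad.append(i)
    return bad


conns = [Connection(0.0, False), Connection(1.0, True), Connection(0.0, True)]
print(invalid_delay_indices(conns, dt=0.025, nthread=9))
print(invalid_delay_indices(conns, dt=0.025, nthread=1))
```

With several threads both zero-delay connections are flagged; with a single thread only the interprocessor one is.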

@nrnhines
Collaborator

By the way, I noticed another problem when launching python from within the nrntraub repository.

hines@hines-T7500:~/models/nrntraub-icei$ python
Python 3.7.6 (default, Feb 17 2020, 15:09:28) 
[GCC 7.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import neuron
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/hines/neuron/nrncmake/build/install/lib/python/neuron/__init__.py", line 132, in <module>
    import nrn
ModuleNotFoundError: No module named 'nrn'
>>> 

This seems to be an artifact of having a 'hoc' folder in the repository.
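
One plausible mechanism, not confirmed in this thread, is that Python treats a bare directory in the working directory as an importable namespace package, so a folder named like a module can satisfy an `import` that was expected to fail or to find something else. The sketch below only demonstrates that general behaviour with a throwaway folder; the name `hoc_shadow_demo` and the helper function are hypothetical, and nothing here touches the real `neuron` package.

```python
import importlib
import os
import sys
import tempfile


def directory_imports_as_namespace(name: str) -> bool:
    """Return True if an empty directory called `name` imports as a namespace package."""
    with tempfile.TemporaryDirectory() as workdir:
        os.mkdir(os.path.join(workdir, name))  # empty folder, no __init__.py
        sys.path.insert(0, workdir)            # mimic "current directory first"
        importlib.invalidate_caches()
        try:
            module = importlib.import_module(name)
            # Namespace packages have no backing file.
            return getattr(module, "__file__", None) is None
        except ImportError:
            return False
        finally:
            sys.path.remove(workdir)
            sys.modules.pop(name, None)


print(directory_imports_as_namespace("hoc_shadow_demo"))
```

If this is indeed what happens here, starting Python from a directory without a `hoc` folder, or renaming the folder, would be a quick way to check.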

@iomaganaris
Contributor Author

I got some time to work on this test again. Thank you very much, @nrnhines, for your suggestion to set nc.delay = 1. With this change, both NEURON and CoreNEURON run with threading.
However, I see the following issues with threading enabled.
First, NEURON generates different spikes when the simulation runs with more than one thread and more than one MPI rank than when it runs with 1 MPI rank and multiple threads, or with multiple MPI ranks and no threading.
For example:

bash-4.2$ srun -n 1 ./x86_64/special -mpi -c use_coreneuron=0 -c nthread=36 -c mytstop=100 init.hoc
bash-4.2$ srun -n 4 ./x86_64/special -mpi -c use_coreneuron=0 -c nthread=9 -c mytstop=100 init.hoc
bash-4.2$ sort -n -k'1,1' -k2 < out1.dat | awk 'NR==1 { print; next } { printf "%.3f\t%d\n", $1, $2 }' > out1.sorted
bash-4.2$ sort -n -k'1,1' -k2 < out4.dat | awk 'NR==1 { print; next } { printf "%.3f\t%d\n", $1, $2 }' > out4.sorted
bash-4.2$ sdiff -s out1.sorted out4.sorted
10.375  186                                                   <
                                                              > 10.400  186
                                                              > 11.125  199
11.150  199                                                   <
                                                              > 12.950  220
12.975  220                                                   <
                                                              > 13.000  188
13.025  188                                                   <
13.025  264                                                   | 13.050  264
                                                              > 13.525  102
13.550  102                                                   <
13.675  288                                                   <
                                                              > 13.700  288
                                                              > 13.925  323
13.950  323                                                   <
14.275  312                                                   <
                                                              > 14.300  312
                                                              > 14.300  318
14.325  318                                                   | 14.350  87
14.350  192                                                   <
14.375  87                                                    <
...

During the first timesteps the spikes are the same, but then the timesteps at which the spikes are generated start to differ. In most cases the spike times differ by 1 timestep. Running NEURON with 36 MPI ranks and 1 thread generates the same spikes as running with 1 MPI rank and 36 threads.
The other issue is with the spikes generated by CoreNEURON. In all of the above cases CoreNEURON generates the same spikes as NEURON at the beginning, but after some point the spikes start to shift in time. For example:

bash-4.2$ srun -n 4 ./x86_64/special -mpi -c use_coreneuron=1 -c nthread=9 -c mytstop=100 init.hoc
bash-4.2$ sort -n -k'1,1' -k2 < out.dat | awk 'NR==1 { print; next } { printf "%.3f\t%d\n", $1, $2 }' > out4.cn.sorted
bash-4.2$ sdiff -s out4.sorted out4.cn.sorted
bash-4.2$ sdiff -s out4.sorted out4.cn.sorted | more
                                                              > 5.900   160
                                                              > 6.050   176
                                                              > 6.050   180
6.750   160                                                   <
6.750   176                                                   <
6.825   180                                                   <
6.925   188                                                   <
                                                              > 6.950   188
6.975   168                                                   <
                                                              > 7.000   168
                                                              > 7.375   287
7.400   287                                                   <
                                                              > 7.550   290
7.575   290                                                   <
...
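
The shifts visible in the sdiff listings above can also be quantified per cell. The following is a minimal sketch, assuming each output line holds a spike time and a gid separated by whitespace and that the step size is 0.025 ms (inferred from the spacing of the spike times, not stated in the thread). The function names are hypothetical, and pairing the k-th spike of each gid across runs is a simplification that breaks down if a run gains or loses spikes.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_spikes(text: str) -> List[Tuple[float, int]]:
    """Parse 'time gid' lines, skip anything non-numeric, sort by time then gid."""
    spikes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue
        try:
            spikes.append((float(fields[0]), int(fields[1])))
        except ValueError:
            continue  # e.g. a header line
    return sorted(spikes)


def spike_shifts(ref, other, dt: float) -> Dict[int, List[int]]:
    """Pair the k-th spike of each gid in both runs; return shifts in timesteps."""
    def by_gid(spikes):
        grouped = defaultdict(list)
        for t, gid in spikes:
            grouped[gid].append(t)
        return grouped

    ref_g, other_g = by_gid(ref), by_gid(other)
    return {
        gid: [round((b - a) / dt) for a, b in zip(ref_g[gid], other_g[gid])]
        for gid in sorted(set(ref_g) & set(other_g))
    }


# Three spike pairs taken from the first sdiff listing in this comment.
run_1 = "10.375\t186\n11.150\t199\n13.025\t264\n"
run_4 = "10.400\t186\n11.125\t199\n13.050\t264\n"
print(spike_shifts(parse_spikes(run_1), parse_spikes(run_4), dt=0.025))
```

A histogram of these shifts over all gids would show whether the differences stay within one timestep or grow over the run.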

I am using the icei branch of my fork of nrntraub from here, which includes the change in the delay and allows selecting the number of threads when more than one MPI rank is used.
Are the issues mentioned above related to the thread implementation, or is there something going on with the test?
Any help would be greatly appreciated.

Thank you very much,
Ioannis

@pramodk
Collaborator

pramodk commented Aug 16, 2020

@nrnhines: As with the olfactory bulb model, do you think the issue described above might be with the model itself? In that case I will go ahead and use whatever baseline the model provides with X MPI ranks and Y threads per MPI rank.

@nrnhines
Collaborator

Discrepancies between NEURON and CoreNEURON in this situation are presumptively bugs. I assume there are no intra-NEURON or intra-CoreNEURON differences on this time scale with different nhost and nthread.

Successfully merging this pull request may close these issues.

Enable multi-threading tests (MPI+OpenMP) under CI