Ookami: MPI error opal_libevent2022_evthread_use_pthreads #835
Comments
There is something wrong with the system MPI binary. Doing [...]
It seems something may indeed go wrong when trying to hook into system libs. Maybe look into https://github.com/giordano/julia-on-ookami as, IIRC, @giordano has tested and used that machine quite extensively.
I'm a bit late to the party here, but I think I have a solution for your problem. In short, this isn't an issue in [...]
Now, how to address it.
$ module load openmpi/gcc8/4.1.2
Loading openmpi/gcc8/4.1.2
Loading requirement: ucx/1.11.2
$ module load julia
$ julia --project -q
(openmpi) pkg> add MPIPreferences, MPI
Resolving package versions...
Updating `~/tmp/openmpi/Project.toml`
[da04e1cc] + MPI v0.20.22
[3da0fdf6] + MPIPreferences v0.1.11
[...]
julia> using MPIPreferences
julia> MPIPreferences.use_system_binary()
┌ Info: MPI implementation identified
│ libmpi = "libmpi"
│ version_string = "Open MPI v4.1.2rc4, package: Open MPI decarlson@fj100 Distribution, ident: 4.1.2rc4, repo rev: 2022-04-05, Unreleased developer copy\0"
│ impl = "OpenMPI"
│ version = v"4.1.2-rc4"
└ abi = "OpenMPI"
┌ Info: MPIPreferences changed
│ binary = "system"
│ libmpi = "libmpi"
│ abi = "OpenMPI"
│ mpiexec = "mpiexec"
│ preloads = Any[]
└ preloads_env_switch = nothing
julia> exit()
$ julia --project -q
julia> using MPI, Libdl
julia> filter(contains("libmpi"), dllist())
1-element Vector{String}:
"/lustre/software/openmpi/gcc8/4.1.2-rocky/lib/libmpi.so"
We followed the documentation and [...]
(openmpi) pkg> add HDF5_jll
[...]
julia> using MPI, HDF5_jll, Libdl
julia> filter(contains("libmpi"), dllist())
2-element Vector{String}:
"/lustre/software/openmpi/gcc8/4.1.2-rocky/lib/libmpi.so"
"/lustre/home/mosgiordano/.julia" ⋯ 34 bytes ⋯ "43e725d0e5f9dfa0a/lib/libmpi.so"
Now we have a problem: when loading both [...]
Luckily, for the case where system OpenMPI has the same ABI as the libmpi in [...]
[OpenMPI_jll]
libmpi_path = "/lustre/software/openmpi/gcc8/4.1.2-rocky/lib/libmpi.so"
mpiexec_path = "/lustre/software/openmpi/gcc8/4.1.2-rocky/bin/mpiexec" # probably not necessary, but we set this for good measure
The paths above are the paths of [...]
julia> using MPI, HDF5_jll, Libdl
julia> filter(contains("libmpi"), dllist())
1-element Vector{String}:
"/lustre/software/openmpi/gcc8/4.1.2-rocky/lib/libmpi.so"
Now when loading both [...]
I won't say this is a great solution, because it requires duplicating some information [...]
Hopefully, life will be better in a couple of years when the new standard MPI ABI will be more widespread, because facilities will quickly adopt it, right?
This is what I did on Ookami after I logged on. (Some more details on https://iacs-group.slack.com/archives/C016DRQ321M.)
My setup of the environment looks like this:
julia -e 'using Pkg; Pkg.activate("."); Pkg.instantiate()'
julia --project=.
I also set up the system binary: MPIPreferences.use_system_binary().
srun -p short -n 10 --ntasks-per-node=1 --pty bash
cd FinEtoolsDDParallel.jl/examples/
~/a64fx/depot/bin/mpiexecjl -n 4 julia --project=. heat/Poisson2D_cg_mpi_driver.jl
After several minutes, the job was terminated. The error message is below.
Note well: On my laptop this example runs to completion in ~70 seconds.
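One way to check whether a setup like this one suffers from the duplicate-libmpi situation discussed in the comments is to automate the `dllist()` inspection. The following is only a sketch: `check_single_libmpi` is a hypothetical helper, not part of MPI.jl or MPIPreferences. It assumes the list of loaded libraries is piped in with one path per line, and that every MPI library of interest has a file name starting with `libmpi.`.

```shell
#!/bin/sh
# Hypothetical helper (not part of MPI.jl or MPIPreferences).
# Reads shared-library paths from stdin, one per line, and returns
# non-zero if more than one distinct libmpi path is among them.
check_single_libmpi() {
    # Keep only paths whose file name starts with "libmpi." and deduplicate.
    libs=$(grep -E '/libmpi\.[^/]*$' | sort -u)
    # Count the non-empty lines; "|| true" covers the zero-match case,
    # where grep exits with status 1.
    count=$(printf '%s\n' "$libs" | grep -c . || true)
    if [ "$count" -gt 1 ]; then
        echo "error: $count different libmpi libraries are loaded:" >&2
        printf '%s\n' "$libs" >&2
        return 1
    fi
    return 0
}
```

It could be used from the project directory like this, adding to the `using` list whichever binary packages the project actually loads (`HDF5_jll` is the example from the comments):

julia --project=. -e 'using MPI, HDF5_jll, Libdl; foreach(println, dllist())' | check_single_libmpi

A non-zero exit status would suggest that a JLL-provided libmpi is being loaded alongside the system one.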