Add OpenMPI host injection script #963

Open
wants to merge 20 commits into
base: 2023.06-software.eessi.io

Conversation

@pfermi pfermi commented Mar 5, 2025

This PR adds a Bash script that, given the path to an OpenMPI installation on the host, copies the OpenMPI libraries, libfabric, and libpmix, and injects the necessary host libraries using absolute paths. To determine which libraries must be injected, it uses ldd from both EESSI (${EESSI_EPREFIX}/usr/bin/ldd) and the host (/usr/bin/ldd).
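
To illustrate how two ldd listings could be combined, here is a minimal sketch. It is not the code from this PR: the helper names (missing_libs, resolved_libs, libs_to_inject) are hypothetical, and the selection rule (inject what EESSI's ldd reports as "not found" but the host's ldd resolves) is an assumption about one plausible approach.

```shell
# Hypothetical helper: read ldd output on stdin and print the names of
# libraries reported as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Hypothetical helper: read ldd output on stdin and print "name path" pairs
# for libraries that were resolved to an absolute path.
resolved_libs() {
    awk '$2 == "=>" && $3 ~ /^\// { print $1, $3 }'
}

# Assumed selection rule: a library is a candidate for injection when EESSI's
# ldd cannot find it but the host's ldd resolves it.
libs_to_inject() {
    local eessi_out="$1"   # output of "${EESSI_EPREFIX}/usr/bin/ldd <lib>"
    local host_out="$2"    # output of "/usr/bin/ldd <lib>"
    local missing name path m
    missing=$(printf '%s\n' "$eessi_out" | missing_libs)
    printf '%s\n' "$host_out" | resolved_libs | while read -r name path; do
        for m in $missing; do
            if [ "$m" = "$name" ]; then
                printf '%s %s\n' "$name" "$path"
            fi
        done
    done
}
```

A call might look like `libs_to_inject "$("${EESSI_EPREFIX}/usr/bin/ldd" "$lib")" "$(/usr/bin/ldd "$lib")"`, where `$lib` is one of the copied libraries. Each output line pairs a library name with the absolute host path that could be written into the copy.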

The script is organized into functions. The inject_mpi function performs the actual injection. A download_patchelf function downloads patchelf v0.17.2, because the patchelf shipped with EESSI could not perform the injection successfully.

The initial injection happens in a temporary directory, which can be specified on the command line; if none is specified, the script uses mktemp -d. At the end, the temporary directory is removed unless --noclean is passed to the script.
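
The temporary-directory behaviour described here can be sketched as follows. This is an illustration only, not the script's actual code: the function name with_workdir and its argument order are hypothetical.

```shell
# Hypothetical sketch: run a command with a working directory appended as its
# last argument, then clean up unless asked not to.
# Usage: with_workdir <tmpdir-or-empty> <noclean: 0|1> <command> [args...]
with_workdir() {
    local workdir="$1"
    local noclean="$2"
    local status=0
    shift 2
    if [ -z "$workdir" ]; then
        # No directory given on the command line: fall back to mktemp -d.
        workdir=$(mktemp -d) || return 1
    else
        mkdir -p "$workdir" || return 1
    fi
    "$@" "$workdir" || status=$?
    # Remove the working directory unless --noclean was requested.
    if [ "$noclean" -eq 0 ]; then
        rm -rf "$workdir"
    fi
    return "$status"
}
```

Keeping the cleanup in one place means the directory is removed whether the wrapped command succeeds or fails, while --noclean leaves it in place for inspection.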

By default, the script does not perform the injection if the directory .../host_injections/rpath_overrides/OpenMPI/system/lib is not empty, unless --force is given.
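
A guard of this kind could look like the sketch below. The function name should_inject and the 0/1 encoding of --force are assumptions made for illustration; the PR's script may implement the check differently.

```shell
# Hypothetical sketch: return 0 if the injection should proceed, 1 if it
# should be skipped because the target directory already has content.
should_inject() {
    local target_dir="$1"   # e.g. .../host_injections/rpath_overrides/OpenMPI/system/lib
    local force="$2"        # 1 when --force was given, 0 otherwise
    if [ "$force" -eq 1 ]; then
        return 0
    fi
    # A missing or empty directory means nothing has been injected yet.
    if [ ! -d "$target_dir" ] || [ -z "$(ls -A "$target_dir")" ]; then
        return 0
    fi
    return 1
}
```

A caller could then write `should_inject "$target" "$force" || { echo "Target not empty, use --force" >&2; exit 0; }` before doing any work.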

eessi-bot bot commented Mar 5, 2025

Instance eessi-bot-mc-aws is configured to build for:

  • architectures: x86_64/generic, x86_64/intel/haswell, x86_64/intel/sapphirerapids, x86_64/intel/skylake_avx512, x86_64/amd/zen2, x86_64/amd/zen3, aarch64/generic, aarch64/neoverse_n1, aarch64/neoverse_v1
  • repositories: eessi.io-2023.06-software, eessi.io-2023.06-compat

eessi-bot bot commented Mar 5, 2025

Instance eessi-bot-mc-azure is configured to build for:

  • architectures: x86_64/amd/zen4
  • repositories: eessi.io-2023.06-compat, eessi.io-2023.06-software

@ocaisa
Member

ocaisa commented Mar 11, 2025

@pfermi You should be able to create CI for this since you can install OpenMPI from the OS.

@TopRichard
Collaborator

TopRichard commented Mar 20, 2025

I tested the script on OpenMPI-5.0.7 built with EasyBuild (draft easyconfig) using eessi_container and noted the following.
The script produced this output:

patchelf: open: Permission denied
patchelf: open: Permission denied
patchelf: open: Permission denied
patchelf: open: Permission denied
patchelf: open: Permission denied
MPI injection was successful
  1. The patchelf command failed to replace the needed libraries, so the script should not report the MPI injection as successful.
  2. The library files lack write permission. This can be fixed by running chmod u+w on the library files in the temporary directory before calling patchelf.
  3. After applying the change above, the script injects the following into host_injections:
    ls host_injections/2023.06/software/linux/aarch64/generic/rpath_overrides/OpenMPI/system/lib/
    libfabric.so libfabric.so.1 libfabric.so.1.26.0 libpmix.so libpmix.so.2 libpmix.so.2.8.0
    - Should libmpi.so.40 also be injected?
    - Some of the above are symbolic links:
      libfabric.so -> libfabric.so.1.26.0
      libfabric.so.1 -> libfabric.so.1.26.0
      libfabric.so.1.26.0
      libpmix.so -> libpmix.so.2.8.0
      libpmix.so.2 -> libpmix.so.2.8.0
      libpmix.so.2.8.0
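
Points 1 and 2 could be addressed along the lines of the sketch below. It is a hedged illustration rather than a patch for this PR: the function names replace_needed and inject_all and the PATCHELF variable are hypothetical, while `patchelf --replace-needed OLD NEW FILE` is the standard patchelf invocation.

```shell
# Assumption: PATCHELF points at the patchelf binary to use.
PATCHELF=${PATCHELF:-patchelf}

# Hypothetical helper: make the copied library writable, then replace one
# NEEDED entry, returning non-zero if patchelf fails.
replace_needed() {
    local lib="$1"
    local old="$2"
    local new="$3"
    chmod u+w "$lib" || return 1
    if ! "$PATCHELF" --replace-needed "$old" "$new" "$lib"; then
        echo "ERROR: patchelf failed on $lib" >&2
        return 1
    fi
}

# Hypothetical driver: only report success if every replacement succeeded.
# Usage: inject_all <old-needed> <new-needed> <lib>...
inject_all() {
    local old="$1"
    local new="$2"
    local lib
    shift 2
    for lib in "$@"; do
        replace_needed "$lib" "$old" "$new" || return 1
    done
    echo "MPI injection was successful"
}
```

With this structure, a "Permission denied" from patchelf stops the run before the success message is printed.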

@ocaisa
Member

ocaisa commented Mar 20, 2025

@TopRichard Can you show the command line you used so it's clear where you pointed the script to? Indeed, the final contents seem to be missing critical libraries.

@pfermi The script should only apply patchelf if it is actually required (and indeed it will need write permissions on the copy). This sounds like a good second CI test for what I hope will be a common use case: someone builds a custom OpenMPI with EESSI as a base and then injects that. For CI, I would take a specific existing installation of OpenMPI from within EESSI and run the injection on that. In this case, if there is no need to patchelf anything, I would just create symlinks (this would also allow a rebuild to be automatically captured).
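
The symlink-only path suggested here could be sketched as follows. The function name link_libs and the `*.so*` pattern are assumptions for illustration; the real script would also need to decide first whether any library requires patchelf.

```shell
# Hypothetical sketch: link every shared library from an existing OpenMPI
# installation into the override directory instead of copying it.
link_libs() {
    local src_lib_dir="$1"   # e.g. <OpenMPI install prefix>/lib
    local dest_dir="$2"      # e.g. .../rpath_overrides/OpenMPI/system/lib
    local lib
    mkdir -p "$dest_dir" || return 1
    for lib in "$src_lib_dir"/*.so*; do
        # Skip the literal pattern when the glob matched nothing.
        [ -e "$lib" ] || continue
        ln -sf "$lib" "$dest_dir/$(basename "$lib")" || return 1
    done
}
```

Because the links point back at the original installation, a rebuild in place is picked up without rerunning the injection.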

@TopRichard
Collaborator

> @TopRichard Can you show the command line you used so it's clear where you pointed the script to? Indeed, the final contents seem to be missing critical libraries.

{EESSI 2023.06} Apptainer> /cvmfs/software.eessi.io/versions/2023.06/scripts/mpi_support/install_openmpi_host_injection.sh -t /tmp/mpi --noclean --mpi-path /p/project1/ceasybuilders/MPI-rt/Extra/software/OpenMPI/5.0.7-GCC-12.3.0/

