Building Sire on Power9 architecture #320

Open
djcole56 opened this issue Aug 7, 2020 · 18 comments

@djcole56

djcole56 commented Aug 7, 2020

Hi,

The N8CIR will shortly be purchasing several Power9 GPU nodes: https://n8cir.org.uk/supporting-research/facilities/nice/

We have access to a node in Newcastle at the moment, and I've managed to install OpenMM following the instructions here:
https://github.com/inspiremd/conda-recipes-summit#installing-on-summit

I've also started to have a look at building Sire, but have got stuck on compiling the corelib (errors below).

I can provide full build details, but just thought I'd check that what I'm trying is at all feasible?

Thanks,
Danny

(openmm) [ndc104@pn001 corelib]$ nice make -j 4
[ 1%] Built target test_qhash_lookup
[ 1%] Built target get_uname
[ 1%] Built target test_openmp
[ 1%] Built target get_glibc_version
[ 1%] Linking C executable get_cpuid
/mnt/nfs/home/ndc104/.conda/envs/openmm/pkgs/sire-2019.3.0/bundled/lib/libcpuid.so: error: undefined reference to 'busy_sse_loop'
/mnt/nfs/home/ndc104/.conda/envs/openmm/pkgs/sire-2019.3.0/bundled/lib/libcpuid.so: error: undefined reference to 'exec_cpuid'
/mnt/nfs/home/ndc104/.conda/envs/openmm/pkgs/sire-2019.3.0/bundled/lib/libcpuid.so: error: undefined reference to 'cpu_rdtsc'
collect2: error: ld returned 1 exit status
make[2]: *** [src/apps/test_system/get_cpuid] Error 1
make[1]: *** [src/apps/test_system/CMakeFiles/get_cpuid.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 2%] Built target SireError
make: *** [all] Error 2

@lohedges
Member

lohedges commented Aug 7, 2020

Hi there,

There have been some recent updates to Sire to enable builds on ppc64le architectures, see this pull request for details. I assume that this would work for building on ppc64 too. Specifically, there are updates to deal with getting CPU info where cpuid isn't supported:

corelib/src/libs/SireBase/cpuid.cpp: added support for getting the number of CPUs with native platform-specific methods in the absence of libcpuid

This was included in the recent 2020.1.0 release of Sire. Since it looks like you are using 2019.3.0, could you try building from the development branch, which will be up to date? (Remember to delete any existing ~/sire.app, build/corelib and build/wrapper directories and the build/miniconda.sh installer.) Also, are you building using the compile_sire.sh script? Above it looks like you are running the Makefile for corelib directly, but perhaps you are doing this to show the truncated error output.
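
As a rough sketch of that clean-up step, something like the following could be used. The function name `clean_previous_build` and its arguments are made up for illustration and are not part of Sire; the four paths are the ones listed above. It defaults to a dry run so that nothing is deleted by accident.

```python
import shutil
from pathlib import Path


def clean_previous_build(source_dir, install_dir, dry_run=True):
    """Remove leftovers from an earlier Sire build (hypothetical helper).

    source_dir  : the Sire source checkout (contains the build/ directory)
    install_dir : the old installation, e.g. ~/sire.app
    dry_run     : if True, only report what would be removed
    """
    build_dir = Path(source_dir) / "build"
    targets = [
        Path(install_dir),
        build_dir / "corelib",
        build_dir / "wrapper",
        build_dir / "miniconda.sh",
    ]

    found = []
    for target in targets:
        if not target.exists():
            continue  # nothing to clean for this entry
        found.append(target)
        if dry_run:
            continue
        if target.is_dir():
            shutil.rmtree(target)
        else:
            target.unlink()
    return found
```

Calling `clean_previous_build(Path.home() / "Sire", Path.home() / "sire.app")` would only list the existing leftovers (the `~/Sire` checkout location is an assumption); pass `dry_run=False` to actually delete them.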

Just to note that I haven't actually built Sire on ppc64le myself. The pull request was made by Cresset, so it would be interesting to know if it doesn't work on architectures other than those that they've tested it on. (I checked that it didn't break any of our existing builds for Linux and macOS.)

Cheers.

@djcole56
Author

djcole56 commented Aug 7, 2020

Hi,

Thanks, this sounds promising. I'm not building using compile_sire.sh. I was following the instructions in INSTALL_INTO_ANACONDA.rst, because I wanted to install into my own conda distribution where I have OpenMM installed, i.e.:

cmake -D ANACONDA_BUILD=on -D ANACONDA_BASE=$HOME/.conda/envs/openmm $HOME/openmm/Sire/corelib
nice make -j 4

I'll keep playing, but unfortunately the first attempt gives a similar error:

(openmm) [ndc104@pn001 corelib]$ nice make -j 4
Scanning dependencies of target test_qhash_lookup
Scanning dependencies of target test_openmp
Scanning dependencies of target SireError
Scanning dependencies of target get_uname
[ 0%] Building C object src/apps/test_system/CMakeFiles/get_uname.dir/get_uname.c.o
[ 1%] Building CXX object build/test_compiler/test_qhash_lookup/CMakeFiles/test_qhash_lookup.dir/main.cpp.o
cc1: warning: command line option '-fvisibility-inlines-hidden' is valid for C++/ObjC++ but not for C
[ 1%] Building CXX object build/test_compiler/test_openmp/CMakeFiles/test_openmp.dir/main.cpp.o
[ 1%] Linking C executable get_uname
[ 1%] Built target get_uname
Scanning dependencies of target get_glibc_version
[ 1%] Building C object src/apps/test_system/CMakeFiles/get_glibc_version.dir/get_glibc_version.c.o
cc1: warning: command line option '-fvisibility-inlines-hidden' is valid for C++/ObjC++ but not for C
[ 1%] Linking C executable get_glibc_version
[ 1%] Built target get_glibc_version
Scanning dependencies of target get_cpuid
[ 2%] Building C object src/apps/test_system/CMakeFiles/get_cpuid.dir/get_cpuid.c.o
cc1: warning: command line option '-fvisibility-inlines-hidden' is valid for C++/ObjC++ but not for C
[ 2%] Linking C executable get_cpuid
/mnt/nfs/home/ndc104/.conda/envs/openmm/bin/../lib/gcc/powerpc64le-conda_cos7-linux-gnu/8.2.0/../../../../powerpc64le-conda_cos7-linux-gnu/bin/ld: /mnt/nfs/home/ndc104/.conda/envs/openmm/pkgs/sire-2020.1.0/bundled/lib/libcpuid.so: undefined reference to `cpu_rdtsc'
/mnt/nfs/home/ndc104/.conda/envs/openmm/bin/../lib/gcc/powerpc64le-conda_cos7-linux-gnu/8.2.0/../../../../powerpc64le-conda_cos7-linux-gnu/bin/ld: /mnt/nfs/home/ndc104/.conda/envs/openmm/pkgs/sire-2020.1.0/bundled/lib/libcpuid.so: undefined reference to `busy_sse_loop'
/mnt/nfs/home/ndc104/.conda/envs/openmm/bin/../lib/gcc/powerpc64le-conda_cos7-linux-gnu/8.2.0/../../../../powerpc64le-conda_cos7-linux-gnu/bin/ld: /mnt/nfs/home/ndc104/.conda/envs/openmm/pkgs/sire-2020.1.0/bundled/lib/libcpuid.so: undefined reference to `exec_cpuid'
collect2: error: ld returned 1 exit status
make[2]: *** [src/apps/test_system/get_cpuid] Error 1
make[1]: *** [src/apps/test_system/CMakeFiles/get_cpuid.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

@lohedges
Member

lohedges commented Aug 7, 2020

Hmmm, I've not used the INSTALL_INTO_ANACONDA approach, and I'm not sure it's valid given the changes to the way we build Sire. (It's now a self-contained conda app with no external dependencies.) @chryswoods would have a better idea if this is still possible.

Using the standard installation approach (./compile_sire.sh) it's trivial to change the installed version of OpenMM after Sire is built. (Just use ~/sire.app/bin/conda install -c omnia openmm=....) We also have a bundled script accessible at ~/sire.app/bin/optimise_openmm which will try to figure out the most recent version that is compatible with your system, then install that for you.

Could you try the regular installation and see if that works? If not, then I can dig into it further.

@djcole56
Author

djcole56 commented Aug 7, 2020

Oh I see, yep no problem. Just seems to be a handful of unavailable packages now. At first glance some of these seem to be hard to get hold of for ppc64le via conda:

(openmm) [ndc104@pn001 Sire]$ ./compile_sire.sh
Where would you like to install Sire? [/mnt/nfs/home/ndc104/sire.app]:
Installing into directory '/mnt/nfs/home/ndc104/sire.app'
** Running the conda activate script... **
** . "/mnt/nfs/home/ndc104/sire.app/bin/activate"
** Running the Python install script... **
** "/mnt/nfs/home/ndc104/sire.app/bin/python" build/build_sire.py **
Compiling on Linux
Number of cores used for compilation = 128
Continuing the Sire install using /mnt/nfs/home/ndc104/sire.app/bin/python build/build_sire.py
pip is already installed...
Activating conda-forge channel using: '/mnt/nfs/home/ndc104/sire.app/bin/conda config --prepend channels conda-forge'
Warning: 'conda-forge' already in 'channels' list, moving to the top
Installing packages using: '/mnt/nfs/home/ndc104/sire.app/bin/conda install --yes ipython pytest nose netcdf4=1.5.3 boost=1.72.0 gsl=2.6 tbb=2019.9 tbb-devel=2019.9 pyqt=5.12.3 gcc_linux-64 gxx_linux-64 make libtool autoconf automake cmake'
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.

PackagesNotFoundError: The following packages are not available from current channels:

  • pyqt=5.12.3
  • netcdf4=1.5.3
  • gcc_linux-64
  • gxx_linux-64

@lohedges
Member

lohedges commented Aug 7, 2020

Interesting, thanks for the update. As I said, I've not installed on ppc64 myself. Perhaps @ptosco could comment, since he submitted the pull request for ppc64le support. It doesn't look like any conda dependencies were updated in the build script, so perhaps it's a case of manually installing the missing packages from source before building. It looks like netcdf4 is available for ppc64le if you use version 1.4.2 instead. (Versions of conda dependencies are pinned in the build/build_sire.py script.)
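
To make the idea concrete, here is a minimal sketch of how the package list could be adjusted per architecture. This is not how build/build_sire.py is actually written: `DEFAULT_PACKAGES`, `PPC64LE_OVERRIDES` and `packages_for` are hypothetical names. The package list is copied from the conda install command in the log above, and the sketch assumes that `platform.machine()` reports `ppc64le` on the Power9 node.

```python
import platform

# Package specs copied from the `conda install` command in the log above.
DEFAULT_PACKAGES = [
    "ipython", "pytest", "nose", "netcdf4=1.5.3", "boost=1.72.0", "gsl=2.6",
    "tbb=2019.9", "tbb-devel=2019.9", "pyqt=5.12.3", "gcc_linux-64",
    "gxx_linux-64", "make", "libtool", "autoconf", "automake", "cmake",
]

# Assumption based on this thread only: netcdf4 1.4.2 exists for ppc64le,
# while the other three were reported missing and must be provided some
# other way (None means "do not ask conda for this package").
PPC64LE_OVERRIDES = {
    "netcdf4": "netcdf4=1.4.2",
    "pyqt": None,
    "gcc_linux-64": None,
    "gxx_linux-64": None,
}


def packages_for(machine, packages=DEFAULT_PACKAGES):
    """Return the conda package specs to request on the given architecture."""
    if machine != "ppc64le":
        return list(packages)

    selected = []
    for spec in packages:
        name = spec.split("=", 1)[0]  # strip any version pin
        if name not in PPC64LE_OVERRIDES:
            selected.append(spec)
        elif PPC64LE_OVERRIDES[name] is not None:
            selected.append(PPC64LE_OVERRIDES[name])
    return selected


def packages_for_this_host():
    """Convenience wrapper using the architecture Python reports."""
    return packages_for(platform.machine())
```

Anything dropped from the list this way still has to be supplied by other means (system modules, distro packages, or a source build) before the Sire build itself will succeed.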

@djcole56
Author

djcole56 commented Aug 7, 2020

Yes, agreed that it's probably a case of installing these manually. I'll see what I can do with system admin support, and let you know either way.

@ptosco
Contributor

ptosco commented Aug 7, 2020

@djcole56 Hi Danny, correct, those packages are not available through conda.

  • Qt5: the way I fixed the problem in my case was to use the CentOS 7 ppc64le version of Qt5, as my ppc64le HPC system was running on CentOS 7. I am sure you can find similar pre-built ppc64le packages for other Linux distributions. Please note that you don't need the Python wrappers - the C++ libraries will be sufficient as Sire does not use PyQt.
    As I had no root privileges I simply downloaded the Qt5 RPMs from centos.org and then unpacked them with rpm2cpio <my.rpm> | cpio -idm, and set CMake paths accordingly to point at the include and lib64 dirs.
  • netcdf4: this was available as a pre-built Lmod module on my HPC system; if it is not available on yours, you can easily build it from source or use a pre-built package from your distro.
  • gcc and g++ were available as Lmod modules on my HPC system; otherwise you may get them from your Linux distro.

Feel free to get back to me if you have issues - I am confident that Sire 2020 will build for you too!
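
For anyone wanting to script the RPM unpacking step, a minimal sketch follows. It simply wraps the `rpm2cpio <my.rpm> | cpio -idm` pipeline quoted above. The function names are hypothetical, and the `usr/include` and `usr/lib64` sub-directories are an assumption about the typical layout of CentOS 7 RPMs, so check what the archive actually unpacks to.

```python
import subprocess
from pathlib import Path


def extract_rpm(rpm_path, dest_dir):
    """Unpack an RPM into dest_dir without root privileges.

    Equivalent to running `rpm2cpio <my.rpm> | cpio -idm` inside dest_dir.
    Requires the rpm2cpio and cpio tools to be on the PATH.
    """
    rpm = Path(rpm_path).resolve()
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    with subprocess.Popen(["rpm2cpio", str(rpm)], stdout=subprocess.PIPE) as producer:
        # cpio reads the archive from stdin and unpacks relative to cwd
        subprocess.run(["cpio", "-idm"], stdin=producer.stdout, cwd=dest, check=True)
    if producer.returncode != 0:
        raise subprocess.CalledProcessError(producer.returncode, producer.args)
    return dest


def unpacked_dirs(dest_dir):
    """Guess where headers and libraries land (assumed layout, verify it)."""
    root = Path(dest_dir) / "usr"
    return {"include": root / "include", "lib": root / "lib64"}
```

The two paths returned by `unpacked_dirs` are the ones to point CMake at when configuring the build.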

@djcole56
Author

djcole56 commented Aug 7, 2020

Hi @ptosco, thanks very much for your earlier work and new advice. We had actually already installed Qt5 on the HPC, so I was confused that PyQt was missing. But if not needed, then it looks like we can ignore it. And I've enquired about the availability of the remaining modules. I'm confident we're nearly there!

@lohedges
Member

Hi @djcole56, I was just wondering if there was any update on this? Did you manage to build Sire in the end?

@djcole56
Author

Hi @lohedges, still making progress thanks. We've managed to use gcc and g++ from existing modules on the HPC, and just trying to get netcdf4 built on the same system. I don't see any further hurdles from the Sire side, so feel free to close this issue if you like, and I'll open a new one if I get stuck again. Thanks!

@bieniekmateusz

Hi. We just installed it and it looks like there is still a small issue with the CPUID. It checks for Power9:

Sire/corelib/CMakeLists.txt

Lines 950 to 961 in a9f32a6

if (NOT ${SIRE_FOUND_CPUID})
  if (NOT ${CMAKE_HOST_SYSTEM_PROCESSOR} STREQUAL "ppc64le")
    find_library( CPUID_LIBRARY "cpuid" PATHS ${CPUID_LIBRARY_DIR} )
    include_directories (${CPUID_INCLUDE_DIR} )
    set(SIRE_FOUND_CPUID TRUE)
    if (HAVE_STDINT_H)
      set( CPUID_DEFINITIONS "-DHAVE_STDINT_H" )
    endif()
  else()
    message( STATUS "Cannot find libcpuid. Will disable CPU detection code." )
  endif()
endif()

but only if SIRE_FOUND_CPUID is False. However, at that point it is True because cpuid is being bundled:
set( SIRE_FOUND_CPUID TRUE )

Can the bundling be omitted completely on Power9? Thanks.
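
To illustrate the control-flow problem, here is a small Python model of the logic. It is only a model, not the real build code: the function names are made up, and the "guarded" variant is just one possible way to order the checks, not necessarily the fix that should go into CMakeLists.txt.

```python
def cpuid_enabled_original(processor, bundled):
    """Model of the quoted CMake block.

    The architecture test is nested inside the "not already found" test,
    so it is never reached once bundling has set SIRE_FOUND_CPUID to TRUE.
    """
    found = bundled  # set( SIRE_FOUND_CPUID TRUE ) from the bundling step
    if not found:
        if processor != "ppc64le":
            found = True  # find_library(...) then set(SIRE_FOUND_CPUID TRUE)
        # else: "Cannot find libcpuid. Will disable CPU detection code."
    return found


def cpuid_enabled_guarded(processor, bundled):
    """Hypothetical alternative: test the architecture before anything else."""
    if processor == "ppc64le":
        return False  # never use libcpuid on Power, bundled or not
    return cpuid_enabled_original(processor, bundled)
```

With `bundled=True` the original model returns `True` on ppc64le, which matches what we see, whereas the guarded variant returns `False`.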

@lohedges
Member

lohedges commented May 27, 2021 via email

@lohedges
Member

I've just pushed a fix, which I've tested locally by checking if CMAKE_HOST_SYSTEM_PROCESSOR is equal to x86_64, rather than ppc64le. Note that you'll need to clear your CMake cache if you are pulling the update and rebuilding in the same directory. It's probably easiest to simply remove the build/corelib directory and re-run ./compile_sire.sh.

Let me know if you run into any other issues.

@bieniekmateusz

Thanks, I confirm that the fix removed the problem with libcpuid on Power9.

We found the other issue we were struggling with. It's to do with the ABI compatibility. Specifically, the OpenMM (7.4.2) that we have access to and that we compiled uses ABI with CXX11.

Specifically, we use conda install -c omnia-dev/label/cuda101 openmm which was compiled with GCC 8.2 and I believe used CXX11 ABI. The check I used for this is nm ./lib/libOpenMM.so | grep -i CXX11

In order to remove our linking issue I simply removed the compatibility ABI flag -D_GLIBCXX_USE_CXX11_ABI=0:

    # Now gcc 5 specific options
    if ( GCC_MAJOR_VERSION GREATER 4 )
      if (MSYS)
        message(STATUS "MSYS2 will use builtin OpenMM if available...")
      else()
        # OpenMM with conda uses the old C++ binary API!
        # Tell GCC 5 to respect the old API
        set( SIRE_PLATFORM_FLAGS "${SIRE_PLATFORM_FLAGS} -D_GLIBCXX_USE_CXX11_ABI=0" )
      endif()
    endif()
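
The nm check and the flag choice can be sketched together as below. This is a heuristic only (a library could use the new ABI without exposing any cxx11 symbols), and the function and constant names are made up for illustration. The one hard fact it relies on is that `_GLIBCXX_USE_CXX11_ABI=0` selects the old libstdc++ ABI and `=1` the new one.

```python
import subprocess

OLD_ABI_FLAG = "-D_GLIBCXX_USE_CXX11_ABI=0"
NEW_ABI_FLAG = "-D_GLIBCXX_USE_CXX11_ABI=1"


def list_symbols(library_path):
    """Run `nm` on a library and return its text output."""
    result = subprocess.run(
        ["nm", library_path], capture_output=True, text=True, check=True
    )
    return result.stdout


def has_cxx11_symbols(nm_output):
    """Same test as piping the nm output through `grep -i CXX11`."""
    return any("cxx11" in line.lower() for line in nm_output.splitlines())


def matching_abi_flag(nm_output):
    """Pick the flag that matches the ABI the library appears to use."""
    return NEW_ABI_FLAG if has_cxx11_symbols(nm_output) else OLD_ABI_FLAG
```

For example, `matching_abi_flag(list_symbols("./lib/libOpenMM.so"))` suggests which setting Sire should be compiled with to link against that OpenMM. Removing the flag altogether, as I did, leaves the compiler default, which is normally the new ABI for GCC 5 and later.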

The quick minimisation/tests with somd-freenrg appear to be running fine now.

I do not see CXX11 in the OpenMM installed on an x86_64 machine.

@lohedges
Member

lohedges commented Jun 7, 2021

Hmmm, interesting. I didn't add that compiler flag, but was under the impression that the Omnia package used the old ABI, whereas the new conda-forge package uses the new ABI. As you say, there's no mention of CXX11 when running nm on the Linux .so, so perhaps this fix is now redundant for the Omnia build. I'll try removing it and rebuilding when I get a chance. (Perhaps older versions of OpenMM did require this fix.)

@bieniekmateusz

I've just downloaded the 7.4.2 python 3.7 from omnia as the build_sire.py does and nm shows no cxx11 (https://anaconda.org/omnia/openmm/files). So that makes sense that you correct for it.

However, in the omnia-dev 7.4.0 version that I have, cxx11 is present. That is the openmm-7.4.0-py37_cuda101_1.tar
(https://anaconda.org/omnia-dev/openmm/files?version=7.4.0).

That said, all conda-forge builds appear to have cxx11. The new release for ppc64le, py39 (https://twitter.com/openmm_toolkit/status/1400859263157874695) has a lot of cxx11. Similarly for linux-64 I also find cxx11 in the binaries.

So it seems this is more about our binaries, as well as conda-forge.

Thanks, Mat

@lohedges
Member

lohedges commented Jun 7, 2021

Yes, we patch for the conda-forge build, so could do the same for ppc64le if needed.

@bieniekmateusz

In that case I think it's best to ignore it then. Cheers
