COSMA only using 1 CPU core #147

Open
m13253 opened this issue Dec 2, 2024 · 0 comments

I am trying to run COSMA on the TACC Stampede3 Skylake cluster, but performance is extremely slow: each COSMA process appears to use only one CPU core.

Build configuration:

$ mkdir cosma/build && cd cosma/build
$ cmake -G Ninja -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx -DCMAKE_Fortran_COMPILER=ifx -DCOSMA_BLAS=MKL -DCOSMA_SCALAPACK=MKL -DCMAKE_INSTALL_PREFIX=$HOME/work/cosma-install ..
$ ninja && ninja install

SLURM job generator (writes one job script per node count, from 24 to 64 nodes in steps of 8):

#!/bin/bash

for i in {24..64..8}
do
    cat >cosma-$i.sh <<EOF
#!/bin/bash
#SBATCH --exclusive
#SBATCH --job-name=cosma-${i}nodes
#SBATCH --nodes=$i
#SBATCH --ntasks=$((2*$i))
#SBATCH --ntasks-per-socket=1
#SBATCH --cpus-per-task=24
#SBATCH --time=60

export MKL_NUM_THREADS=24 OMP_NUM_THREADS=24 OMP_PROC_BIND=true
echo 'Number of MPI processes: $((2*$i))'
mpiexec --np $(($i*2)) "\$HOME/work/cosma-install/bin/cosma_miniapp" -m 1000000 -n 64 -k 1000000 -t float
echo 'Number of MPI processes: $((2*$i))'
EOF
    chmod 0755 cosma-$i.sh
done
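
For a sense of scale, the matrix sizes implied by the `cosma_miniapp` arguments can be worked out with plain shell arithmetic. This is only a rough sketch: it assumes the usual convention that A is m x k, B is k x n and C is m x n, and it ignores any communication or workspace buffers COSMA allocates on top of the matrices themselves.

```shell
#!/bin/bash
# Dimensions and element size taken from the cosma_miniapp command line above.
m=1000000; n=64; k=1000000; bytes_per_elem=4   # float = 4 bytes

# Assumption: A is m x k, B is k x n, C is m x n (C = A * B).
a_bytes=$((m * k * bytes_per_elem))
b_bytes=$((k * n * bytes_per_elem))
c_bytes=$((m * n * bytes_per_elem))

gib=$((1024 * 1024 * 1024))
mib=$((1024 * 1024))

echo "A: $((a_bytes / gib)) GiB, B: $((b_bytes / mib)) MiB, C: $((c_bytes / mib)) MiB"

# Per-rank share of A for the smallest generated job (24 nodes, 2 ranks per node),
# assuming A is split evenly across ranks.
ranks=$((2 * 24))
a_per_rank_gib=$((a_bytes / ranks / gib))
echo "A per rank at $ranks ranks: $a_per_rank_gib GiB"
```

Under these assumptions A dominates the footprint, so the time spent allocating and filling it may be worth separating from the time spent in the multiplication itself when reading the utilization numbers.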

All of the generated jobs timed out after the full 60-minute limit.
I logged into the allocated nodes and saw that each process was using only one core (4% utilization).
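
One way to narrow this down on a compute node is to look at how many threads each process has and which CPUs it is allowed to run on. The sketch below reads the `Threads` and `Cpus_allowed_list` fields from `/proc/<pid>/status` (Linux only); `show_affinity` is a hypothetical helper name, not part of COSMA or SLURM.

```shell
#!/bin/bash
# Print the thread count and the allowed-CPU list of one process,
# as reported by the Linux kernel in /proc/<pid>/status.
show_affinity() {
    local pid=$1
    awk -v pid="$pid" \
        '/^(Threads|Cpus_allowed_list):/ { printf "pid %s %s %s\n", pid, $1, $2 }' \
        "/proc/$pid/status"
}

# Inspect every cosma_miniapp process owned by the current user on this node.
# If none are running, the loop body simply does not execute.
for pid in $(pgrep -u "$(id -u)" cosma_miniapp); do
    show_affinity "$pid"
done
```

If a process reports many threads but a `Cpus_allowed_list` containing a single CPU, the threads are probably all pinned to one core by the launcher or the binding settings. If it reports only one thread, the process is likely in a serial phase or the threading settings are not taking effect. Neither outcome is conclusive on its own.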

Is there a misconfiguration that prevents COSMA from using all CPU cores?
Please let me know if you need more information.


Environment information:

CPU model:          Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
Thread(s) per core: 1
Core(s) per socket: 24
Socket(s):          2
Total RAM:          187 GiB (2 NUMA nodes)
Kernel version:     Linux 5.14.0-362.24.1.el9_3.0.1.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr 4 22:31:43 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Configuration log:

-- The CXX compiler identification is IntelLLVM 2024.0.0
-- The C compiler identification is IntelLLVM 2024.0.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/intel/oneapi/compiler/2024.0/bin/icpx - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/intel/oneapi/compiler/2024.0/bin/icx - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Setting build type to 'Release' as none was specified.
-- Selected SCALAPACK backend for COSMA: MKL
-- The Fortran compiler identification is IntelLLVM 2024.0.0
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /opt/intel/oneapi/compiler/2024.0/bin/ifx - skipped
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found MPI_C: /opt/intel/oneapi/mpi/2021.11/lib/libmpifort.so (found version "3.1")
-- Found MPI_CXX: /opt/intel/oneapi/mpi/2021.11/lib/libmpicxx.so (found version "3.1")
-- Found MPI_Fortran: /opt/intel/oneapi/mpi/2021.11/lib/libmpifort.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1") found components: CXX C Fortran
-- Found OpenMP_CXX: -fiopenmp (found version "5.1")
-- Found OpenMP_C: -fiopenmp (found version "5.1")
-- Found OpenMP_Fortran: -fiopenmp (found version "5.0")
-- Found OpenMP: TRUE (found version "5.1") found components: CXX C Fortran
-- Found MKL: /opt/intel/oneapi/mkl/2024.0/include
-- Found Blas: ;/opt/intel/oneapi/mkl/2024.0/lib/intel64/libmkl_intel_lp64.so;/opt/intel/oneapi/mkl/2024.0/lib/intel64/libmkl_intel_thread.so;/opt/intel/oneapi/mkl/2024.0/lib/intel64/libmkl_core.so;;/opt/intel/oneapi/compiler/2024.0/lib/libiomp5.so;/lib64/libpthread.a;Threads::Threads
-- Found MPI: TRUE (found version "3.1") found components: CXX C
-- Found MPI: TRUE (found version "3.1") found components: CXX C Fortran
-- Found OpenMP_CXX: -fiopenmp (found version "5.1")
-- Found OpenMP_C: -fiopenmp (found version "5.1")
-- Found OpenMP_Fortran: -fiopenmp (found version "5.0")
-- Found SCALAPACK: /opt/intel/oneapi/mkl/2024.0/lib/intel64/libmkl_scalapack_lp64.so;;/opt/intel/oneapi/mkl/2024.0/lib/intel64/libmkl_blacs_intelmpi_lp64.so
-- Selected ScaLAPACK backend (or implementation) for COSTA: MKL
-- Found MPI: TRUE (found version "3.1") found components: CXX
-- Found OpenMP_CXX: -fiopenmp (found version "5.1")
-- Found OpenMP: TRUE (found version "5.1") found components: CXX
-- Found MPI: TRUE (found version "3.1") found components: CXX C Fortran
-- Found OpenMP_CXX: -fiopenmp (found version "5.1")
-- Found OpenMP_C: -fiopenmp (found version "5.1")
-- Found OpenMP_Fortran: -fiopenmp (found version "5.0")
-- Found OpenMP: TRUE (found version "5.1") found components: CXX C Fortran
-- Configuring done (8.7s)
-- Generating done (0.2s)