I am trying to run COSMA on the TACC Stampede3 Skylake cluster, but performance is extremely slow because COSMA appears to use only one CPU core per process.
When I ran the SLURM jobs produced by my job generator, all of them timed out after a full hour.
I logged into the allocated machines and saw that each process uses only 1 core (4% utilization).
Is there a misconfiguration that prevents COSMA from using all CPU cores?
Please let me know if you need more information.
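For anyone trying to reproduce this, one thing worth checking is the threading environment each rank actually sees. The sketch below is only a diagnostic under assumptions: it prints a few standard OpenMP, Intel OpenMP, MKL and Intel MPI variables plus the CPU set the process is allowed to run on. Nothing in this report confirms that any of these settings is the cause, and the helper name `thread_report` is made up for illustration.

```python
import os

# Environment variables that commonly influence how many threads a rank uses.
# Which of these (if any) matters here is an assumption, not a finding.
THREAD_VARS = (
    "OMP_NUM_THREADS",
    "MKL_NUM_THREADS",
    "OMP_PROC_BIND",
    "OMP_PLACES",
    "KMP_AFFINITY",
    "I_MPI_PIN_DOMAIN",
)


def thread_report(environ=None):
    """Collect thread-related settings and the CPUs this process may run on."""
    environ = os.environ if environ is None else environ
    report = {name: environ.get(name, "<unset>") for name in THREAD_VARS}
    if hasattr(os, "sched_getaffinity"):
        # Linux: the affinity mask of the current process.
        allowed = sorted(os.sched_getaffinity(0))
    else:
        # Fallback for platforms without sched_getaffinity.
        allowed = list(range(os.cpu_count() or 1))
    report["allowed_cpus"] = allowed
    return report


if __name__ == "__main__":
    for key, value in thread_report().items():
        print(f"{key} = {value}")
```

Running the script through the same launcher as the COSMA job (one copy per rank) shows what each process inherits. An `allowed_cpus` list with a single entry would point at process pinning, while `OMP_NUM_THREADS = 1` would point at the thread count.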
Environment information:
CPU model: Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
Thread(s) per core: 1
Core(s) per socket: 24
Socket(s): 2
Total RAM: 187 GiB (2 NUMA nodes)
Kernel version: Linux 5.14.0-362.24.1.el9_3.0.1.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Apr 4 22:31:43 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
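As a rough sanity check on the utilization figure, the arithmetic below uses the node layout listed above (2 sockets, 24 cores per socket, 1 thread per core). How the monitoring tool normalizes its percentage is an assumption here, so this only shows what a few busy-core counts would look like on a 48-core node.

```python
def busy_fraction(busy_cores, sockets=2, cores_per_socket=24, threads_per_core=1):
    """Fraction of the node's logical CPUs that are busy."""
    total = sockets * cores_per_socket * threads_per_core
    return busy_cores / total


if __name__ == "__main__":
    for busy in (1, 2, 48):
        print(f"{busy} busy core(s): {busy_fraction(busy):.1%} of the node")
```

With this layout, one busy core is about 2.1% of the node and two busy cores are about 4.2%, so a reading near 4% is consistent with only a couple of cores doing work per node.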
Configuration log:
-- The CXX compiler identification is IntelLLVM 2024.0.0
-- The C compiler identification is IntelLLVM 2024.0.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/intel/oneapi/compiler/2024.0/bin/icpx - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/intel/oneapi/compiler/2024.0/bin/icx - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Setting build type to 'Release' as none was specified.
-- Selected SCALAPACK backend for COSMA: MKL
-- The Fortran compiler identification is IntelLLVM 2024.0.0
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /opt/intel/oneapi/compiler/2024.0/bin/ifx - skipped
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Found MPI_C: /opt/intel/oneapi/mpi/2021.11/lib/libmpifort.so (found version "3.1")
-- Found MPI_CXX: /opt/intel/oneapi/mpi/2021.11/lib/libmpicxx.so (found version "3.1")
-- Found MPI_Fortran: /opt/intel/oneapi/mpi/2021.11/lib/libmpifort.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1") found components: CXX C Fortran
-- Found OpenMP_CXX: -fiopenmp (found version "5.1")
-- Found OpenMP_C: -fiopenmp (found version "5.1")
-- Found OpenMP_Fortran: -fiopenmp (found version "5.0")
-- Found OpenMP: TRUE (found version "5.1") found components: CXX C Fortran
-- Found MKL: /opt/intel/oneapi/mkl/2024.0/include
-- Found Blas: ;/opt/intel/oneapi/mkl/2024.0/lib/intel64/libmkl_intel_lp64.so;/opt/intel/oneapi/mkl/2024.0/lib/intel64/libmkl_intel_thread.so;/opt/intel/oneapi/mkl/2024.0/lib/intel64/libmkl_core.so;;/opt/intel/oneapi/compiler/2024.0/lib/libiomp5.so;/lib64/libpthread.a;Threads::Threads
-- Found MPI: TRUE (found version "3.1") found components: CXX C
-- Found MPI: TRUE (found version "3.1") found components: CXX C Fortran
-- Found OpenMP_CXX: -fiopenmp (found version "5.1")
-- Found OpenMP_C: -fiopenmp (found version "5.1")
-- Found OpenMP_Fortran: -fiopenmp (found version "5.0")
-- Found SCALAPACK: /opt/intel/oneapi/mkl/2024.0/lib/intel64/libmkl_scalapack_lp64.so;;/opt/intel/oneapi/mkl/2024.0/lib/intel64/libmkl_blacs_intelmpi_lp64.so
-- Selected ScaLAPACK backend (or implementation) for COSTA: MKL
-- Found MPI: TRUE (found version "3.1") found components: CXX
-- Found OpenMP_CXX: -fiopenmp (found version "5.1")
-- Found OpenMP: TRUE (found version "5.1") found components: CXX
-- Found MPI: TRUE (found version "3.1") found components: CXX C Fortran
-- Found OpenMP_CXX: -fiopenmp (found version "5.1")
-- Found OpenMP_C: -fiopenmp (found version "5.1")
-- Found OpenMP_Fortran: -fiopenmp (found version "5.0")
-- Found OpenMP: TRUE (found version "5.1") found components: CXX C Fortran
-- Configuring done (8.7s)
-- Generating done (0.2s)