
POTRF: Suspicious factorization on Frontier #153

@devreal

Describe the bug

srun -N 1 -n 1 --ntasks-per-node=1 --cpus-per-task=56 --gpu-bind=closest,verbose --gpus-per-node=8 ./tests/testing_dpotrf -N $((16*1024)) -NB $((1*1024)) -n 1 -g 1 -x
#+++++ cores detected       : 56
#+++++ nodes x cores + gpu  : 1 x 56 + 1 (56+1)
#+++++ thread mode          : THREAD_SERIALIZED
#+++++ P x Q                : 1 x 1 (1/1)
#+++++ M x N x K|NRHS       : 16384 x 16384 x 1
#+++++ MB x NB              : 1024 x 1024
[****] TIME(s)      0.31887 : dpotrf	PxQxg=   1 1   1 NB= 1024 N=   16384 :    4597.882751 gflops - ENQ&PROG&DEST      0.54877 :    2671.707559 gflops - ENQ      0.22843 - DEST      0.00146
-- Factorization is suspicious ! 
-- Solution is CORRECT ! 

To Reproduce

Checked out DPLASMA master with PaRSEC (whichever revision is currently linked) and built against HIP.

Environment (please complete the following information):

$ module list

Currently Loaded Modules:
  1) craype-x86-trento                5) Core/25.03          9) DefApps           13) cray-dsmml/0.3.1     17) craype/2.7.35           21) cmake/3.30.5
  2) libfabric/1.22.0                 6) tmux/3.4           10) gcc-native/14.2   14) cray-libsci/25.09.0  18) perftools-base/25.09.0  22) craype-accel-amd-gfx90a
  3) craype-network-ofi               7) hsi/default        11) boost/1.86.0      15) cray-mpich/9.0.1     19) cpe/25.09
  4) xpmem/2.11.3-1.3_gdbda01a1eb3d   8) lfs-wrapper/0.0.1  12) PrgEnv-gnu/8.6.0  16) cray-pmi/6.1.16      20) rocm/6.4.2

Labels: bug
