Fixed the wrong results bug in the GPU backend. #139

kabicm · 2024-02-07T01:09:11Z

As @simonpintarelli reported, some of the unit tests arising from the RPA simulation were failing with the GPU backend:

OMP_NUM_THREADS=1 CRAY_CUDA_MPS=1 srun -u -N 1 -n 8 ./miniapp/pxgemm_miniapp -m 43417 -k 2170 -n 217 --test --transpose NN -r 1

Running PDGEMM on the following problem:
=============================
      GLOBAL MAT. SIZES
=============================
A = 43417 x 2170
B = 2170 x 217
C = 43417 x 217
=============================
        SUBMATRICES
=============================
(ia, ja) = (1, 1)
(ib, jb) = (1, 1)
(ic, jc) = (1, 1)
=============================
      SUBMATRIX SIZES
=============================
m = 43417
n = 217
k = 2170
=============================
      ADDITIONAL OPTIONS
=============================
alpha = 1
beta = 0
trans_a = N
trans_b = N
=============================
         PROC GRID
=============================
grid = 1 x 8
grid order = R
=============================
         PROC SRCS
=============================
P_SRC(A) = (0, 0)
P_SRC(B) = (0, 0)
P_SRC(C) = (0, 0)
=============================
          BLOCK SIZES
=============================
Blocks(A) = (128, 128)
Blocks(B) = (128, 128)
Blocks(C) = (128, 128)
=============================
          LEADING DIMS
=============================
lld_a = 43417
lld_b = 2170
lld_c = 43417
=============================

epsilon = 1e-06, v1 = 42.5759, which is != 528.075
epsilon = 1e-06, v1 = 43.1292, which is != 528.41
COSMA TIMES [ms] = 484
SCALAPACK TIMES [ms] = 571
Result is NOT CORRECT!

The bug occurred only when the GPU backend was used. After careful analysis, @simonpintarelli and I found that the problem boils down to the following local multiplications, each executed multiple times:

m = 5428, n = 217, k = 2170, alpha = 1, beta = 0, copy_c_back = T, tile sizes = 5000
m = 5427, n = 217, k = 2170, alpha = 1, beta = 0, copy_c_back = T, tile sizes = 5000
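
The two local values of m are consistent with an even split of the global m = 43417 across the 8 ranks. The sketch below only checks that arithmetic. `split_evenly` is a hypothetical helper introduced for illustration, and the assumption that m is split evenly is ours; COSMA's actual decomposition is not shown in this PR.

```python
def split_evenly(total, parts):
    """Split `total` into `parts` chunk sizes that differ by at most one."""
    base, extra = divmod(total, parts)
    # The first `extra` chunks absorb the remainder, one element each.
    return [base + 1 if i < extra else base for i in range(parts)]


# Assumption: the global m = 43417 is divided evenly over the 8 ranks.
local_m = split_evenly(43417, 8)
print(local_m)
```

Under that assumption, one rank gets m = 5428 and the other seven get m = 5427, which matches the two multiplications listed above.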

The bug was triggered only when the matrix dimensions were slightly larger than the GPU tile sizes, as described here.
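
To make the edge case concrete: with a tile size of 5000, a dimension of 5428 is covered by one full tile plus a short remainder tile. The following is a minimal sketch, assuming a plain 1D tiling with the same tile size in every dimension. `tile_extents` is a hypothetical helper, not part of the Tiled-MM API, and the sketch does not reproduce the bug itself.

```python
def tile_extents(dim, tile):
    """Return (offset, length) pairs covering [0, dim) in steps of `tile`."""
    return [(off, min(tile, dim - off)) for off in range(0, dim, tile)]


TILE = 5000  # tile size reported in the failing local multiplications

# The m dimension spills just past one tile; n and k fit inside a single tile.
for name, dim in (("m", 5428), ("m", 5427), ("n", 217), ("k", 2170)):
    print(name, dim, tile_extents(dim, TILE))
```

For m = 5428 this yields a full tile of 5000 rows followed by a remainder tile of 428 rows, while n and k each fit in a single partial tile.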

We fixed this bug in the GPU backend in the latest PR.

After updating the Tiled-MM submodule to the latest version, we verified the problem is resolved:

OMP_NUM_THREADS=1 CRAY_CUDA_MPS=1 srun -u -N 1 -n 8 ./miniapp/pxgemm_miniapp -m 43417 -k 2170 -n 217 --test --transpose NN -r 1

Running PDGEMM on the following problem:
=============================
      GLOBAL MAT. SIZES
=============================
A = 43417 x 2170
B = 2170 x 217
C = 43417 x 217
=============================
        SUBMATRICES
=============================
(ia, ja) = (1, 1)
(ib, jb) = (1, 1)
(ic, jc) = (1, 1)
=============================
      SUBMATRIX SIZES
=============================
m = 43417
n = 217
k = 2170
=============================
      ADDITIONAL OPTIONS
=============================
alpha = 1
beta = 0
trans_a = N
trans_b = N
=============================
         PROC GRID
=============================
grid = 1 x 8
grid order = R
=============================
         PROC SRCS
=============================
P_SRC(A) = (0, 0)
P_SRC(B) = (0, 0)
P_SRC(C) = (0, 0)
=============================
          BLOCK SIZES
=============================
Blocks(A) = (128, 128)
Blocks(B) = (128, 128)
Blocks(C) = (128, 128)
=============================
          LEADING DIMS
=============================
lld_a = 43417
lld_b = 2170
lld_c = 43417
=============================

COSMA TIMES [ms] = 304
SCALAPACK TIMES [ms] = 444
Result is CORRECT!

This has been tested on RTX 3090 GPUs.

simonpintarelli · 2024-02-23T22:42:06Z

cscs-ci run P100

simonpintarelli · 2024-02-25T20:32:22Z

cscs-ci run P100

kabicm requested a review from simonpintarelli February 7, 2024 01:09

kabicm self-assigned this Feb 7, 2024

kabicm added bug-fix gpu OpenMPI and removed OpenMPI labels Feb 7, 2024

simonpintarelli approved these changes Feb 23, 2024

Fixed the wrong results bug in the GPU backend.

d65e14d

simonpintarelli force-pushed the bugfix branch from fb9fe0c to d65e14d Compare February 23, 2024 18:44
