We have output->data.remote_dst_datatype == NULL, which is not equal to PARSEC_DATATYPE_NULL (MPI_DATATYPE_NULL), so we go on and call the MPI_GET_NAME and crash MPI.
Two issues here:
The dst_datatype should not be NULL? Presumably that flow has a type and we should have retrieved it.
Explanation: this comes from GLOBAL_BARRIER Y, which is a CTL, thus with no type. This looks like a bug in get_datatype with CTL. The arena_datatypes in GEMM_NN_GPU was not filled.
RESOLUTION: Found issue in DPLASMA, missing dplasma_add2arena for the gpuNN gemm
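The NULL versus PARSEC_DATATYPE_NULL mismatch described above can be sketched in plain C. This is only an illustration under stated assumptions: `sketch_datatype_t`, `SKETCH_DATATYPE_NULL`, and `sketch_datatype_is_usable` are hypothetical stand-ins, not PaRSEC or MPI definitions. It assumes a pointer-style handle whose "null" sentinel is the address of a dedicated object, so a handle that was never filled in (plain NULL) passes a check that compares only against the sentinel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: NOT the PaRSEC or MPI definitions. */
typedef struct sketch_datatype_s { const char *name; } sketch_datatype_t;

/* The sentinel is a real object, so its address is never NULL. */
static sketch_datatype_t sketch_datatype_null = { "DATATYPE_NULL" };
#define SKETCH_DATATYPE_NULL (&sketch_datatype_null)

/* Returns 1 only when it is safe to query the handle (e.g. for its name). */
static int sketch_datatype_is_usable(const sketch_datatype_t *dt)
{
    if (dt == NULL) return 0;                 /* handle was never filled in */
    if (dt == SKETCH_DATATYPE_NULL) return 0; /* explicit "null datatype" handle */
    return 1;
}

/* Small usage check of the guard. */
static void sketch_check(void)
{
    sketch_datatype_t dbl = { "double" };

    assert(!sketch_datatype_is_usable(NULL));
    assert(!sketch_datatype_is_usable(SKETCH_DATATYPE_NULL));
    assert(sketch_datatype_is_usable(&dbl));

    /* Comparing against the sentinel alone would let NULL through. */
    assert((const sketch_datatype_t *)NULL != SKETCH_DATATYPE_NULL);
}
```

A guard like this only hides the symptom; the actual fix reported here was registering the missing datatype in DPLASMA.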
(The state above was observed in gdb.) Not immediately clear why/if this is related to the PR, or whether we just fixed the other issue that was masking this one.
Originally posted by @abouteiller in ICLDisco/parsec#733 (comment)