I am looking at OpenMPI 4.0.3 with HDF5 1.10.6 compiled against it. A user reported a segfault in ADIOI_Flatten()
when using a chunked dataset, i.e., when the following line is present in the code:
CALL h5pset_chunk_f(crp_list, 1, dims, ierr)
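For orientation, the call sequence around that line looks roughly like the sketch below. This is not the attached reproducer: the program name, the file name `sketch.h5`, the dataset name `data`, and the `dims` value are placeholders chosen for illustration, and it has not been verified that this exact sketch triggers the segfault. It only shows a chunked dataset being created through the MPI-IO file driver, which is the path visible in the stack (h5dcreate_f down to PMPI_File_set_view).

```fortran
! Hypothetical sketch, NOT the attached ioerror.F90.
! Build with a parallel HDF5 Fortran wrapper, e.g. h5pfc.
PROGRAM chunk_sketch
  USE mpi
  USE hdf5
  IMPLICIT NONE

  INTEGER :: ierr, myid, numprocs
  INTEGER(HID_T) :: fapl, file_id, space_id, crp_list, dset_id
  INTEGER(HSIZE_T), DIMENSION(1) :: dims

  dims(1) = 1024   ! placeholder size, an assumption

  CALL MPI_Init(ierr)
  CALL MPI_Comm_rank(MPI_COMM_WORLD, myid, ierr)
  CALL MPI_Comm_size(MPI_COMM_WORLD, numprocs, ierr)
  PRINT *, 'myid, numprocs:', myid, numprocs

  CALL h5open_f(ierr)

  ! File access property list that routes file I/O through MPI-IO
  CALL h5pcreate_f(H5P_FILE_ACCESS_F, fapl, ierr)
  CALL h5pset_fapl_mpio_f(fapl, MPI_COMM_WORLD, MPI_INFO_NULL, ierr)
  CALL h5fcreate_f('sketch.h5', H5F_ACC_TRUNC_F, file_id, ierr, access_prp=fapl)

  ! 1-D dataspace and a dataset creation property list with chunking
  CALL h5screate_simple_f(1, dims, space_id, ierr)
  CALL h5pcreate_f(H5P_DATASET_CREATE_F, crp_list, ierr)
  CALL h5pset_chunk_f(crp_list, 1, dims, ierr)

  ! Creating the chunked dataset is where the stack enters H5Dcreate2
  CALL h5dcreate_f(file_id, 'data', H5T_NATIVE_INTEGER, space_id, dset_id, &
                   ierr, dcpl_id=crp_list)

  ! Release all handles in reverse order of creation
  CALL h5dclose_f(dset_id, ierr)
  CALL h5pclose_f(crp_list, ierr)
  CALL h5sclose_f(space_id, ierr)
  CALL h5fclose_f(file_id, ierr)
  CALL h5pclose_f(fapl, ierr)
  CALL h5close_f(ierr)
  CALL MPI_Finalize(ierr)
END PROGRAM chunk_sketch
```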
A simple Fortran reproducer is attached (compile with h5pfc ioerror.F90, run with mpirun -np 2 ./a.out). The same code works with Intel MPI. Here is the stack:
$ mpirun -np 2 ./a.out
myid, numprocs: 0 2
[b2368:169069:0:169069] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid: 169069) ====
0 0x0000000000050ba5 ucs_debug_print_backtrace() /build-result/src/hpcx-v2.6.0-gcc-MLNX_OFED_LINUX-4.7-1.0.0.1-redhat7.7-x86_64/ucx-v1.8.x/src/ucs/debug/debug.c:625
1 0x0000000000034278 ADIOI_Flatten() /cluster/work/users/vegarde/build/OpenMPI/4.0.3/GCC-9.3.0/openmpi-4.0.3/ompi/mca/io/romio321/romio/adio/common/flatten.c:322
2 0x0000000000035a6c ADIOI_Flatten_datatype() /cluster/work/users/vegarde/build/OpenMPI/4.0.3/GCC-9.3.0/openmpi-4.0.3/ompi/mca/io/romio321/romio/adio/common/flatten.c:166
3 0x000000000002c2d5 ADIO_Set_view() /cluster/work/users/vegarde/build/OpenMPI/4.0.3/GCC-9.3.0/openmpi-4.0.3/ompi/mca/io/romio321/romio/adio/common/ad_set_view.c:52
4 0x0000000000013f26 mca_io_romio_dist_MPI_File_set_view() /cluster/work/users/vegarde/build/OpenMPI/4.0.3/GCC-9.3.0/openmpi-4.0.3/ompi/mca/io/romio321/romio/mpi-io/set_view.c:157
5 0x000000000000cfb7 mca_io_romio321_file_set_view() /cluster/work/users/vegarde/build/OpenMPI/4.0.3/GCC-9.3.0/openmpi-4.0.3/ompi/mca/io/romio321/src/io_romio321_file_open.c:237
6 0x000000000007246e PMPI_File_set_view() /cluster/work/users/vegarde/build/OpenMPI/4.0.3/GCC-9.3.0/openmpi-4.0.3/ompi/mpi/c/profile/pfile_set_view.c:80
7 0x00000000002d34d4 H5FD_mpio_write() H5FDmpio.c:0
8 0x000000000011b5fe H5FD_write() ???:0
9 0x00000000000fab93 H5F__accum_write() ???:0
10 0x00000000001f3d7b H5PB_write() ???:0
11 0x00000000001050fb H5F_block_write() ???:0
12 0x00000000000b5658 H5D__chunk_allocate() ???:0
13 0x00000000000c4117 H5D__init_storage() H5Dint.c:0
14 0x00000000000c964b H5D__alloc_storage() ???:0
15 0x00000000000d02f5 H5D__layout_oh_create() ???:0
16 0x00000000000c539c H5D__create() ???:0
17 0x00000000000d101a H5O__dset_create() H5Doh.c:0
18 0x00000000001abd53 H5O_obj_create() ???:0
19 0x00000000001768f7 H5L__link_cb() H5L.c:0
20 0x000000000014c6e3 H5G__traverse_real.isra.0() H5Gtraverse.c:0
21 0x000000000014cb86 H5G_traverse() ???:0
22 0x000000000017431e H5L__create_real.part.0() H5L.c:0
23 0x0000000000177c36 H5L_link_object() ???:0
24 0x00000000000c4c6f H5D__create_named() ???:0
25 0x00000000000a2991 H5Dcreate2() ???:0
26 0x0000000000034b60 h5dcreate_c() ???:0
27 0x000000000002afba __h5d_MOD_h5dcreate_f() ???:0
28 0x000000000040181d MAIN__() /cluster/home/marcink/ioerror.F90:97
29 0x00000000004019ac main() /cluster/home/marcink/ioerror.F90:3
30 0x0000000000022545 __libc_start_main() ???:0
31 0x00000000004013c9 _start() ???:0
=================================
Program received signal SIGSEGV: Segmentation fault - invalid memory reference.
Backtrace for this error:
#0 0x7fa8569a062f in ???
#1 0x7fa833c3d278 in ADIOI_Flatten
at adio/common/flatten.c:321
#2 0x7fa833c3ea6b in ADIOI_Flatten_datatype
at adio/common/flatten.c:166
#3 0x7fa833c352d4 in ADIO_Set_view
at adio/common/ad_set_view.c:52
#4 0x7fa833c1cf25 in mca_io_romio_dist_MPI_File_set_view
at mpi-io/set_view.c:157
#5 0x7fa833c15fb6 in mca_io_romio321_file_set_view
at src/io_romio321_file_open.c:237
#6 0x7fa85778546d in PMPI_File_set_view
at /cluster/work/users/vegarde/build/OpenMPI/4.0.3/GCC-9.3.0/openmpi-4.0.3/ompi/mpi/c/profile/pfile_set_view.c:80
[...]
Could that be an OpenMPI problem, or do you think it is HDF5 that's causing it?
I'd appreciate any help. Thanks!