segfault with HDF5 chunked file #7795

Open
@angainor

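I am looking at OpenMPI 4.0.3 with HDF5 1.10.6 compiled against it. A user reported a segfault in ADIOI_Flatten() when using a chunked dataset. The crash only shows up when the following line is present in the code (the backtrace below shows the fault itself is raised later, inside h5dcreate_f):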

CALL h5pset_chunk_f(crp_list, 1, dims, ierr)

A simple Fortran reproducer is attached (compile with h5pfc ioerror.F90, run with mpirun -np 2 ./a.out). The same code works with Intel MPI. Here is the output, including the backtrace:

$ mpirun -np 2 ./a.out 
 myid, numprocs:           0           2
[b2368:169069:0:169069] Caught signal 11 (Segmentation fault: address not mapped to object at address (nil))
==== backtrace (tid: 169069) ====
 0 0x0000000000050ba5 ucs_debug_print_backtrace()  /build-result/src/hpcx-v2.6.0-gcc-MLNX_OFED_LINUX-4.7-1.0.0.1-redhat7.7-x86_64/ucx-v1.8.x/src/ucs/debug/debug.c:625
 1 0x0000000000034278 ADIOI_Flatten()  /cluster/work/users/vegarde/build/OpenMPI/4.0.3/GCC-9.3.0/openmpi-4.0.3/ompi/mca/io/romio321/romio/adio/common/flatten.c:322
 2 0x0000000000035a6c ADIOI_Flatten_datatype()  /cluster/work/users/vegarde/build/OpenMPI/4.0.3/GCC-9.3.0/openmpi-4.0.3/ompi/mca/io/romio321/romio/adio/common/flatten.c:166
 3 0x000000000002c2d5 ADIO_Set_view()  /cluster/work/users/vegarde/build/OpenMPI/4.0.3/GCC-9.3.0/openmpi-4.0.3/ompi/mca/io/romio321/romio/adio/common/ad_set_view.c:52
 4 0x0000000000013f26 mca_io_romio_dist_MPI_File_set_view()  /cluster/work/users/vegarde/build/OpenMPI/4.0.3/GCC-9.3.0/openmpi-4.0.3/ompi/mca/io/romio321/romio/mpi-io/set_view.c:157
 5 0x000000000000cfb7 mca_io_romio321_file_set_view()  /cluster/work/users/vegarde/build/OpenMPI/4.0.3/GCC-9.3.0/openmpi-4.0.3/ompi/mca/io/romio321/src/io_romio321_file_open.c:237
 6 0x000000000007246e PMPI_File_set_view()  /cluster/work/users/vegarde/build/OpenMPI/4.0.3/GCC-9.3.0/openmpi-4.0.3/ompi/mpi/c/profile/pfile_set_view.c:80
 7 0x00000000002d34d4 H5FD_mpio_write()  H5FDmpio.c:0
 8 0x000000000011b5fe H5FD_write()  ???:0
 9 0x00000000000fab93 H5F__accum_write()  ???:0
10 0x00000000001f3d7b H5PB_write()  ???:0
11 0x00000000001050fb H5F_block_write()  ???:0
12 0x00000000000b5658 H5D__chunk_allocate()  ???:0
13 0x00000000000c4117 H5D__init_storage()  H5Dint.c:0
14 0x00000000000c964b H5D__alloc_storage()  ???:0
15 0x00000000000d02f5 H5D__layout_oh_create()  ???:0
16 0x00000000000c539c H5D__create()  ???:0
17 0x00000000000d101a H5O__dset_create()  H5Doh.c:0
18 0x00000000001abd53 H5O_obj_create()  ???:0
19 0x00000000001768f7 H5L__link_cb()  H5L.c:0
20 0x000000000014c6e3 H5G__traverse_real.isra.0()  H5Gtraverse.c:0
21 0x000000000014cb86 H5G_traverse()  ???:0
22 0x000000000017431e H5L__create_real.part.0()  H5L.c:0
23 0x0000000000177c36 H5L_link_object()  ???:0
24 0x00000000000c4c6f H5D__create_named()  ???:0
25 0x00000000000a2991 H5Dcreate2()  ???:0
26 0x0000000000034b60 h5dcreate_c()  ???:0
27 0x000000000002afba __h5d_MOD_h5dcreate_f()  ???:0
28 0x000000000040181d MAIN__()  /cluster/home/marcink/ioerror.F90:97
29 0x00000000004019ac main()  /cluster/home/marcink/ioerror.F90:3
30 0x0000000000022545 __libc_start_main()  ???:0
31 0x00000000004013c9 _start()  ???:0
=================================

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7fa8569a062f in ???
#1  0x7fa833c3d278 in ADIOI_Flatten
	at adio/common/flatten.c:321
#2  0x7fa833c3ea6b in ADIOI_Flatten_datatype
	at adio/common/flatten.c:166
#3  0x7fa833c352d4 in ADIO_Set_view
	at adio/common/ad_set_view.c:52
#4  0x7fa833c1cf25 in mca_io_romio_dist_MPI_File_set_view
	at mpi-io/set_view.c:157
#5  0x7fa833c15fb6 in mca_io_romio321_file_set_view
	at src/io_romio321_file_open.c:237
#6  0x7fa85778546d in PMPI_File_set_view
	at /cluster/work/users/vegarde/build/OpenMPI/4.0.3/GCC-9.3.0/openmpi-4.0.3/ompi/mpi/c/profile/pfile_set_view.c:80
[...]

Could this be an OpenMPI problem, or do you think HDF5 is causing it?
I'd appreciate any help. Thanks!

ioerror.zip
