WIP: Fix MPI Communicator destruction (again) #945
Remember #841? It's back!
A refresher: in CI tests, there was a semi-common (30-50%?) race condition between Parthenon's use of MPI and HDF5's use of MPI when making the last outputs and cleaning up. Parthenon would `MPI_Comm_free` a communicator, and HDF5 would then attempt to `MPI_Comm_dup` at the same time, which is apparently an issue for some implementations.
I found (without explanation or extensive testing) that I could make that race condition less frequent by switching the destructor of `Reduction` objects to use `MPI_Comm_disconnect` instead of `MPI_Comm_free`. The difference: `MPI_Comm_free` is asynchronous; it sets the current reference to null without actually destroying the communicator until every process has done so. It "disconnects" from the communicator, if you will. `MPI_Comm_disconnect`, by contrast, waits until it can confirm that the communicator has been destroyed. That is, it "frees" it (the MPI standard is so straightforward!).
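To make the two teardown paths concrete, here is a minimal sketch of a destructor that owns a communicator; `CommHolder` is a hypothetical stand-in, not Parthenon's actual `Reduction` layout:

```cpp
#include <mpi.h>

// Illustrative stand-in for an object that owns a communicator
// (hypothetical name; not Parthenon's actual Reduction type).
struct CommHolder {
  MPI_Comm comm = MPI_COMM_NULL;

  ~CommHolder() {
    if (comm != MPI_COMM_NULL) {
      // MPI_Comm_free: marks the communicator for deallocation, sets our
      // handle to MPI_COMM_NULL, and returns; the communicator is only
      // actually destroyed once every process has freed it.
      MPI_Comm_free(&comm);

      // MPI_Comm_disconnect: waits until pending communication on the
      // communicator completes before returning. This is the variant this
      // PR switches away from.
      // MPI_Comm_disconnect(&comm);
    }
  }
};
```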
In any case, after reorganizing communicators in KHARMA, I've found that when freeing a lot of communicators together, `MPI_Comm_disconnect` is very likely to just... hang forever. This happens in, for example, the destructor of `Mesh`, which calls the destructor of `Packages_t`, which calls every package's destructor, which (in codes that follow our best practices) should free every communicator at once. So I imagine it will become a problem for other folks as well, manifesting as a run which finishes, prints statistics, and then just... doesn't return. This will be a problem in batch jobs, where a job that would otherwise return will continue spending core-hours without printing anything suspicious.
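A rough sketch of the kind of package teardown where the hang showed up (hypothetical names, not real Parthenon/KHARMA code): many communicators freed back-to-back as the `Mesh` and `Packages_t` destructors unwind.

```cpp
#include <mpi.h>
#include <vector>

// Hypothetical per-package state holding one communicator per reduction.
struct MyPackageData {
  std::vector<MPI_Comm> comms;

  ~MyPackageData() {
    // ~Mesh -> ~Packages_t -> this destructor, for every package at once.
    for (auto &c : comms) {
      if (c != MPI_COMM_NULL) MPI_Comm_free(&c);
      // With MPI_Comm_disconnect here instead, this loop is where runs
      // were observed to hang after printing final statistics.
    }
  }
};
```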
This PR just reverts to using `MPI_Comm_free` in the destructor, which fixes the issue in KHARMA and allows all tests to pass on my machine, both without any warnings, double-frees, etc. If it fails CI, we can talk about other sorts of hacks for keeping HDF5 out of our way while we destroy the `Mesh`.

Teaching sand to think might have been a mistake, but letting it talk to itself was a catastrophe.