Segfault during drop #159
I got a segfault on program exit when using hdf5 in a multithreaded (rayon) environment.

Coredump:

Here is the full backtrace:

Comments
Thanks for reporting - I wonder if you could come up with a minimal example that crashes? (Looks like a double free, perhaps?)
What version of `hdf5`?
@aldanor Yes, I will try to provide a minimal example. A first guess would be that it is due to calling Rust via the PyO3 wrapper (and it only happens in release mode).

@mulimoen Do you mean libhdf5?

```
$ dnf info hdf5-devel
Last metadata expiration check: 0:00:20 ago on Thu 10 Jun 2021 10:18:49 PM CEST.
Installed Packages
Name    : hdf5-devel
Version : 1.10.6
Release : 5.fc34
```
I think this bug is most likely due to PyO3 not cleaning up correctly. The current process hierarchy is roughly: Python process → submission thread (Rust) → rayon thread pool. So I guess PyO3 drops the Rust wrapper, but somehow the rayon thread pool lives on (and maybe also the submission thread, which holds the thread pool). This may result in the hdf5 handles being cleaned up while the thread pool still tries to access them. A workaround is to stop the submission thread, and thus also the rayon thread pool, explicitly from Python.
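For illustration, here is a minimal sketch of such an explicit stop exposed over the PyO3 boundary; the class shape and the `close` method are hypothetical, not the actual wrapper from this report:

```rust
use pyo3::prelude::*;

#[pyclass]
struct DataLoader {
    // The submission thread; joined on close so that no rayon worker
    // can outlive the hdf5 handles it reads from.
    worker: Option<std::thread::JoinHandle<()>>,
}

#[pymethods]
impl DataLoader {
    /// Explicit shutdown callable from Python. Joining here guarantees the
    /// submission thread (and the rayon pool it drives) is gone before the
    /// interpreter tears the process down.
    fn close(&mut self) {
        if let Some(handle) = self.worker.take() {
            let _ = handle.join();
        }
    }
}
```

On the Python side this would be a plain `loader.close()` call (or a context manager) before the process exits.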
Is this a …?
Sorry for the late reply. My data structure is as follows:

```rust
struct Dataset {
    hdf5_handles: Vec<hdf5::File>,
}

struct DataLoader {
    ds: Arc<Dataset>,
}

impl DataLoader {
    // ...
    fn start_worker(&mut self) -> std::thread::JoinHandle<()> {
        let ds = Arc::clone(&self.ds);
        std::thread::spawn(move || {
            // par_bridge (rayon) iterates over the range in parallel,
            // which results in parallel hdf5 access
            (0..10).par_bridge().for_each(|_| {
                let data = ds.read_from_hdf5();
                // Send via some channel; closing the channel stops this parallel loop
            });
        })
    }
}

impl Drop for DataLoader {
    fn drop(&mut self) {
        // Close the data output channel so that the rayon parallel iter stops,
        // and join the worker thread afterwards
    }
}
```

When calling via PyO3, I suspect this `drop` is not run as expected.
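For illustration, a hedged sketch of the shutdown path described above; the channel wiring, the `read_from_hdf5` stub, and all field names are assumptions rather than the reporter's actual code:

```rust
use std::sync::{mpsc, Arc};
use rayon::iter::{ParallelBridge, ParallelIterator};

struct Dataset; // stand-in; the real struct holds Vec<hdf5::File>

impl Dataset {
    fn read_from_hdf5(&self) -> i32 {
        0 // stand-in for the real HDF5 read
    }
}

struct DataLoader {
    ds: Arc<Dataset>,
    rx: Option<mpsc::Receiver<i32>>,
    worker: Option<std::thread::JoinHandle<()>>,
}

impl DataLoader {
    fn start_worker(&mut self) {
        let (tx, rx) = mpsc::channel();
        self.rx = Some(rx);
        let ds = Arc::clone(&self.ds);
        self.worker = Some(std::thread::spawn(move || {
            // for_each_with clones the sender into each rayon worker.
            (0..10).par_bridge().for_each_with(tx, |tx, _| {
                // Once the receiver is dropped, send() fails and the
                // result is simply discarded.
                let _ = tx.send(ds.read_from_hdf5());
            });
        }));
    }
}

impl Drop for DataLoader {
    fn drop(&mut self) {
        // Drop the receiving end so pending sends fail fast...
        drop(self.rx.take());
        // ...then join, so no rayon worker can still touch the hdf5
        // handles in `ds` after this point.
        if let Some(handle) = self.worker.take() {
            let _ = handle.join();
        }
    }
}
```

Joining in `drop` is the key step: it serialises the thread pool's last HDF5 accesses before the `Dataset` handles can be released.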
This is an ugly issue. I can replicate it with (no Python involved):

```rust
use hdf5::*;

fn main() {
    let file = File::create("file.h5").unwrap();
    let ds = file.new_dataset::<i32>().create("ds").unwrap();
    let h = std::thread::spawn(move || {
        std::mem::drop(ds);
    });
    // std::mem::drop(file);
    // h.join().unwrap();
}
```

I initially thought this was related to how we handle objects (see #139), but this made no difference. The fault happens when libhdf5's atexit cleanup closes the remaining identifiers while the spawned thread is still dropping its handle.

To fix this we can use `H5dont_atexit` on library initialisation. This usually guards against unclosed objects, but in our case every handle is closed by its `Drop` impl anyway.
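As a reference point, a hedged sketch of the proposed fix, assuming the raw binding is exposed by the sys crate as `hdf5_sys::h5::H5dont_atexit` (the underlying C API is `herr_t H5dont_atexit(void)`):

```rust
/// Call once during library initialisation, before any other HDF5 call.
fn disable_hdf5_atexit() {
    unsafe {
        // Prevent libhdf5 from registering its atexit handler, which would
        // otherwise close every open identifier at process exit even while
        // another thread is still dropping its own handles.
        hdf5_sys::h5::H5dont_atexit();
    }
}
```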