Change default opening / closing behavior of wrapper #14
Comments
Very much related: it's probably not a good idea to open each group, and especially each dataset, that we encounter when visiting the whole file. Getting the dataspace immediately is useless, too (I think...). In principle we only need to open a dataset when we actually access the data in it. For everything else we just need to keep track of the dataset's information in the file, e.g. datatype, shape etc. That's why we have an abstract interface in the first place.
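To make that idea concrete, here is a minimal sketch of what such a lazily-opened handle could look like. All names in it (LazyDset, openRawDataset, datasetId) are hypothetical stand-ins, not the wrapper's actual API, and the open call is only echoed in place of the native H5Dopen; the point is merely that metadata is cached while visiting the file, and an actual HDF5 id is only created on first data access.

```nim
import std/options

type
  LazyDset = object
    name: string           # path inside the file
    dtype: string          # cached metadata, collected while visiting
    shape: seq[int]
    hid: Option[int64]     # HDF5 id, only set once the data is accessed

proc openRawDataset(name: string): int64 =
  ## Placeholder for the wrapper's actual call to the native H5Dopen.
  echo "opening ", name
  result = 42

proc datasetId(d: var LazyDset): int64 =
  ## Open on first access, reuse the cached id afterwards.
  if d.hid.isNone:
    d.hid = some(openRawDataset(d.name))
  d.hid.get

when isMainModule:
  var d = LazyDset(name: "/run_1/chip_0/data", dtype: "float64", shape: @[1000, 7])
  echo d.shape          # metadata access: no HDF5 object is opened
  discard d.datasetId   # first data access: the dataset is opened only now
```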
The file https://github.com/Vindaar/nimhdf5/blob/master/tests/tDebugRamUsage.nim just visits a large file (which currently opens each dataset, group and attribute in it) and waits a few seconds. Running it on a 30 GB H5 file it outputs:

Visiting file...
objects open:
  files open:  1
  dsets open:  10472
  groups open: 1236
  types open:  0
  attrs open:  4312

and uses ~300 MB purely by visiting the whole file.
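For reference, counts like these can be queried directly from the HDF5 C library via H5Fget_obj_count. The following is a standalone sketch, not nimhdf5 code: the bindings are declared ad hoc, the constant values are copied from H5Fpublic.h, and it assumes HDF5 >= 1.10 (64-bit hid_t) with libhdf5 on the linker path.

```nim
# Minimal FFI sketch: ask libhdf5 how many objects of each kind a file has open.
{.passL: "-lhdf5".}

type hid_t = int64

const
  H5F_ACC_RDONLY  = cuint(0x0000)
  H5F_OBJ_DATASET = cuint(0x0002)
  H5F_OBJ_GROUP   = cuint(0x0004)
  H5F_OBJ_ATTR    = cuint(0x0010)
  H5P_DEFAULT     = hid_t(0)

proc H5open(): cint {.importc, cdecl.}
proc H5Fopen(name: cstring, flags: cuint, fapl: hid_t): hid_t {.importc, cdecl.}
proc H5Fget_obj_count(fid: hid_t, types: cuint): int {.importc, cdecl.}
proc H5Fclose(fid: hid_t): cint {.importc, cdecl.}

when isMainModule:
  discard H5open()
  let fid = H5Fopen("some_large_file.h5", H5F_ACC_RDONLY, H5P_DEFAULT)
  # ... visit the file / open objects here ...
  echo "dsets open:  ", H5Fget_obj_count(fid, H5F_OBJ_DATASET)
  echo "groups open: ", H5Fget_obj_count(fid, H5F_OBJ_GROUP)
  echo "attrs open:  ", H5Fget_obj_count(fid, H5F_OBJ_ATTR)
  discard H5Fclose(fid)
```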
In addition to that, and even more importantly, the current way of opening a file "in its entirety" leads to very bad performance for large files! First of all we should replace
7dccc71 changes the default behavior for attributes. This was the largest cause of the slowdown on very large H5 files (> 15 GB with many attributes).
We need a nicer way to close each H5 object individually, without calling the native H5 functions.
In addition, implement closing of groups and datasets by default after a read / write etc. procedure. We can introduce some locking flag, which allows us to keep objects open if the user desires. In some cases that might be useful, e.g. if one knows that several successive writes / reads of the same dataset will happen; see the sketch below.
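A rough sketch of how close-by-default plus such a flag could look. Dataset, read and keepOpen are invented names for illustration, not the wrapper's actual API, and the echo calls only stand in for the native H5Dopen / H5Dclose / H5Dread calls.

```nim
type
  Dataset = object
    name: string
    isOpen: bool

proc open(d: var Dataset) =
  if not d.isOpen:
    echo "H5Dopen ", d.name    # stand-in for the native open call
    d.isOpen = true

proc close(d: var Dataset) =
  if d.isOpen:
    echo "H5Dclose ", d.name   # stand-in for the native close call
    d.isOpen = false

proc read(d: var Dataset, keepOpen = false): seq[float] =
  ## Open on demand, read, and close again unless the caller opts out.
  d.open()
  result = @[1.0, 2.0, 3.0]    # placeholder for the actual data read
  if not keepOpen:
    d.close()

when isMainModule:
  var d = Dataset(name: "/run_1/data")
  discard d.read()                 # default: dataset is closed again afterwards
  discard d.read(keepOpen = true)  # successive reads: keep the handle around
  discard d.read(keepOpen = true)
  d.close()                        # user closes explicitly when done
```

With keepOpen the handle survives across calls, so a loop of reads or writes on the same dataset avoids the repeated open/close overhead while everything else stays closed by default.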