Change default opening / closing behavior of wrapper #14

Open
Vindaar opened this issue Jul 14, 2018 · 4 comments
Labels
enhancement New feature or request

Comments

Owner

Vindaar commented Jul 14, 2018

We need a nicer way to close each H5 object individually, without calling the native H5 functions.

In addition, groups and datasets should be closed by default after a read / write etc. procedure. We can introduce a locking flag that keeps objects open if the user desires. That can be useful when one knows that several successive writes / reads of the same dataset will happen.
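
A minimal sketch of the close-by-default idea with an opt-in flag to keep an object open. It is written in Python purely for illustration and does not use nimhdf5 or the HDF5 library; `FakeHandle`, `Dataset` and `keep_open` are hypothetical names, and the handle counter only stands in for native H5 identifiers.

```python
class FakeHandle:
    """Hypothetical stand-in for a native HDF5 object handle."""
    open_count = 0  # number of handles currently open

    def __init__(self, name):
        self.name = name
        self.closed = False
        FakeHandle.open_count += 1

    def close(self):
        if not self.closed:
            self.closed = True
            FakeHandle.open_count -= 1


class Dataset:
    """Opens its handle on demand and closes it after each operation,
    unless keep_open (the 'locking flag') is set."""

    def __init__(self, name, keep_open=False):
        self.name = name
        self.keep_open = keep_open
        self._handle = None
        self._data = []

    def _acquire(self):
        # Open the handle only if it is not already open.
        if self._handle is None:
            self._handle = FakeHandle(self.name)

    def _release(self):
        # Default behavior: close right after the read / write.
        if not self.keep_open:
            self.close()

    def close(self):
        # Explicit close, without touching the native API directly.
        if self._handle is not None:
            self._handle.close()
            self._handle = None

    def write(self, values):
        self._acquire()
        try:
            self._data = list(values)
        finally:
            self._release()

    def read(self):
        self._acquire()
        try:
            return list(self._data)
        finally:
            self._release()
```

With `keep_open=True`, successive reads and writes reuse one handle until `close()` is called explicitly; otherwise no handle stays open between calls.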

Vindaar changed the title from "Implement nicer close interface for Dsets, Groups, close by default" to "Implement nicer close interface for Dsets, Groups, close by default, don't open immediately" on Jul 14, 2018
Owner Author

Vindaar commented Jul 14, 2018

Very much related: it's probably not a good idea to open every group, and especially every dataset, that we encounter when visiting the whole file. Getting the dataspace immediately is useless, too (I think...).

In principle we only need to open a dataset when we actually access the data in it. For everything else, we just need to keep track of the dataset's information in the file, e.g. datatype, shape etc. That's why we have an abstract interface in the first place...
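
The lazy-opening idea could look roughly like the following Python sketch, which is an illustration only and not nimhdf5 code. `DatasetInfo`, `FileIndex`, `open_calls` and the `storage` dictionary are hypothetical; the visit step is assumed to receive name, datatype and shape for each dataset without opening anything.

```python
from dataclasses import dataclass


@dataclass
class DatasetInfo:
    """Metadata kept per dataset; no native handle is stored here."""
    name: str
    dtype: str
    shape: tuple


class FileIndex:
    """Records dataset metadata while visiting and opens a dataset
    only when its data is actually requested."""

    def __init__(self):
        self.datasets = {}
        self.open_calls = 0  # counts simulated dataset opens

    def visit(self, entries):
        # Visiting only fills the table; nothing is opened.
        for name, dtype, shape in entries:
            self.datasets[name] = DatasetInfo(name, dtype, shape)

    def read(self, name, storage):
        # Look up the metadata first; fails early for unknown names.
        info = self.datasets[name]
        # The open happens here, at the point of data access.
        self.open_calls += 1
        return storage[info.name]
```

Queries about datatype or shape are answered from the table alone, so `open_calls` stays at zero until `read` is called.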

Vindaar changed the title from "Implement nicer close interface for Dsets, Groups, close by default, don't open immediately" to "Change default opening / closing behavior of wrapper" on Jul 14, 2018
Owner Author

Vindaar commented Jul 14, 2018

The file https://github.com/Vindaar/nimhdf5/blob/master/tests/tDebugRamUsage.nim just visits a large file (which currently opens each dataset, group and attribute in it) and waits a few seconds. Running it on a 30 GB H5 file, it outputs:

Visiting file...
    objects open:
        files open: 1
        dsets open: 10472
        groups open: 1236
        types open: 0
        attrs open: 4312

and uses ~300MB purely by visiting the whole file.

Vindaar added the enhancement label on Aug 28, 2018
Owner Author

Vindaar commented Dec 3, 2018

In addition to that, and more importantly, the current way of opening a file "in its entirety" leads to very bad performance for large files!

First of all, we should replace H5Ovisit with H5Lvisit and see if that improves performance when visiting the file. Then we should rework our getters for groups and datasets so that they stop opening each object as we add it to the tables.

Owner Author

Vindaar commented Jan 12, 2019

Commit 7dccc71 changes the default behavior for attributes. This was the largest cause of the slowdown on very large H5 files (> 15 GB with many attributes).
