Add check for empty datasets in NWB containers #584
@@ -42,6 +42,39 @@ def check_large_dataset_compression(
     return None


+@register_check(importance=Importance.CRITICAL, neurodata_type=NWBContainer)
+def check_dataset_not_empty(nwb_container: NWBContainer) -> Optional[Iterable[InspectorMessage]]:
+    """
+    Check if any datasets in the container are empty (have zero elements).
+
+    Empty datasets can cause issues with analysis and visualization tools, and generally indicate
+    missing or incomplete data.
+
+    Parameters
+    ----------
+    nwb_container: NWBContainer
+        The NWB container to check for empty datasets.
+
+    Returns
+    -------
+    Optional[Iterable[InspectorMessage]]
+        Inspector messages for each empty dataset found, or None if no empty datasets are found.
+    """
+    for field_name, field in getattr(nwb_container, "fields", dict()).items():
+        if not isinstance(field, (h5py.Dataset, zarr.Array)):

Contributor
I think we may also want to check for other empty array-like objects here, including numpy arrays and DataIO objects. Could use something like:
Suggested change

+            continue
+
+        # Check if the dataset has zero elements
+        if field.size == 0:

Contributor
I am not sure if it will be necessary, but for the DataIO objects you may need to use

+            yield InspectorMessage(
+                severity=Severity.HIGH,
+                message=f"The dataset '{os.path.split(field.name)[1]}' is empty (has zero elements). "
+                f"Datasets should contain data.",
+            )
+
+    return None
+
+
 @register_check(importance=Importance.BEST_PRACTICE_SUGGESTION, neurodata_type=NWBContainer)
 def check_small_dataset_compression(
     nwb_container: NWBContainer,
I am not sure yet whether this should be a critical check or a best practice violation. There was a discussion here about whether empty datasets should be allowed: NeurodataWithoutBorders/pynwb#2065.
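For reference, the earlier review suggestion to also cover other array-like objects could be sketched roughly as below. This is only an illustration, not the suggested change from the thread (which is not shown here): `WrappedData`, `element_count`, and `find_empty_fields` are hypothetical names, and `WrappedData` is a stand-in that assumes a wrapper keeps its payload in a `.data` attribute. It uses duck typing on `.size` and `.shape` rather than an `isinstance` check against specific array classes.

```python
from typing import Any, Dict, Iterator, Optional


class WrappedData:
    """Hypothetical stand-in for an I/O wrapper that holds its payload in `.data`."""

    def __init__(self, data: Any) -> None:
        self.data = data


def element_count(obj: Any) -> Optional[int]:
    """Return the number of elements in an array-like object, or None if unknown."""
    # Assumption: the wrapper exposes the wrapped array as `.data`.
    if isinstance(obj, WrappedData):
        obj = obj.data
    # Strings and bytes have a length but are not treated as datasets here.
    if isinstance(obj, (str, bytes)):
        return None
    # Prefer an explicit `.size` attribute when the object provides one.
    size = getattr(obj, "size", None)
    if size is not None:
        return int(size)
    # Otherwise derive the count from `.shape` (product of all dimensions).
    shape = getattr(obj, "shape", None)
    if shape is not None:
        count = 1
        for dim in shape:
            count *= int(dim)
        return count
    # Plain sequences: only the outer length is considered.
    if isinstance(obj, (list, tuple)):
        return len(obj)
    return None


def find_empty_fields(fields: Dict[str, Any]) -> Iterator[str]:
    """Yield the names of fields whose value is array-like with zero elements."""
    for name, value in fields.items():
        if element_count(value) == 0:
            yield name
```

A check built on this would iterate over `find_empty_fields(getattr(nwb_container, "fields", dict()))` and yield one message per name. Whether each message is reported as critical or as a best practice violation is independent of how emptiness is detected.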