(fix): structured dtype fill value consolidated metadata #3015
Merged: d-v-b merged 6 commits into zarr-developers:main from ilan-gold:ig/fix_structured_dtype_consolidated on Apr 30, 2025 (+34 −1).
Commits:

- 0da74a4 (fix): structured dtype consolidated metadata fill value (ilan-gold)
- 8136b00 (chore): relnote (ilan-gold)
- 8e23bf9 (chore): test (ilan-gold)
- feb367e (fix): more robust testing (ilan-gold)
- 21b1fdd Merge branch 'main' into ig/fix_structured_dtype_consolidated (ilan-gold)
- a959e66 Merge branch 'main' into ig/fix_structured_dtype_consolidated (d-v-b)
Release note (new one-line file, hunk `@@ -0,0 +1 @@`):

Fix structured `dtype` fill value serialization for consolidated metadata
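Structured fill values are NumPy void scalars, which the standard `json` module cannot serialize on its own. The sketch below illustrates one way a JSON encoder can handle them, assuming the Zarr v2 convention of storing structured fill values as base64-encoded raw bytes. The class name `VoidFillValueEncoder` and the field names `x`/`y` are hypothetical; this is not zarr-python's actual encoder.

```python
import base64
import json

import numpy as np


class VoidFillValueEncoder(json.JSONEncoder):
    """Hypothetical encoder: base64-encode structured (void) scalars."""

    def default(self, o):
        if isinstance(o, np.void):
            # Raw bytes of the structured scalar, base64-encoded as ASCII,
            # following the Zarr v2 convention for structured fill values.
            return base64.standard_b64encode(o.tobytes()).decode("ascii")
        # Anything else falls back to the default behavior (TypeError).
        return super().default(o)


dt = np.dtype([("x", "<i4"), ("y", "<f8")])
fill = np.zeros((), dtype=dt)[()]  # all-zero structured (void) scalar

metadata = {"dtype": dt.descr, "fill_value": fill}

# The plain encoder does not know how to serialize np.void.
try:
    json.dumps(metadata)
except TypeError as err:
    print("plain json fails:", err)

# The custom encoder emits the fill value as a base64 string.
print(json.dumps(metadata, cls=VoidFillValueEncoder))
```

Handling the conversion in `JSONEncoder.default` keeps the in-memory metadata dictionary untouched and only changes what is written out.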
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ilan-gold I'm a little confused by this test. If the fill value is a structured dtype scalar, then shouldn't the fill value that appears in metadata be base64 encoded? If so, shouldn't this check fail in that case?
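For reference, the base64 form the question refers to can be round-tripped with NumPy and the standard library. This is a minimal sketch, assuming a little-endian structured dtype with hypothetical fields `x` and `y`; it is not zarr-python's own encoding code.

```python
import base64

import numpy as np

dt = np.dtype([("x", "<i4"), ("y", "<f8")])
fill = np.array((1, 2.5), dtype=dt)[()]  # structured (void) scalar

# Encode: raw bytes of the scalar -> base64 ASCII string (the on-disk JSON form).
encoded = base64.standard_b64encode(fill.tobytes()).decode("ascii")

# Decode: base64 string -> bytes -> one-element structured array -> scalar.
raw = base64.standard_b64decode(encoded)
decoded = np.frombuffer(raw, dtype=dt)[0]

print(encoded)
print(decoded["x"], decoded["y"])
# Byte-wise comparison sidesteps equality semantics for structured scalars.
print(decoded.tobytes() == fill.tobytes())
```

The on-disk value is therefore a string, while the decoded value is an `np.void`; which of the two a test sees depends on whether it inspects the serialized JSON or the parsed metadata.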
I think the Python representation here is serialized into the void data type. Whether that is correct or not is a different story. Looking at it closely, I think this is either (a) expected, and the typing on `ArrayV2Metadata` is wrong, or (b) the typing is right, and the behavior is wrong. The type is:

`fill_value: int | float | str | bytes | None = 0`

But the call to `parse_fill_value` yields a `numpy` object: zarr-python/src/zarr/core/metadata/v2.py, lines 297 to 308 in 36a1bac
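The mismatch can be checked directly. A minimal sketch, assuming (as the comment says) that the parsed fill value is a NumPy structured scalar; it builds such a scalar by hand rather than calling `parse_fill_value`, and the dtype is hypothetical.

```python
import numpy as np

dt = np.dtype([("x", "<i4"), ("y", "<f8")])

# Stand-in for what parse_fill_value is described as returning.
parsed_fill = np.array((1, 2.5), dtype=dt)[()]

# The union from the annotation `fill_value: int | float | str | bytes | None`.
annotated = (int, float, str, bytes, type(None))

print(type(parsed_fill))
# np.void subclasses none of the annotated types.
print(isinstance(parsed_fill, annotated))
```

So either the annotation would need to admit NumPy scalars such as `np.void`, matching option (a), or the parser would need to return one of the annotated types, matching option (b).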
(What I'm trying to say is that this dictionary is not the on-disk JSON, but a parsed version.)
Do you know offhand what zarr-python 2 did here? (I can also check this later.)
Definitely not offhand, but just to understand: why does it matter? Is there a backwards-compatibility concern with the in-memory Python representation? TBH, from what it sounds like, structured data types are much less essential to us than to some of the other people raising these concerns, so I'm not very familiar with the previous behavior. I don't think many people in our community use them, but they are in our CI and I like contributing, so I make these PRs :)
I'm touching a lot of data type representation code over in #2874 and I want to make sense of some of the test failures I'm seeing, and this was one of the tests that I tripped. I think your explanation makes sense (i.e., this is just the in-memory representation, and so the fill value should be the decoded version), sorry for the noise!
No worries @d-v-b, always happy to help. I'll report back quickly on the status of all this once 3.0.8 is out, but our tests show no errors with this feature at the moment.
For posterity, the specific thing in my PR that made this test fail was the use of `to_dict`. In #2874, I am making fill value encoding happen in the call to `to_dict`, instead of via a special JSON encoder (the status quo). So on my branch this test was comparing the JSON-serialized fill value against the in-memory version. I made the test pass by removing the `to_dict` step and directly comparing the `metadata.fill_value` attribute against the expected `fill_value`.
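The difference between the two test shapes can be sketched with a toy metadata class. `FakeV2Metadata` is hypothetical, not zarr-python code; its `to_dict` only mimics the idea of encoding the fill value there, assuming base64 encoding of the raw bytes.

```python
import base64
from dataclasses import dataclass

import numpy as np


@dataclass
class FakeV2Metadata:
    """Hypothetical stand-in for ArrayV2Metadata; not zarr-python code."""

    fill_value: np.void

    def to_dict(self) -> dict:
        # Mimics encoding the fill value inside to_dict() (assumed base64).
        raw = self.fill_value.tobytes()
        return {"fill_value": base64.standard_b64encode(raw).decode("ascii")}


dt = np.dtype([("x", "<i4"), ("y", "<f8")])
expected = np.array((1, 2.5), dtype=dt)[()]
metadata = FakeV2Metadata(fill_value=expected)

# Old test shape: the dict now holds an encoded string, not a structured scalar.
from_dict = metadata.to_dict()["fill_value"]
print(type(from_dict).__name__)  # str

# New test shape: compare the in-memory attribute directly. Byte-wise comparison
# avoids relying on equality semantics for structured scalars.
print(metadata.fill_value.tobytes() == expected.tobytes())
```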