Fix nan fill value metadata comparison #2930
base: main
Thanks for this! When we add datetimes, we will need to change this code again with a separate block for |
@@ -44,3 +46,23 @@ def from_dict(cls, data: dict[str, JSON]) -> Self:
         """

         return cls(**data)
+
+    def __eq__(self, other: Any) -> bool:
Overriding this means we lose the nice auto-generated dataclass __eq__ method, which prints out like this:
Matching attributes:
['shape',
'data_type',
'chunk_grid',
'chunk_key_encoding',
'codecs',
'attributes',
'dimension_names',
'zarr_format',
'node_type',
'storage_transformers']
Differing attributes:
['fill_value']
Drill down into differing attribute fill_value:
fill_value: np.float32(nan) != np.float32(nan)
But I don't see an easy way to keep that formatting and make it handle fill_value properly too...
@@ -233,7 +233,7 @@ class ArrayV3MetadataDict(TypedDict):
     attributes: dict[str, JSON]


-@dataclass(frozen=True, kw_only=True)
+@dataclass(frozen=True, kw_only=True, eq=False)
Should do this for ArrayV2Metadata too.
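As background on what `eq=False` changes, here is a small standard-library sketch. `WithEq` and `WithoutEq` are hypothetical classes made up for illustration. With the default `eq=True` the dataclass compares field by field, so two NaN values make otherwise identical instances unequal; with `eq=False` no `__eq__` is generated and comparison falls back to object identity unless the class defines its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WithEq:
    # Hypothetical class using the generated field-wise __eq__.
    value: float


@dataclass(frozen=True, eq=False)
class WithoutEq:
    # Hypothetical class with no generated __eq__ (identity comparison).
    value: float


fieldwise_equal = WithEq(1.0) == WithEq(1.0)
nan_equal = WithEq(float("nan")) == WithEq(float("nan"))
identity_equal = WithoutEq(1.0) == WithoutEq(1.0)
```

Here `fieldwise_equal` is `True`, while `nan_equal` and `identity_equal` are both `False`, which is why a class decorated with `eq=False` needs a hand-written `__eq__` to be useful in comparisons.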
@@ -411,3 +411,20 @@ def test_dtypes(dtype_str: str) -> None:
     else:
         # return type for vlen types may vary depending on numpy version
         assert dt.byte_count is None
+
+
+def test_metadata_comparison_with_nan_fill_value():
This test could presumably be parametrized over both v2 and v3 metadata, but you don't appear to have any other tests which do that. Is there a reason for that?
the two metadata documents are separate entities, so they get tested separately. if there's shared functionality that they both depend on, but isn't part of their shared base class, then that shared functionality should have its own test.
higher up in the stack we test the array classes against the two metadata varieties, because part of the job of the array classes is to abstract over the v2/v3 differences.
more broadly, |
just a thought but what if the |
I like this idea - it seems what you actually care about is whether the representation of the metadata on-disk is equal, which is what the JSON-encoded version is. Would this involve changing |
yes, it would require pushing the "treat this JSON value as a numpy scalar" job higher up in the stack, or we could add a method on the metadata classes. FWIW, over in #2874 I'm giving each data type object a method that converts the JSON representation of a scalar to an in-memory representation. That's exactly what we would use here.
Okay, I'll wait until that's in before returning to this then. I now have a stopgap fix in VirtualiZarr already anyway (zarr-developers/VirtualiZarr#502).
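The idea of comparing the JSON-encoded form rather than in-memory scalars can be sketched as follows. This is a rough illustration, not zarr's API: `encode_fill_value` and `to_json_dict` are hypothetical helpers, and the sketch assumes non-finite floats are written as the strings "NaN", "Infinity" and "-Infinity" in the JSON document. Once NaN is a string, ordinary equality on the encoded dictionaries behaves as expected.

```python
import json
import math
from typing import Any


def encode_fill_value(value: Any) -> Any:
    # Assumption: non-finite floats are stored as strings in the JSON form.
    if isinstance(value, float):
        if math.isnan(value):
            return "NaN"
        if math.isinf(value):
            return "Infinity" if value > 0 else "-Infinity"
    return value


def to_json_dict(shape: tuple[int, ...], fill_value: Any) -> dict[str, Any]:
    # Hypothetical, heavily reduced metadata document with two keys only.
    return {"shape": list(shape), "fill_value": encode_fill_value(fill_value)}


left = to_json_dict((2,), float("nan"))
right = to_json_dict((2,), float("nan"))
same_on_disk = json.dumps(left, sort_keys=True) == json.dumps(right, sort_keys=True)
```

In this sketch `left == right` and `same_on_disk` are both true even though the original fill values are NaN, because the comparison happens on the encoded representation.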
Fix for #2929
TODO:
Add docstrings and API docs for any new/modified user-facing classes and functions
docs/user-guide/*.rst
changes/