Fix nan fill value metadata comparison #2930

TomNicholas · 2025-03-24T19:42:19Z

Fix for #2929

TODO:

Add unit tests and/or doctests in docstrings
~~Add docstrings and API docs for any new/modified user-facing classes and functions~~
New/modified features documented in docs/user-guide/*.rst
Changes documented as a new file in changes/
GitHub Actions have all passed
Test coverage is 100% (Codecov passes)

d-v-b · 2025-03-24T19:46:10Z

Thanks for this! When we add datetimes, we will need to change this code again with a separate block for NaT (np.datetime64("NaT")).

TomNicholas · 2025-03-24T19:46:28Z

src/zarr/abc/metadata.py

@@ -44,3 +46,23 @@ def from_dict(cls, data: dict[str, JSON]) -> Self:
        """

        return cls(**data)
+
+    def __eq__(self, other: Any) -> bool:


Overriding this means we lose the nice auto-generated dataclass __eq__ method, which prints out like this:

Matching attributes:
['shape', 'data_type', 'chunk_grid', 'chunk_key_encoding', 'codecs', 'attributes', 'dimension_names', 'zarr_format', 'node_type', 'storage_transformers']
Differing attributes:
['fill_value']
Drill down into differing attribute fill_value:
  fill_value: np.float32(nan) != np.float32(nan)

But I don't see an easy way to keep that formatting and make it handle fill_value properly too...

TomNicholas · 2025-03-24T19:46:55Z

src/zarr/core/metadata/v3.py

@@ -233,7 +233,7 @@ class ArrayV3MetadataDict(TypedDict):
    attributes: dict[str, JSON]


-@dataclass(frozen=True, kw_only=True)
+@dataclass(frozen=True, kw_only=True, eq=False)


Should do this for ArrayV2Metadata too.

TomNicholas · 2025-03-24T19:47:35Z

tests/test_metadata/test_v3.py

@@ -411,3 +411,20 @@ def test_dtypes(dtype_str: str) -> None:
    else:
        # return type for vlen types may vary depending on numpy version
        assert dt.byte_count is None
+
+
+def test_metadata_comparison_with_nan_fill_value():


This test could presumably be parametrized over both v2 and v3 metadata, but you don't appear to have any other tests which do that. Is there a reason for that?

the two metadata documents are separate entities, so they get tested separately. if there's shared functionality that they both depend on, but isn't part of their shared base class, then that shared functionality should have its own test.

higher up in the stack we test the array classes against the two metadata varieties, because part of the job of the array classes is to abstract over the v2/v3 differences.

d-v-b · 2025-03-24T19:48:29Z

more broadly, np.isnan is picky about the types it accepts: np.isnan('im not nan') will raise a type error, for example. So I think some try... excepts might be necessary here.
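
A hedged sketch of the try/except shape suggested above. To keep it dependency-free it uses math.isnan as a stand-in, which, like np.isnan, raises TypeError for strings; is_nan is a hypothetical helper name, and the same wrapper shape would apply around np.isnan.

```python
import math
from typing import Any


def is_nan(value: Any) -> bool:
    """Return True only for NaN-valued numbers; never raise for other input."""
    try:
        return math.isnan(value)
    except (TypeError, OverflowError):
        # TypeError: strings, None, dicts and other non-numeric values.
        # OverflowError: integers too large to convert to float.
        return False
```

With this helper, is_nan("im not nan") returns False instead of raising, so a comparison can call it on any fill value without first checking its type.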

d-v-b · 2025-03-24T19:54:34Z

just a thought but what if the fill_value was already JSON-encoded, i.e. instead of fill_value: <numpy scalar> we did fill_value: JSON-able thing. This would totally prevent the need to handle edge cases in __eq__
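
To make the idea concrete, a minimal sketch under stated assumptions: encode_fill_value is a hypothetical toy encoder, not zarr-python's own, and it mirrors how the Zarr v3 spec writes non-finite floats as the strings "NaN", "Infinity" and "-Infinity". Once the fill value is stored in that form, ordinary equality works.

```python
import json
import math
from typing import Union


def encode_fill_value(value: float) -> Union[float, str]:
    """Toy encoder: map non-finite floats to JSON-safe strings."""
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        return "Infinity" if value > 0 else "-Infinity"
    return value


a = {"shape": [2], "fill_value": encode_fill_value(float("nan"))}
b = {"shape": [2], "fill_value": encode_fill_value(float("nan"))}

same_in_memory = float("nan") == float("nan")  # False: NaN never equals NaN
same_encoded = a == b  # True: both hold the string "NaN"
same_on_disk = json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True)  # True
```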

TomNicholas · 2025-03-24T20:10:12Z

> just a thought but what if the fill_value was already JSON-encoded
>
> This would totally prevent the need to handle edge cases in __eq__

I like this idea - it seems what you actually care about is whether the representation of the metadata on-disk is equal, which is what the JSON-encoded version is. Would this involve changing Metadata's internal representation of the fill_value?

d-v-b · 2025-03-24T20:17:42Z

> > just a thought but what if the fill_value was already JSON-encoded
> > This would totally prevent the need to handle edge cases in __eq__
>
> I like this idea - it seems what you actually care about is whether the representation of the metadata on-disk is equal, which is what the JSON-encoded version is. Would this involve changing Metadata's internal representation of the fill_value?

yes, it would require pushing the "treat this JSON value as a numpy scalar" job higher up in the stack, or we could add a method on the metadata classes.

FWIW, over in #2874 I'm giving each data type object a method that converts the JSON representation of a scalar to an in-memory representation. That's exactly what we would use here.

TomNicholas · 2025-03-24T20:23:09Z

> FWIW, over in #2874 I'm giving each data type object a method that converts the JSON representation of a scalar to an in-memory representation. That's exactly what we would use here.

Okay, I'll wait until that's in before returning to this then. I now have a stopgap fix in VirtualiZarr already anyway (zarr-developers/VirtualiZarr#502).

TomNicholas added 2 commits March 24, 2025 15:34

add test

19675e4

override __eq__ method

affa01c

github-actions bot added the "needs release notes" label (automatically applied to PRs which haven't added release notes) on Mar 24, 2025
