Skip to content

Fix nan fill value metadata comparison #2930

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
24 changes: 23 additions & 1 deletion src/zarr/abc/metadata.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
from __future__ import annotations

from collections.abc import Sequence
from typing import TYPE_CHECKING
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:
from typing import Self
Expand All @@ -10,6 +10,8 @@

from dataclasses import dataclass, fields

import numpy as np

__all__ = ["Metadata"]


Expand Down Expand Up @@ -44,3 +46,23 @@
"""

return cls(**data)

def __eq__(self, other: Any) -> bool:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overriding this means we lose the nice auto-generated dataclass __eq__ method, which prints out like this:

Matching attributes:
['shape',
 'data_type',
 'chunk_grid',
 'chunk_key_encoding',
 'codecs',
 'attributes',
 'dimension_names',
 'zarr_format',
 'node_type',
 'storage_transformers']
Differing attributes:
['fill_value']

Drill down into differing attribute fill_value:
  fill_value: np.float32(nan) != np.float32(nan)

But I don't see an easy way to keep that formatting and make it handle fill_value properly too...

"""Checks metadata are identical, including special treatment for NaN fill_values."""
if not isinstance(other, type(self)):
return False

Check warning on line 53 in src/zarr/abc/metadata.py

View check run for this annotation

Codecov / codecov/patch

src/zarr/abc/metadata.py#L53

Added line #L53 was not covered by tests

metadata_dict1 = self.to_dict()
metadata_dict2 = other.to_dict()

# fill_value is a special case because numpy NaNs cannot be compared using __eq__, see https://stackoverflow.com/a/10059796
fill_value1 = metadata_dict1.pop("fill_value")
fill_value2 = metadata_dict2.pop("fill_value")
if np.isnan(fill_value1) and np.isnan(fill_value2):
fill_values_equal = fill_value1.dtype == fill_value2.dtype
else:
fill_values_equal = fill_value1 == fill_value2

# everything else in ArrayV3Metadata is a string, Enum, or Dataclass
return fill_values_equal and metadata_dict1 == metadata_dict2

2 changes: 1 addition & 1 deletion src/zarr/core/metadata/v3.py
Original file line number Diff line number Diff line change
Expand Up @@ -233,7 +233,7 @@ class ArrayV3MetadataDict(TypedDict):
attributes: dict[str, JSON]


@dataclass(frozen=True, kw_only=True)
@dataclass(frozen=True, kw_only=True, eq=False)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should do this for ArrayV2Metadata too.

class ArrayV3Metadata(Metadata):
shape: ChunkCoords
data_type: DataType
Expand Down
17 changes: 17 additions & 0 deletions tests/test_metadata/test_v3.py
Original file line number Diff line number Diff line change
Expand Up @@ -411,3 +411,20 @@ def test_dtypes(dtype_str: str) -> None:
else:
# return type for vlen types may vary depending on numpy version
assert dt.byte_count is None


def test_metadata_comparison_with_nan_fill_value():
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test could presumably be parametrized over both v2 and v3 metadata, but you don't appear to have any other tests which do that. Is there a reason for that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the two metadata documents are separate entities, so they get tested separately. if there's shared functionality that they both depend on, but isn't part of their shared base class, then that shared functionality should have its own test.

higher up in the stack we test the array classes against the two metadata varieties, because part of the job of the array classes is to abstract over the v2 v3 differences.

# regression test for https://github.com/zarr-developers/zarr-python/issues/2929
metadata_dict = {
"zarr_format": 3,
"node_type": "array",
"shape": (1,),
"chunk_grid": {"name": "regular", "configuration": {"chunk_shape": (1,)}},
"data_type": np.dtype("float32"),
"chunk_key_encoding": {"name": "default", "separator": "."},
"codecs": ({'name': 'bytes', 'configuration': {'endian': 'little'}},),
"fill_value": np.float32("nan"),
}
metadata1 = ArrayV3Metadata.from_dict(metadata_dict)
metadata2 = ArrayV3Metadata.from_dict(metadata_dict)
assert metadata1 == metadata2