Refactoring/fixing zarr-python v3 incompatibilities in xarray datatrees #10020
base: main
Conversation
Thanks @aladinor for working on this! I'm happy to see that we are actually quite close to getting this working. I think the changes you made to the tests highlight that we have some work to do around consolidated metadata and possibly error handling in Zarr.
```diff
@@ -1751,7 +1749,7 @@ def _get_open_params(
         consolidated = False
 
     if _zarr_v3():
-        missing_exc = ValueError
+        missing_exc = AssertionError
```
:( we should be providing a better error in Zarr. Do you have an example traceback that raises this?
I think we need to provide a better error here too. I realized that yesterday; I'm not sure why we handle it like that.
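For context, here is a minimal sketch of the pattern `missing_exc` feeds into: catch the backend-specific "group not found" exception and re-raise something clearer. The helper name, paths, and message below are illustrative, not the actual `_get_open_params` code.

```python
import zarr

# Illustrative sketch only -- not the actual xarray code. missing_exc is the
# exception type that signals "this group does not exist" when probing a Zarr
# store; which type gets raised differs between zarr-python 2 and 3, which is
# what the _zarr_v3() branch in the diff above selects.
missing_exc = ValueError  # the diff swaps this for AssertionError under zarr-python v3

def probe_group(store_path, group):
    """Open a group read-only, translating the backend error into a clearer one."""
    try:
        return zarr.open_group(store_path, path=group, mode="r")
    except missing_exc:
        raise FileNotFoundError(
            f"No Zarr group named {group!r} found in store {store_path!r}"
        ) from None
```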
```python
filepath = tmp_path_factory.mktemp("data") / "unaligned_simple_datatree.zarr"
root_data = xr.Dataset({"a": ("y", [6, 7, 8]), "set0": ("x", [9, 10])})
set1_data = xr.Dataset({"a": 0, "b": 1})
set2_data = xr.Dataset({"a": ("y", [2, 3]), "b": ("x", [0.1, 0.2])})
root_data.to_zarr(filepath)
set1_data.to_zarr(filepath, group="/Group1", mode="a")
set2_data.to_zarr(filepath, group="/Group2", mode="a")
set1_data.to_zarr(filepath, group="/Group1/subgroup1", mode="a")
consolidate_metadata(filepath)
```
Something seems off here. IIUC, the prior behavior consolidated metadata at the root of the store (`filepath` in this case) after each call to `to_zarr`. Is that not happening anymore?
@jhamman, I am not sure what is happening in this case, but if we don't call `consolidate_metadata` it won't recognize any nodes other than the root node `/`. So consolidating the Zarr store deals with this issue, at least partially.
On the other hand, I also removed the nested group `set1_data.to_zarr(filepath, group="/Group1/subgroup1", mode="a")` because the behavior is the same: it isn't recognized as a node in either scenario. I guess this is something we might want to dig into further.
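For reference, a quick way to compare which nodes actually get discovered with and without consolidated metadata (a hedged sketch built on the fixture above; the paths in the comment are what one would hope for, not a captured result):

```python
import xarray as xr

# List the node paths xarray discovers after consolidate_metadata(filepath).
dt = xr.open_datatree(filepath, engine="zarr")
print(dt.groups)  # hoping for ("/", "/Group1", "/Group1/subgroup1", "/Group2")

# consolidated=False bypasses the consolidated-metadata document at the root
# and makes the backend list the store directly, which is useful for comparing.
dt_listed = xr.open_datatree(filepath, engine="zarr", consolidated=False)
print(dt_listed.groups)
```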
Another issue I found, @jhamman, is that opening the datatree back fails when we point to a specific group. Here is the datatree:

```
<xarray.DataTree>
Group: /
│ Dimensions: (x: 128, y: 256)
│ Dimensions without coordinates: x, y
│ Data variables:
│ w (x) float64 1kB dask.array<chunksize=(128,), meta=np.ndarray>
│ z (x, y) float64 262kB dask.array<chunksize=(64, 256), meta=np.ndarray>
├── Group: /b
│ Dimensions: (y: 256, x: 128)
│ Dimensions without coordinates: y, x
│ Data variables:
│ B (y, x) float64 262kB dask.array<chunksize=(128, 128), meta=np.ndarray>
├── Group: /a
│ Dimensions: (x: 128, y: 256)
│ Dimensions without coordinates: x, y
│ Data variables:
│ A (x, y) float64 262kB dask.array<chunksize=(64, 256), meta=np.ndarray>
└── Group: /c
│ Dimensions: (x: 128, y: 256)
│ Dimensions without coordinates: x, y
│ Data variables:
│ w (x) float64 1kB dask.array<chunksize=(128,), meta=np.ndarray>
│ z (x, y) float64 262kB dask.array<chunksize=(64, 256), meta=np.ndarray>
└── Group: /c/d
Dimensions: (x: 128, y: 256)
Dimensions without coordinates: x, y
Data variables:
G (x, y) float64 262kB dask.array<chunksize=(64, 256), meta=np.ndarray>
```

It was saved using Zarr v3. When opening it back using

```python
dt_round_zarr = xr.open_datatree(
"testv3_dt.zarr",
consolidated=True,
chunks={},
group="/c"
)
```

it will raise the following error:

```
Traceback (most recent call last):
File "/media/alfonso/drive/Alfonso/python/xarray/refactoring.py", line 129, in <module>
main()
File "/media/alfonso/drive/Alfonso/python/xarray/refactoring.py", line 111, in main
dt_round_zarr = xr.open_datatree(
^^^^^^^^^^^^^^^^^
File "/media/alfonso/drive/Alfonso/python/xarray/xarray/backends/api.py", line 1130, in open_datatree
backend_tree = backend.open_datatree(
^^^^^^^^^^^^^^^^^^^^^^
File "/media/alfonso/drive/Alfonso/python/xarray/xarray/backends/zarr.py", line 1628, in open_datatree
groups_dict = self.open_groups_as_dict(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/media/alfonso/drive/Alfonso/python/xarray/xarray/backends/zarr.py", line 1677, in open_groups_as_dict
stores = ZarrStore.open_store(
^^^^^^^^^^^^^^^^^^^^^
File "/media/alfonso/drive/Alfonso/python/xarray/xarray/backends/zarr.py", line 645, in open_store
) = _get_open_params(
^^^^^^^^^^^^^^^^^
File "/media/alfonso/drive/Alfonso/python/xarray/xarray/backends/zarr.py", line 1798, in _get_open_params
zarr_group = zarr.open_consolidated(store, **open_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/alfonso/mambaforge/envs/xarray-tests/lib/python3.12/site-packages/zarr/api/synchronous.py", line 212, in open_consolidated
sync(async_api.open_consolidated(*args, use_consolidated=use_consolidated, **kwargs))
File "/home/alfonso/mambaforge/envs/xarray-tests/lib/python3.12/site-packages/zarr/core/sync.py", line 142, in sync
raise return_result
File "/home/alfonso/mambaforge/envs/xarray-tests/lib/python3.12/site-packages/zarr/core/sync.py", line 98, in _runner
return await coro
^^^^^^^^^^
File "/home/alfonso/mambaforge/envs/xarray-tests/lib/python3.12/site-packages/zarr/api/asynchronous.py", line 346, in open_consolidated
return await open_group(*args, use_consolidated=use_consolidated, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/alfonso/mambaforge/envs/xarray-tests/lib/python3.12/site-packages/zarr/api/asynchronous.py", line 807, in open_group
return await AsyncGroup.open(
^^^^^^^^^^^^^^^^^^^^^^
File "/home/alfonso/mambaforge/envs/xarray-tests/lib/python3.12/site-packages/zarr/core/group.py", line 553, in open
return cls._from_bytes_v3(
^^^^^^^^^^^^^^^^^^^
File "/home/alfonso/mambaforge/envs/xarray-tests/lib/python3.12/site-packages/zarr/core/group.py", line 611, in _from_bytes_v3
raise ValueError(msg)
ValueError: Consolidated metadata requested with 'use_consolidated=True' but not found in 'c'.
```

I think this might have to do with consolidated metadata.
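A possible workaround while this gets sorted out (a hedged sketch; whether it fully avoids the error depends on where the consolidated-metadata document lives): either skip consolidated metadata when opening the subgroup, or open from the root and select the subtree.

```python
import xarray as xr

# Option 1: don't request consolidated metadata for the subgroup, since the
# consolidated document is only written at the store root.
dt_c = xr.open_datatree("testv3_dt.zarr", engine="zarr", consolidated=False, group="/c")

# Option 2: open from the root, where the consolidated metadata is found,
# and select the subtree afterwards.
dt = xr.open_datatree("testv3_dt.zarr", engine="zarr", consolidated=True, chunks={})
dt_c = dt["c"]
```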