Skip to content

Conversation

aladinor
Copy link
Contributor

@aladinor aladinor commented Feb 3, 2025

Copy link
Member

@jhamman jhamman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aladinor for working on this! I'm happy to see that we are actually quite close to getting this working. I think the changes you made to the tests highlight that we have some work to do around consolidated metadata and possibly error handling in Zarr.

@TomNicholas TomNicholas added topic-zarr Related to zarr storage library topic-DataTree Related to the implementation of a DataTree class labels Feb 4, 2025
@maxrjones maxrjones mentioned this pull request Feb 6, 2025
2 tasks
@TomNicholas
Copy link
Member

It's working!

@aladinor this was really tricky in the end - you did very well to even get as far as you did!

(@jhamman and I wrote this together so he implicitly approves this PR)

@requires_zarr
@parametrize_zarr_format
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We can also do this, though that fixture is in test_backends.py :

@pytest.mark.usefixtures("default_zarr_format")

Copy link
Member

@TomNicholas TomNicholas Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to consolidate, but is that implementation better in any way? I find it a lot harder to read...

zarr_format: Literal[2, 3],
) -> None:
"""For one zarr array, check that all expected metadata and chunk data files exist."""
# TODO: This function is now so complicated that it's practically checking compliance with the whole zarr spec...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I think the better test here is to write some dask arrays with random data setting compute=False and then assert that you only get fill_value on read. That checks that real data is not written with compute=False.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just punt this to a follow-up PR? All the detailed checks here are so subtle that I'm worried if I try to refactor it again I'm going to either miss things or just delay this PR much further, and ultimately this works fine.

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good lets iterate on a future PR.

Thanks @aladinor @TomNicholas @jhamman !

@dcherian dcherian enabled auto-merge (squash) March 20, 2025 00:44
@dcherian dcherian merged commit f0809e4 into pydata:main Mar 20, 2025
30 checks passed
dcherian added a commit to dcherian/xarray that referenced this pull request Mar 20, 2025
* main:
  Refactoring/fixing zarr-python v3 incompatibilities in xarray datatrees (pydata#10020)
  Refactor calendar fixtures (pydata#10150)
  Use flox for grouped first, last. (pydata#10148)
  Update flaky pydap test (pydata#10149)
@TomNicholas TomNicholas deleted the dtree-zarrv3 branch March 20, 2025 13:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-DataTree Related to the implementation of a DataTree class topic-zarr Related to zarr storage library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataTree roundtrip fails on None group lookup
6 participants