Missing data variables on roundtripped Xarray.DataTree #624

@maxrjones

Data variables appear to be lost when an xarray.DataTree is written with Icechunk and then read back. I'd be glad to investigate whether this relates to upstream issues (e.g., pydata/xarray#9960), but first I wanted to check whether there is a known solution.

MVCE

import zarr
import icechunk
import xarray as xr

set1_data = xr.Dataset({"a": 0, "b": 1})
set2_data = xr.Dataset({"a": ("x", [2, 3]), "b": ("x", [0.1, 0.2])})
root_data = xr.Dataset({"a": ("y", [6, 7, 8]), "set0": ("x", [9, 10])})

root = xr.DataTree.from_dict(
    {
        "": root_data,
        "set1": set1_data,
        "set1/set1": None,
        "set1/set2": None,
        "set2": set2_data,
        "set2/set1": None,
        "set3": None,
    }
)
storage_config = icechunk.s3_storage(
    bucket="nasa-veda-scratch",
    prefix="icechunk-test/max/xr-datatree-roundtrip",
    region="us-west-2"
)
repo = icechunk.Repository.create(storage_config)
session = repo.writable_session("main")
root.to_zarr(session.store, zarr_format=3, consolidated=False)
session.commit("Commit datatree")
roundtripped = xr.open_datatree(session.store, engine="zarr")
xr.testing.assert_equal(root, roundtripped)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[4], line 11
      9 session.commit("Commit datatree")
     10 roundtripped = xr.open_datatree(session.store, engine="zarr")
---> 11 xr.testing.assert_equal(root, roundtripped)

    [... skipping hidden 1 frame]

File /opt/conda/lib/python3.11/site-packages/xarray/testing/assertions.py:138, in assert_equal(a, b, check_dim_order)
    136     assert a.equals(b), formatting.diff_coords_repr(a, b, "equals")
    137 elif isinstance(a, DataTree):
--> 138     assert a.equals(b), diff_datatree_repr(a, b, "equals")
    139 else:
    140     raise TypeError(f"{type(a)} not supported by assertion comparison")

AssertionError: Left and right DataTree objects are not equal

Data at node 'set1' does not match:
    Data variables only on the left object:
        a        int64 8B 0
        b        int64 8B 1

Data at node 'set2' does not match:
    Differing dimensions:
        (x: 2) != ()
    Data variables only on the left object:
        a        (x) int64 16B 2 3
        b        (x) float64 16B 0.1 0.2
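
To narrow down which nodes lose variables without relying on the full assertion diff, one option is to compare variable names per node path. The sketch below is only an illustration: `missing_data_vars` is a hypothetical helper (not part of xarray or Icechunk), and the two dictionaries are filled in by hand from the MVCE and the assertion output above, assuming the root node roundtrips correctly because the diff does not mention it.

```python
from typing import Dict, Iterable, List, Mapping


def missing_data_vars(
    expected: Mapping[str, Iterable[str]],
    actual: Mapping[str, Iterable[str]],
) -> Dict[str, List[str]]:
    """Return, per node path, the variable names in `expected` that are absent from `actual`."""
    missing: Dict[str, List[str]] = {}
    for path, names in expected.items():
        # A node missing from `actual` entirely is treated as having no variables.
        lost = set(names) - set(actual.get(path, ()))
        if lost:
            missing[path] = sorted(lost)
    return missing


# Variable names per node, transcribed by hand from the MVCE (written)
# and from the assertion output (read back). These are assumptions, not
# values produced by running against the Icechunk store.
written = {"/": ["a", "set0"], "/set1": ["a", "b"], "/set2": ["a", "b"]}
read_back = {"/": ["a", "set0"], "/set1": [], "/set2": []}

print(missing_data_vars(written, read_back))
# → {'/set1': ['a', 'b'], '/set2': ['a', 'b']}
```

With real trees, the same mappings could presumably be built with something like `{node.path: list(node.data_vars) for node in tree.subtree}` for both `root` and `roundtripped`.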
