Skip to content

Conversation

K-Meech
Copy link
Contributor

@K-Meech K-Meech commented Aug 26, 2025

Closes #2582

Prevents creation of arrays / groups under a parent array with .create_array and .create_group e.g. root.create_array(name='foo/bar') (where foo is an existing array)

This required changes to the _save_metadata function of both zarr/core/array.py and zarr/core/group.py. As both used pretty much identical code, I refactored this into a common function in zarr/core/metadata/io.py (along with the _build_parents function both relied upon). Happy to move this elsewhere - if there is a more suitable location for it!

I tried to avoid looping over the parents multiple times in _save_metadata for the sake of performance (potentially this path could be deeply nested). Hence looping once, and creating two sets of awaitables: one for checking if an array exists at the location + one to actually modify the metadata there. Again, happy to update this if there's a simpler solution.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.rst
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Aug 26, 2025
Copy link

codecov bot commented Aug 26, 2025

Codecov Report

❌ Patch coverage is 72.22222% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.21%. Comparing base (62551c7) to head (c8157fd).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
src/zarr/core/metadata/io.py 74.28% 9 Missing ⚠️
src/zarr/storage/_common.py 73.33% 4 Missing ⚠️
src/zarr/core/array.py 50.00% 1 Missing ⚠️
src/zarr/core/group.py 50.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3407      +/-   ##
==========================================
- Coverage   61.24%   61.21%   -0.04%     
==========================================
  Files          83       84       +1     
  Lines        9907     9923      +16     
==========================================
+ Hits         6068     6074       +6     
- Misses       3839     3849      +10     
Files with missing lines Coverage Δ
src/zarr/core/array.py 68.62% <50.00%> (-0.48%) ⬇️
src/zarr/core/group.py 70.24% <50.00%> (-0.28%) ⬇️
src/zarr/storage/_common.py 67.00% <73.33%> (+0.16%) ⬆️
src/zarr/core/metadata/io.py 74.28% <74.28%> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@d-v-b
Copy link
Contributor

d-v-b commented Aug 26, 2025

conceptually I don't think we want a zarr.core.metadata.io module. So far, the zarr.core.metadata module has been exclusively for definitions of the metadata classes, which is distinct from the routines for saving metadata documents to disk.

right now I think a better location would be zarr.core.group -- would that be possible?

@K-Meech
Copy link
Contributor Author

K-Meech commented Aug 26, 2025

Thanks @d-v-b - I think the only issue with this is it may introduce a circular import? zarr.core.group already imports multiple functions from zarr.core.array, and zarr.core.array would need to import these common metadata saving functions from zarr.core.group.

I could move those functions into zarr.core.array instead? _build_parents used to be in that file, and was imported by both zarr.core.array and zarr.core.group (although only by putting an import inside that function to avoid another circular import 😅 )

@d-v-b
Copy link
Contributor

d-v-b commented Aug 26, 2025

very good points @K-Meech. Now I do think we should have zarr.core.metadata.io. But I don't think it should depend on AsyncArray or AsyncGroup. Looking at _build_parents, all it does is save metadata. I don't think this class needs to use AsyncGroup for that, we can just write the metadata out directly without that class.

In fact, we might not need the metadata-saving logic in _build_parents, since it's being called by another routine that's already creating metadata (save_metadata). IMO _build_parents should just decide where metadata documents need to be created, and then return a {name: GroupMetadata} dict with the names of the groups that need to be created by the calling function (e.g., save_metadata).

I suspect all of this could be handled by create_hierarchy. In any case, that function should also be moved to zarr.core.metadata.io (i'm happy to do this in another PR if you don't want to)

edit: I mistakenly linked to the create_hierarchy method defined on the AsyncGroup class. I meant to link to the stand-alone function:

async def create_hierarchy(

@K-Meech
Copy link
Contributor Author

K-Meech commented Aug 26, 2025

Thanks @d-v-b - sounds like a good plan. I'll look into removing the dependency on AsyncArray + AsyncGroup.
I'm away for the next few weeks, but can look into this + any other comments left on the PR in the meantime once I'm back.

@github-actions github-actions bot removed the needs release notes Automatically applied to PRs which haven't added release notes label Sep 18, 2025
@K-Meech
Copy link
Contributor Author

K-Meech commented Sep 18, 2025

@d-v-b I've updated the code now to match your suggestion above. The dependency on AsyncArray / AsyncGroup is removed, and _build_parents now returns a dict of {name: GroupMetadata} that save_metadata then creates.

I agree that create_hierarchy could probably handle much of this functionality, but perhaps it's best to change this in a separate PR to keep the changes minimal here? What do you think?

@d-v-b
Copy link
Contributor

d-v-b commented Sep 18, 2025

I agree that create_hierarchy could probably handle much of this functionality, but perhaps it's best to change this in a separate PR to keep the changes minimal here? What do you think?

That works for me!


@pytest.mark.asyncio
async def test_async_oindex_with_zarr_array(self, store):
z1 = zarr.create_array(store=store, shape=(2, 2), chunks=(1, 1), zarr_format=3, dtype="i8")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did we need to change this test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test creates an array as a child of another array, so would throw an error (I changed it to create a group then put both arrays inside instead).

E.g. running the equivalent for a local store, shows an array inside an array:

store = LocalStore("data/images/arr")
z1 = zarr.create_array(store=store, shape=(2, 2), chunks=(1, 1), zarr_format=3, dtype="i8")
z1[...] = np.array([[1, 2], [3, 4]])
async_zarr = z1._async_array

# create boolean zarr array to index with
z2 = zarr.create_array(
    store=store, name="z2", shape=(2,), chunks=(1,), zarr_format=3, dtype="?"
)
z2[...] = np.array([True, False])

Copy link
Contributor

@d-v-b d-v-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you!

@d-v-b d-v-b merged commit de94764 into zarr-developers:main Sep 18, 2025
29 checks passed
@K-Meech K-Meech deleted the km/child-of-array branch September 19, 2025 08:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Silent failure when creating an array where there is an existing node
2 participants