Skip to content

zarr.consolidated_metadata should not use previously consolidated (stale) metadata #2921

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
jhamman opened this issue Mar 19, 2025 · 1 comment · May be fixed by #2980
Open

zarr.consolidated_metadata should not use previously consolidated (stale) metadata #2921

jhamman opened this issue Mar 19, 2025 · 1 comment · May be fixed by #2980
Labels
bug Potential issues with the zarr-python library

Comments

@jhamman
Copy link
Member

jhamman commented Mar 19, 2025

Zarr version

3.0.5

Numcodecs version

v0.15.1

Python Version

3.11

Operating System

Mac

Installation

conda/pip

Description

We're noticing some ugly behavior when re-consolidating a store with pre-existing consolidated metadata. If a child node in the store already has consolidated metadata in it, any subsequent consolidation attempts will fail to find children that were not present before the consolidation. Because there is no way to invalidate the consolidated metadata, it is not possible to consolidate the entire store.

Note: In #2920, we also noticed a distinct but related issue where the behavior of consolidate_metadata has changed to honor the path argument.

Steps to reproduce

store = zarr.storage.MemoryStore()

root = zarr.group(store=store, zarr_format=3)
root.create_group("foo")
zarr.consolidate_metadata(store, path="foo")
root.create_group("foo/bar/spam")
zarr.consolidate_metadata(store)

root = zarr.open_consolidated(store, zarr_format=3)
root.members(max_depth=100)
# (('foo', <Group memory://140600666693888/foo>),)

# expected
# (('foo', <Group memory://140600662927488/foo>),
#  ('foo/bar', <Group memory://140600662927488/foo/bar>),
#  ('foo/bar/spam', <Group memory://140600662927488/foo/bar/spam>))
@jhamman jhamman added the bug Potential issues with the zarr-python library label Mar 19, 2025
@TomAugspurger
Copy link
Contributor

I think I see where the bug comes from: In consolidate_metadata, we do use the flag use_consolidated=False in the top-level AsyncGroup.open. However, when we walk group.members() at

members_metadata = {k: v.metadata async for k, v in group.members(max_depth=None)}
, we eventually hit the (now stale) consolidated metadata at /foo.

Currently, Group.members has always respects consolidated metadata:

if self.metadata.consolidated_metadata is not None:

I believe the cleanest solution is to add a keyword use_consolidated: bool | None = None to Group.members with the same meaning as in Group.open. We'd ensure that consolidate_metadata passes through use_consolidated=False.

Implementation-wise, I think we could either try threading that through to the _iter_members / _iter_members_deep`, or just clear any consolidated metadata we might have loaded before we walk that Group's members.

@TomAugspurger TomAugspurger linked a pull request Apr 12, 2025 that will close this issue
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Potential issues with the zarr-python library
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants