ZEP10: Generic extensions proposal #67

joshmoore · 2025-05-16T14:43:08Z

This is a follow-on to ZEP9 (#65), since #66 limits the scope of ZEP9 solely to phase 1 so that it can be moved to "accepted" (zarr-developers/zarr-specs#330 is merged and v3.1 has been released). This ZEP is equivalent to phase 2 of the original ZEP9 draft and introduces a top-level generic extensions field.

This ZEP will follow the process laid out in ZEP0 and invites votes from the newly refreshed @zarr-developers/implementation-council. This PR may be proactively merged as a draft, but will not be moved to "accepted" until the related PR on zarr-specs is voted on, merged, and v3.2 released.

Please see zarr-developers/zarr-specs#344 for detailed changes.

alimanfoo · 2025-05-16T17:33:25Z

Hi @joshmoore, just a process question, it would seem beneficial to get this PR merged asap so it becomes visible as a draft zep on the zeps website. Who needs to approve that, and what checks would need to be done at this stage to allow merging? E.g., does someone just need to check that the document has the right structure for a ZEP? If so, I'd be happy to approve.

jbms · 2025-05-16T18:12:18Z

draft/ZEP0010.md

+
+
+
+#### Domain metadata (group)


This example goes against my own intuition regarding the extension vs. attribute distinction. I assumed that extensions would be for things that require integration into the zarr implementation itself (i.e. zarr-python, tensorstore, zarrs) while attributes are for things that are handled by higher level layers, either "user" code or some higher-level abstraction like ome-zarr built on top of zarr.

Using extensions essentially just as namespaced attributes means that either:

- each implementation needs to have an implementation of the extension just to allow access to the attributes; or
- implementations need to allow users to directly read and write must_understand: false extension metadata.

I updated the example, @jbms, but didn't fully address your question. I see code that consumes extensions (e.g., ome-zarr) accessing the extension metadata in one of three general ways:

- in the implementation
- in a plugin (i.e., an internal entrypoint)
- in a wrapper (i.e., higher-level)
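As an illustration of the "plugin" option, here is a minimal sketch of name-based dispatch. It assumes a hypothetical in-process registry: `EXTENSION_HANDLERS`, `register_extension` and `parse_extensions` are illustrative names, not the API of zarr-python, tensorstore or zarrs, and the `min`/`max` configuration keys are made up for the example.

```python
from typing import Any, Callable, Dict

# Hypothetical registry: extension name -> function that parses its configuration.
EXTENSION_HANDLERS: Dict[str, Callable[[Dict[str, Any]], Any]] = {}


def register_extension(name: str):
    """Decorator a plugin could use to announce support for an extension name."""
    def decorator(func: Callable[[Dict[str, Any]], Any]):
        EXTENSION_HANDLERS[name] = func
        return func
    return decorator


@register_extension("example.array-statistics")
def parse_statistics(config: Dict[str, Any]) -> Dict[str, Any]:
    # A plugin-provided parser; this sketch just copies the configuration.
    return dict(config)


def parse_extensions(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch each entry of the "extensions" field to its registered handler."""
    parsed: Dict[str, Any] = {}
    for ext in metadata.get("extensions", []):
        handler = EXTENSION_HANDLERS.get(ext["name"])
        if handler is None:
            # Unknown names are left to a separate must_understand check (not shown).
            continue
        parsed[ext["name"]] = handler(ext.get("configuration", {}))
    return parsed
```

A real entrypoint-based design would populate the registry from installed packages rather than from decorators in the same module; the decorator only keeps the sketch self-contained.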

jbms · 2025-05-16T18:14:07Z

Hi @joshmoore, just a process question, it would seem beneficial to get this PR merged asap so it becomes visible as a draft zep on the zeps website. Who needs to approve that, and what checks would need to be done at this stage to allow merging? E.g., does someone just need to check that the document has the right structure for a ZEP? If so, I'd be happy to approve.

I know we've done that in the past for ZEPs but then it is actually harder to comment on it --- I'd need to open a separate issue for each comment.

jbms · 2025-05-16T18:17:32Z

draft/ZEP0010.md

+    "extensions": [
+        {
+            "name": "example.array-statistics",
+            "must_understand": false,


Here we run into the issue that we may want to distinguish between reading and writing. Reading the array without understanding the extension is fine, but writing to the array would invalidate the statistics. Perhaps we should take the opportunity here to introduce must_understand_for_reading and must_understand_for_writing. An extension marked plain must_understand: false would seem to be serving the same purpose as an attribute.

Remembering your previous comment on this (...somewhere), I did consider it. I'm up for trying to include it now, but it will push the timeline, so there's a trade-off. As long as we're discussing it here, though, I was wondering whether we could make it an (extendable?!) object: must_understand: {"read": true, "write": false}.
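A sketch of how an implementation might accept both the current boolean and the object form floated here. The object form and the helper `must_understand_for` are hypothetical; treating an absent key as `true` follows the existing `must_understand` default.

```python
from typing import Any, Dict, Union

# Either the existing boolean or the hypothetical {"read": ..., "write": ...} object.
MustUnderstand = Union[bool, Dict[str, bool]]


def must_understand_for(ext: Dict[str, Any], mode: str) -> bool:
    """Return whether an implementation must understand ``ext`` for ``mode``.

    ``mode`` is "read" or "write". A plain boolean applies to both modes; an
    object is looked up per mode. Missing values default to True.
    """
    if mode not in ("read", "write"):
        raise ValueError(f"unknown mode: {mode!r}")
    value: MustUnderstand = ext.get("must_understand", True)
    if isinstance(value, bool):
        return value
    return bool(value.get(mode, True))
```

With this reading, the statistics extension from the example would carry `{"read": false, "write": true}`: readers may ignore it, writers may not.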

I realized also that there is some additional complexity here.

For storing these min/max values, I could imagine a common strategy would be to first write the array, then compute the min/max statistics. If the array gets updated later, then the min/max values would need to be recomputed afterwards. If the user is resizing and then writing to a new portion of the array, only the newly-written part needs to be examined to recompute the min/max. If the user modifies an existing portion of the array, though, then the entire array would need to be examined. Or maybe the user is okay assuming the min/max bounds have only increased, and therefore only the modified portion needs to be examined.

If this example.array-statistics extension is somehow marked "must understand for writing", then an existing zarr implementation would refuse to write to it. But in fact it is fine for an existing implementation to write to it as long as the statistics are updated later, possibly through a batch pipeline that runs separately from the zarr implementation itself. That suggests that implementations should support some sort of override -- a list of extensions that can safely be ignored. For this type of extension, it would also be useful for implementations to provide an interface for directly modifying extension metadata just like attributes.
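The override suggested above could look like the following minimal sketch, assuming only the boolean `must_understand` form. `check_extensions`, its `ignore` parameter and `UnsupportedExtensionError` are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, Iterable, List


class UnsupportedExtensionError(Exception):
    """Raised when metadata contains a must_understand extension we cannot handle."""


def check_extensions(
    metadata: Dict[str, Any],
    supported: Iterable[str],
    ignore: Iterable[str] = (),
) -> List[str]:
    """Raise for unknown must_understand extensions unless the caller opted out.

    Returns the names of unknown extensions that were skipped, so the caller can
    warn the user or schedule follow-up work (e.g. recomputing statistics).
    Assumes must_understand is a boolean, defaulting to True when absent.
    """
    supported_set, ignore_set = set(supported), set(ignore)
    skipped: List[str] = []
    for ext in metadata.get("extensions", []):
        name = ext["name"]
        if name in supported_set:
            continue
        if ext.get("must_understand", True) and name not in ignore_set:
            raise UnsupportedExtensionError(f"unsupported extension: {name}")
        skipped.append(name)
    return skipped
```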

jbms · 2025-05-16T18:21:39Z

draft/ZEP0010.md

+This alternative would continue to reserve the top-level namespace for changes to
+the core spec and, therefore, reduce pollution of the top-level namespace. Downsides include
+that only a single use of each extension would be possible since the key is the extension
+name and there would be no ordering of the extensions.


Since lack of an ordering is listed as a disadvantage, what do you see as a use case for ordering of extensions or listing the same extension more than once?
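To make the listed downside concrete, a sketch that converts the proposed array into the keyed-object alternative. `extensions_to_keyed` is a hypothetical helper and the `offset` configuration key is assumed; the point is only that a name-keyed object cannot hold the same extension twice, and JSON objects carry no ordering semantics.

```python
from typing import Any, Dict, List


def extensions_to_keyed(extensions: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Convert an ``extensions`` array into an object keyed by extension name.

    Fails on repeated names, since the keyed form can hold each name only once.
    """
    keyed: Dict[str, Dict[str, Any]] = {}
    for ext in extensions:
        name = ext["name"]
        if name in keyed:
            raise ValueError(f"duplicate extension {name!r} cannot be keyed by name")
        # Everything except the name becomes the value under that key.
        keyed[name] = {k: v for k, v in ext.items() if k != "name"}
    return keyed
```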

joshmoore · 2025-05-16T18:47:24Z

@alimanfoo a process question, it would seem beneficial to get this PR merged asap so it becomes visible as a draft zep on the zeps website. Who needs to approve that, and what checks would need to be done at this stage to allow merging? E.g., does someone just need to check that the document has the right structure for a ZEP? If so, I'd be happy to approve.

For merging in the "Draft", yes, that suffices. From https://zarr.dev/zeps/active/ZEP0000.html#submitting-a-zep

"...The Zarr Steering Council and the Zarr Implementations Council will not unreasonably deny publication of a ZEP. Reasons for denying ZEP include duplication of effort, being technically unsound, not providing proper motivation or addressing backwards compatibility, or not taking care of Zarr CODE OF CONDUCT."

@jbms I know we've done that in the past for ZEPs but then it is actually harder to comment on it --- I'd need to open a separate issue for each comment.

I'm certainly all for leaving it open for a bit, especially for the discussion of the material that is only here (as @jbms has done above). I can manage having it open and synchronizing with the specs PR. That being said, if possible, I'd like to get it merged as a "Draft" and then will also keep updating it as necessary to stay in step with discussions on zarr-developers/zarr-specs#344

d-v-b · 2025-05-16T18:49:38Z

Hi @joshmoore, just a process question, it would seem beneficial to get this PR merged asap so it becomes visible as a draft zep on the zeps website. Who needs to approve that, and what checks would need to be done at this stage to allow merging? E.g., does someone just need to check that the document has the right structure for a ZEP? If so, I'd be happy to approve.

seconding @jbms, I rate the ability to discuss the ZEP as a single PR much higher than seeing it listed on the ZEP web site, so I would rather we keep this PR open until it's clear that all the questions have been answered.

d-v-b · 2025-05-16T18:59:18Z

draft/ZEP0010.md

+(the default) and the implementation does not support it, the dataset must not
+be loaded and an appropriate error should be raised. For extensions with


First, does dataset mean array | group, or just array? In any case it might be good to avoid the term "dataset" because hdf5 uses that to describe what zarr calls arrays.

Second, what does it mean to "load a dataset" in this context? I think it should be safe for implementations to access chunks / metadata documents in a read-only mode, and it seems like it should be safe for any implementation to open any metadata document in a mutable mode, but insofar as the metadata document provides essential instructions for interpreting chunks, it seems like the chunks should be mutable if and only if all the metadata in the metadata document can be understood. But these are just my impressions; it would be good to see a more formal breakdown of these conditions.

An extension may affect the interpretation of the array in an important way. If an implementation opens the array without returning an error, then users may end up using incorrect data without realizing it. For example, there might be an extension that transposes or otherwise affects the indexing of the entire array. Failing by default is very important, but implementations could provide an option for the user to specify a list of extensions that may be safely ignored.

an extension-ignorant Zarr implementation could safely read a chunk as an opaque sequence of bytes, e.g., for the purpose of applying an additional level of compression on top of the chunk data... unless there is an extension that says that this type of compression is disallowed... In theory this can get rather complex, even if it wouldn't in practice. But I do think we need to define what "open a dataset" means here. It would be very odd (and easy to circumvent) if zarr-python was required by the spec to not read chunks as opaque bytes because of some field in a metadata document.

For example, there might be an extension that transposes or otherwise affects the indexing of the entire array.

Since this type of transformation can be expressed as a codec, it seems problematic that it can also be defined via an extension.

That can't be expressed as a codec currently because codecs only apply to individual chunks.

However, your comment did give me an idea to write up something I'd thought about a bit previously...

zarr-developers/zarr-specs#346

A "generic extension" as proposed here is a way to gain implementation experience with a "formal extension" before actually adding it to the zarr spec.

Is that the goal of this proposal? Because there are other ways to do this without adding a new key to zarr metadata.

Personally I indeed would favor just using top-level metadata fields with namespaced names to avoid conflicts with metadata fields that are later added to the standard. That would avoid a syntactic mismatch between pre-standard and standard extensions.

Additionally an extension might alter or remove some existing standard attributes --- for example you might have an inline extension that specifies the array data inline in the metadata, in which case chunk_grid and codecs and fill_value are no longer specified. Or you might have a fill_value_array extension that allows specifying a non-uniform fill value (with broadcasting), in which case the normal fill_value field is excluded.

Syntactically isolating non-standard extensions all in an extensions array doesn't exactly fit with the fact that they may have unbounded effects.

For extensions that are really just namespaced attributes, putting them in an extensions array makes more sense but just keeping them in attributes would make even more sense in my opinion.

Syntactically isolating non-standard extensions all in an extensions array doesn't exactly fit with the fact that they may have unbounded effects.

This is one of my big concerns with this proposal. If we consider a field like data_type, the scope is very narrow. It has very limited interaction with the other metadata fields; ideally it would have none! By contrast, the proposed extensions field has unlimited scope, and can express any relationship with any other metadata field. What are we getting for this complexity?

The main goal of this ZEP is to allow general purpose extensions that are not limited by the scopes of existing extension points. These extensions are supposed to allow for a (mostly) decentralized evolution of the spec, meaning that no heavy-weight process is required to add functionality or alter behavior of arrays and groups.
We want people in the community to experiment. If an extension has enough traction among implementations and users it might become the basis for a new core zarr feature. There might also be a number of garbage extensions that go nowhere.

Personally I indeed would favor just using top-level metadata fields with namespaced names to avoid conflicts with metadata fields that are later added to the standard. That would avoid a syntactic mismatch between pre-standard and standard extensions.

Syntactically isolating non-standard extensions all in an extensions array doesn't exactly fit with the fact that they may have unbounded effects.

I think that would also be a reasonable choice. I almost think this boils down to aesthetics.

Additionally an extension might alter or remove some existing standard attributes --- for example you might have an inline extension that specifies the array data inline in the metadata, in which case chunk_grid and codecs and fill_value are no longer specified. Or you might have a fill_value_array extension that allows specifying a non-uniform fill value (with broadcasting), in which case the normal fill_value field is excluded.

It seems fine if an extension would mandate that existing core attributes be ignored. It might be a bit weird and could lead to composability issues. But we don't want to limit experimentation at this point.

For extensions that are really just namespaced attributes, putting them in an extensions array makes more sense but just keeping them in attributes would make even more sense in my opinion.

This is an interesting point and I think we need some more discussion around that.

d-v-b · 2025-05-16T19:05:23Z

draft/ZEP0010.md

+associated with a specific (array) metadata field. Additional extension points
+may be added by future ZEPs. Until that time, however, third-parties may want
+to add arbitrary extension objects to either arrays or groups. This proposal
+introduces a generic ``extensions`` field that serves as a container for such a


Why is the term "generic" used here? We are talking about extensions to array and group metadata documents, and therefore extensions to the array and group models. I would expect that array extensions will have very different constraints from group extensions (like the question of inheritance, which only applies to groups), so it might be simpler and more direct to introduce that dichotomy early on, instead of framing this as if there's something "generic" about arrays and groups.

I think we call them "generic" because they don't fit in the other extensions points (e.g. codecs, chunk_key_encoding, ...).

It might be simpler to denote these extensions as "array extensions" and "group extensions". To me, that much more clearly conveys the thing being extended.

That wouldn't be consistent with our use of "extension" elsewhere in the spec document.

Then it's possible that the word "extension" is used ambiguously in the rest of the spec document as well. In any case, one assumes that the contents of extensions for an array might be different from the contents of extensions for a group. So I would probably use terms like "array extensions" and "group extensions" to describe these two conditions. By contrast, on its face "generic extensions" conveys nothing to me -- new codecs could also be "generic" extensions.

I think the term "extension" is pretty well defined here: https://zarr-specs.readthedocs.io/en/latest/v3/core/index.html#extensions

Right now there are only extensions that fit in one of the extension points. With this ZEP we are permitting additional extensions that are not limited in scope. That is why we call them "generic".
Under that definition, codecs are not generic extensions because they fit in the codecs extension point.

In any case, one assumes that the contents of extensions for an array might be different from the contents of extensions for a group.

It is true that some extensions may only apply to arrays or groups. That would need to be denoted in the respective extension spec. However, fundamentally and syntactically, and therefore for the purpose of this ZEP, I don't think we need to treat them differently.

d-v-b · 2025-05-16T19:17:05Z

draft/ZEP0010.md

+Note that in this example of the extension is ``must_understand=true`` meaning
+an implementation which does not support the ``example.offset`` extension
+should raise an error.


when should that error be raised? when reading metadata, or when reading chunks?

If the impl doesn't know the example.offset extension, it must fail when parsing the metadata.
It may fail with an out-of-bounds error when reading/writing data outside the domain. But that would be up to the specification for this extension to define.

If the impl doesn't know the example.offset extension, it must fail when parsing the metadata.

It seems to me that a zarr-compatible application should be able to say, for example, "this is an array with shape <shape>, but I can't load chunks for you because of <unknown extension>". Your suggestion that the metadata document should be effectively unreadable prevents this.

It seems to me that a zarr-compatible application should be able to say, for example, "this is an array with shape <shape>, but I can't load chunks for you because of <unknown extension>".

I think that would be a good implementation.

I think that would be a good implementation.

Since the behavior I described relies on reading the metadata without an error, this PR should clarify the distinction between reading metadata documents and other IO operations (e.g., reading chunks, in this example).
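One way to express the distinction asked for here is to separate inspecting metadata from chunk I/O. The sketch below follows the behaviour described in this comment (metadata readable, chunks refused), which is one of the positions in this thread rather than agreed spec behaviour; `inspect_array`, `ArrayInfo` and `ExtensionNotUnderstood` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


class ExtensionNotUnderstood(Exception):
    """Raised when chunk I/O is attempted despite unknown must_understand extensions."""


@dataclass
class ArrayInfo:
    """What a reader can report without understanding every extension."""
    shape: Tuple[int, ...]
    unknown_extensions: List[str] = field(default_factory=list)

    def read_chunk(self, key: str) -> bytes:
        # Hypothetical policy: refuse chunk I/O while any must_understand
        # extension is unknown, since it may change how chunks are interpreted.
        if self.unknown_extensions:
            raise ExtensionNotUnderstood(
                f"cannot load chunk {key!r}: unknown extension(s) "
                + ", ".join(self.unknown_extensions)
            )
        raise NotImplementedError("chunk decoding is out of scope for this sketch")


def inspect_array(metadata: Dict[str, Any], supported: List[str]) -> ArrayInfo:
    """Parse the shape and record unknown must_understand extensions without failing."""
    unknown = [
        ext["name"]
        for ext in metadata.get("extensions", [])
        if ext.get("must_understand", True) and ext["name"] not in supported
    ]
    return ArrayInfo(shape=tuple(metadata["shape"]), unknown_extensions=unknown)
```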

If you are purely displaying information to a user and including a warning that an unknown extension was encountered, then displaying whatever information can be heuristically extracted from the metadata successfully may be reasonable.

In general though if there is an unknown extension, you can't really make any assumptions about the meaning of the metadata and any programmatic use is problematic.

For example, the offset extension may mean that the upper bound of the array is no longer indicated by shape but by offset + shape, and the chunk grid starts at offset rather than (0, ...). Maybe there is some program that partitions zarr arrays according to the chunking and then hands off those zarr arrays to worker processes. If the partition program does not support the offset extension but the worker program does, then the partition program will partition incorrectly, and the worker processes may then process the partitions without errors but misaligned with the chunk grid.

Concretely, I'd say that if there is an unknown must_understand=true extension, zarr.open and similar interfaces should not appear to succeed and allow querying properties like the chunk grid, dtype, etc. unless the user explicitly opts into ignoring unknown extensions.
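The misalignment in the partitioning example above can be shown with a little arithmetic. This sketch assumes the semantics described for the hypothetical `example.offset` extension (indices run from `offset` to `offset + shape` and the chunk grid is anchored at `offset`); `chunk_boundaries` is an illustrative helper for a single dimension.

```python
from typing import List, Tuple


def chunk_boundaries(shape: int, chunk: int, offset: int = 0) -> List[Tuple[int, int]]:
    """Half-open [start, stop) ranges of each chunk along one dimension.

    Assumes valid indices run from ``offset`` to ``offset + shape`` and that
    the chunk grid is anchored at ``offset`` (the assumed offset semantics).
    """
    bounds = []
    start = offset
    while start < offset + shape:
        stop = min(start + chunk, offset + shape)
        bounds.append((start, stop))
        start = stop
    return bounds


print(chunk_boundaries(10, 4, offset=3))  # [(3, 7), (7, 11), (11, 13)]
print(chunk_boundaries(10, 4))            # [(0, 4), (4, 8), (8, 10)]
```

An offset-unaware partitioner would produce the second list, whose boundaries match neither the domain nor the chunk grid that an offset-aware worker uses.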

In general though if there is an unknown extension, you can't really make any assumptions about the meaning of the metadata and any programmatic use is problematic.

I find this outcome concerning, as it amounts to fragmenting the zarr ecosystem.

d-v-b · 2025-05-16T19:20:03Z

I think this document should explain why the pre-existing attributes field is insufficient for the purposes of this ZEP.

d-v-b · 2025-05-16T19:35:11Z

draft/ZEP0010.md

+
+## Proposal
+
+To provide for more flexible, immediate, and de-centralized use cases, we


What I would find persuasive here would be a concrete example of something that people are trying to do with Zarr today that is blocked by the lack of the extensions field.

I think the example section has a few examples.

I am familiar with the examples in the examples section. They are not concrete examples of something that people are trying to do with Zarr today that is blocked by the lack of the extensions field.

My feedback here is that this proposal should open with a real, concrete example that people will understand. I could not explain to someone what "flexible, immediate, decentralized use cases" means. By contrast, I could explain to someone what xarray, or nasa-gesdic, or ome-zarr are doing.

To make it more explicit, I would take one of these examples:

- xarray's zarr encoding
- ome-zarr
- chunk accumulation zep

introduce in a small number of words how they are using zarr, and then explain why the extensions field would solve a real problem for them. This would make it clear that this ZEP is an attempt to solve a real problem that people actually have, unlike the current reference to "flexible, immediate, decentralized use cases".

I am familiar with the examples in the examples section. They are not concrete examples of something that people are trying to do with Zarr today that is blocked by the lack of the extensions field.

I disagree. All of these examples are concrete, but simplified, extension proposals that have been floated by the community.

the feedback you are getting is that, from my POV, those examples are not concrete enough, and are too simplified. I personally do not see value in changing the zarr spec to solve a hypothetical, simplified problem. Stating a real problem, and then showing how this proposal will solve that problem, is convincing.

jbms · 2025-05-16T19:35:14Z

I think this document should explain why the pre-existing attributes field is insufficient for the purposes of this ZEP.

For must_understand: true extensions, like specifying the array content inline, transposing the array, etc. an attribute would definitely not work. However, all of the examples given would work as attributes reasonably well.

Co-authored-by: Davis Bennett

d-v-b · 2025-05-22T19:18:13Z

I think this document should explain why the pre-existing attributes field is insufficient for the purposes of this ZEP.

For must_understand: true extensions, like specifying the array content inline, transposing the array, etc. an attribute would definitely not work. However, all of the examples given would work as attributes reasonably well.

to be clear, the specific thing that would not work if all extensions were in attributes is that we could not prevent non-compliant implementations from accessing data. Extension-compliant implementations on the other hand would have no trouble reading extensions from attributes.

This makes me wonder: how important is it really to exclude non-compliant implementations from accessing (and possibly misinterpreting) data? I.e., how much weight should we assign to this feature? Are there real examples of negative outcomes from misinterpreting specialized zarr data, or is this purely hypothetical?

joshmoore · 2025-05-24T06:24:25Z

Thanks for the feedback, all. I've pushed a number of clarification commits based on them, and tried to resolve the threads appropriately. I have ideas on further examples (esp. encryption as recently discussed on Zulip), but I'd very much welcome any others that may be floating around (as PRs, comments, etc.)

There are a few remaining conversations:

- extensions array vs. one of the alternatives
- clarifying boundaries between attributes and extensions usage
- advanced must_understand semantics

These may be easier on a call to work toward consensus rather than extended back and forth here. Since the previous ZEP meeting spot was cancelled, I'd suggest we start with a one-off. Finding time this coming week (May 26+) may be difficult but two options are:

- June 2, 2025 – 20:00–21:00 CEST
- June 4, 2025 – 20:00–21:00 CEST

I'd still also like to encourage other implementer voices, @zarr-developers/implementation-council. To ensure everyone feels comfortable contributing, it might be helpful for those who have already shared their perspective to give others space to chime in without feeling the need to immediately respond or defend their thoughts.

LDeakin · 2025-05-25T00:01:19Z

Thanks Josh and Norman, this looks pretty great! My thoughts based on the PR and comments so far:

- An extension array seems the most flexible, as it permits ordered / repeated extensions.
- The distinction between extensions and attributes is certainly not as simple as automated vs human. The way I see it:
  - An extension may affect chunk operations, metadata parsing, store operations, and other fundamental Zarr functionality in unforeseen ways. Such an extension requires support from a Zarr implementation like zarr-python, tensorstore, zarrs, etc.
  - An attribute may change how data/metadata is interpreted, and would be the responsibility of downstream libraries (like ome-zarr-py), but implementations could support them too. These would fit under the banner of ZEP04.
- I'm conflicted on isolating reading/writing with must_understand, but leaning to keeping it true/false because
  - must_understand: true/false is backwards-compatible
  - must_understand: true clearly means an implementation must support reading/writing
  - must_understand: false in an extension implies to me that an implementation should support reading, but should not write unless it is actually aware of the extension and knows that it is okay to do so
    - An extension that never needs to be understood for reading or writing seems like it should just be an attribute
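The reading of must_understand in the last bullet amounts to a small decision table. This is a sketch of that interpretation only, not behaviour defined by the spec; `allowed_operations` is a hypothetical name, and "understood" is assumed to mean the implementation knows the extension and that writing is safe.

```python
from typing import Any, Dict


def allowed_operations(ext: Dict[str, Any], understood: bool) -> Dict[str, bool]:
    """Read/write permissions for one extension under the interpretation above.

    - understood: reading and writing are both allowed.
    - not understood, must_understand true (the default): neither is allowed.
    - not understood, must_understand false: reading only.
    """
    if understood:
        return {"read": True, "write": True}
    must = ext.get("must_understand", True)
    return {"read": not must, "write": False}
```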

jbms · 2025-05-25T05:01:57Z

Can someone give an example of how order-dependent extensions might be used/specified?

It seems to me that it would be potentially confusing to have most extensions be order-independent but in certain cases have order dependence. It also seems like it would be quite challenging to specify the order-dependent behavior unless you can map the extension to some composable "interface" like a codec or storage transformer. But if there is such a composable interface, the extension should just be defined as specifying a list of "things" that conform to that interface, and that interface becomes a new extension point, e.g. {"name": "my_new_interface_list", "configuration": {"list": [{"name": "my_new_interface_item", "configuration": {...}}...]}}
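A minimal sketch of the list-style pattern described here, where ordering lives inside one extension's configuration rather than in the top-level `extensions` array. The item names `scale` and `shift` and the helper `apply_interface_list` are hypothetical, and the "data" is a plain Python list for brevity.

```python
from typing import Any, Callable, Dict, List

# Hypothetical item handlers keyed by item name; each transforms a list of values.
ITEM_HANDLERS: Dict[str, Callable[[List[int], Dict[str, Any]], List[int]]] = {
    "scale": lambda data, cfg: [x * cfg["factor"] for x in data],
    "shift": lambda data, cfg: [x + cfg["amount"] for x in data],
}


def apply_interface_list(ext: Dict[str, Any], data: List[int]) -> List[int]:
    """Apply the items of a list-style extension in the order they appear."""
    for item in ext["configuration"]["list"]:
        handler = ITEM_HANDLERS[item["name"]]
        data = handler(data, item.get("configuration", {}))
    return data


scale = {"name": "scale", "configuration": {"factor": 2}}
shift = {"name": "shift", "configuration": {"amount": 1}}
ext_a = {"name": "my_new_interface_list", "configuration": {"list": [scale, shift]}}
ext_b = {"name": "my_new_interface_list", "configuration": {"list": [shift, scale]}}
print(apply_interface_list(ext_a, [1, 2, 3]))  # [3, 5, 7]
print(apply_interface_list(ext_b, [1, 2, 3]))  # [4, 6, 8]
```

Because the order is defined by the list-valued extension itself, the top-level `extensions` field could stay order-independent.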

maxrjones · 2025-05-26T21:32:47Z

draft/ZEP0010.md

+By adding a new field, the specification can assert restrictions that if added
+to ``attributes``. would amount to a breaking change. If present, the


I don't understand "By adding a new field, the specification can assert restrictions that if added
to attributes. would amount to a breaking change". I think an example could help a lot for people (like me) who've fallen behind in tracking spec/ZEP progress.

The contract of "attributes" has been 'you can put anything here'. The definition of extensions says that they follow a set form and that their names are registered in https://github.com/zarr-developers/zarr-extensions. "Breaking" here means that it's certainly fair to assume that there is V3 data today with attributes that would violate these assumptions.
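To illustrate why imposing this contract on `attributes` would be breaking, here is a sketch of the kind of structural check an extension entry could be held to. The shape checked (string `name`, optional boolean `must_understand`, optional object `configuration`) is assumed from the examples in this ZEP; registry lookup is not modelled, and `is_valid_extension_object` is a hypothetical helper.

```python
from typing import Any


def is_valid_extension_object(obj: Any) -> bool:
    """Check the assumed basic shape of one entry in ``extensions``."""
    if not isinstance(obj, dict) or not isinstance(obj.get("name"), str):
        return False
    if "must_understand" in obj and not isinstance(obj["must_understand"], bool):
        return False
    if "configuration" in obj and not isinstance(obj["configuration"], dict):
        return False
    return True
```

Something like `{"offset": [0, 0]}` is perfectly valid attribute content today but fails this check, which is why the restriction needs a new field rather than a retroactive rule on attributes.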

maxrjones · 2025-05-26T21:36:56Z

draft/ZEP0010.md

+The ``example.offset`` extension contains an array of the same order as the
+shape of the containing array specifying which element of the array should be
+considered as the origin, e.g., `[0, 0]`. This allows the reuse of subregions
+of an array without the need to rewrite the data.


If someone were to come across a must_understand=True extension in a zarr.json in the wild, how would they find the information necessary for supporting it in their Zarr implementation?

Look up the name in zarr-extensions, and then read the documentation associated there.

Is it possible to make this ZEP self-describing, so that a user knows where to find that information without needing to also read the Zarr V3 spec?

Hmmm.... I'm not sure I understand your question, @maxrjones.

maxrjones · 2025-05-26T21:40:59Z

draft/ZEP0010.md

+
+### Alternatives for the `extensions` extension point
+
+The current design allows having the same


I don't understand "This MAY happen as
part of the core spec adopting functionality of an extension.". Can you provide more details about the motivation for this warning?

Let's use the "offset" example. Currently, someone could grab the zarr-extensions name and start building that extension. In the future, another ZEP could codify the name "offset" as top-level metadata, independently of the fact that it would then already be an extension.

why would the Zarr community allow codifying the name "offset" as top-level metadata in this case, since it seems that the functionality could be implemented by the extensions?

I can retroactively give you an example. In Zarr v2, the dimension separator was a Store concern in zarr-python. While working on the benchmark in https://www.nature.com/articles/s41592-021-01326-w, I realized that for OME-Zarr we were going to need to require / for performance reasons, and there was no way to do that from the fileset alone. So I introduced dimension_separator into Zarr v2 as best I could. (Conceivably, you could call that v2.1.)

It took me 6 months to get that work done, which means my paper took that much longer to publish. Today, I would almost certainly have made an extension in order to not take on that burden immediately. It does, however, very much make sense in the main spec (as it is in v3), and so a follow-up ZEP would likely have been wanted to elevate the dim separator "extension" to a full member of the spec.
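For readers unfamiliar with the v2 field: `dimension_separator` controls how chunk indices are joined into storage keys. A minimal sketch, assuming the Zarr v2 semantics where `.` is the default and `/` yields nested keys; `v2_chunk_key` is an illustrative helper, not a zarr-python function.

```python
from typing import Sequence


def v2_chunk_key(chunk_index: Sequence[int], dimension_separator: str = ".") -> str:
    """Build a Zarr v2 chunk key such as "0.1.2" (default) or "0/1/2"."""
    if dimension_separator not in (".", "/"):
        raise ValueError("Zarr v2 allows only '.' or '/' as dimension_separator")
    return dimension_separator.join(str(i) for i in chunk_index)


print(v2_chunk_key((0, 1, 2)))       # 0.1.2
print(v2_chunk_key((0, 1, 2), "/"))  # 0/1/2
```

With `/`, a filesystem store can place chunks in nested directories rather than one flat directory.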

maxrjones · 2025-05-26T21:50:10Z

draft/ZEP0010.md

+    ...,
+    "extensions": [
+        {
+            "name": "example.offset",


Is the example part of this name meaningful? I think it would be useful to either define in this document what the naming conventions are or link to the relevant external convention.

Sorry, that's from that "Extensions naming" section added by ZEP9.

maxrjones · 2025-05-26T21:56:11Z

draft/ZEP0010.md

+                "multiscale": {
+                    "datasets": [
+                        "path/to/array/1",
+                        "path/to/array/2",
+                        "path/to/array/3"
+                    ]


Could you provide some details about why using extensions is better than using attributes for this purpose? I'm asking this because it seems to serve a similar purpose as the NGFF spec or proposed GeoZarr spec which have been building an attributes-centric approach.

as an example of the motivation for this question and the Zarr v3 restriction, I'm trying to understand how this could be used for GeoZarr (if the standards working group decides to use this mechanism) which needs to support both Zarr formats. An example following my current understanding is drafted at https://github.com/maxrjones/geozarr-spec/blob/simple-translation/geozarr-spec.md#defining-primary-geospatial-convention.

This example is trying to outline how those two communities might move their common data structures into a shared space. GeoZarr's similar structure was initially taken from zarr-developers/zarr-specs#50. Had we had the central registry at that time, we might have developed it in common.

the value of a central registry seems like a separate question from why using extensions is better than using attributes.

This example is trying to outline how those two communities might move their common data structures into a shared space. GeoZarr's similar structure was initially taken from zarr-developers/zarr-specs#50. Had we had the central registry at that time, we might have developed it in common.

Does this mean that ZEP009 supersedes ZEP004?

the value of a central registry seems like a separate question from why using extensions is better than using attributes.

My guess based on these discussion is that an extensions key allows the Zarr community to watch for naming conflicts between communities, serving a similar purpose as ZEP0004 but with a different mechanism. Is this correct?

As the person who originated ZEP 4, I have come to feel it wasn't fully baked. I feel like the mechanism for identifying if a convention is present proposed there is quite flaky. I much prefer the STAC approach, i.e. explicit extensions, which is closer to what is being described here.

What I had in mind was explicitly registering individual top-level attribute names, e.g. tensorstore.dimension_units.

maxrjones · 2025-05-26T22:41:02Z

Is there anything in this proposal that motivates its restriction to Zarr specification 3 rather than both Zarr specification 2 and 3?

normanrz · 2025-05-27T07:09:29Z

At least from my pov, there is no desire to further evolve the v2 specification. Extensibility was one major motivation of the v3 specification. I think it would be confusing to continue evolving both.
In most cases, v2 data can be upgraded to v3 with metadata-only updates.
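To make "metadata-only" concrete: for a simple array only the JSON metadata is rewritten while the chunk objects stay where they are, because the v3 core spec defines a `v2` chunk key encoding that reproduces v2 chunk keys such as `0.0`. The sketch below is a minimal illustration, not a complete converter: the function name `v2_to_v3_array_metadata` is hypothetical, and it assumes an uncompressed, unfiltered, C-ordered array with a little-endian or single-byte dtype. Real conversions also have to map v2 compressors and filters onto v3 codecs.

```python
from typing import Dict, Optional

# Assumed subset of v2 dtype strings mapped to v3 data_type names.
_V2_TO_V3_DTYPE = {
    "|b1": "bool", "|i1": "int8", "|u1": "uint8",
    "<i2": "int16", "<i4": "int32", "<i8": "int64",
    "<u2": "uint16", "<u4": "uint32", "<u8": "uint64",
    "<f4": "float32", "<f8": "float64",
}


def v2_to_v3_array_metadata(zarray: Dict, zattrs: Optional[Dict] = None) -> Dict:
    """Build a v3 zarr.json document from v2 .zarray/.zattrs contents (sketch)."""
    if zarray.get("compressor") is not None or zarray.get("filters"):
        raise NotImplementedError("sketch handles only uncompressed, unfiltered arrays")
    if zarray.get("order", "C") != "C":
        raise NotImplementedError("F order would additionally need a transpose codec")
    if zarray.get("fill_value") is None:
        raise NotImplementedError("v3 requires a concrete fill_value")
    v2_dtype = zarray["dtype"]
    bytes_codec: Dict = {"name": "bytes"}
    if v2_dtype.startswith("<"):
        # Multi-byte types need an explicit byte order in v3.
        bytes_codec["configuration"] = {"endian": "little"}
    return {
        "zarr_format": 3,
        "node_type": "array",
        "shape": list(zarray["shape"]),
        "data_type": _V2_TO_V3_DTYPE[v2_dtype],
        "chunk_grid": {
            "name": "regular",
            "configuration": {"chunk_shape": list(zarray["chunks"])},
        },
        # The "v2" key encoding keeps existing chunk keys (e.g. "0.0"),
        # so no chunk object has to be moved or rewritten.
        "chunk_key_encoding": {
            "name": "v2",
            "configuration": {"separator": zarray.get("dimension_separator", ".")},
        },
        "fill_value": zarray["fill_value"],
        "codecs": [bytes_codec],
        "attributes": dict(zattrs or {}),
    }


v2 = {"zarr_format": 2, "shape": [100, 100], "chunks": [10, 10], "dtype": "<f8",
      "compressor": None, "fill_value": 0.0, "order": "C", "filters": None}
v3 = v2_to_v3_array_metadata(v2, {"title": "example"})
print(v3["data_type"], v3["chunk_key_encoding"])
```

Only the metadata document changes; the chunk data written under v2 is reused byte for byte, which is what makes the upgrade cheap for arrays like this one.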

joshmoore · 2025-05-27T14:18:53Z

> jbms (Jeremy Maitin-Shepard): Can someone give an example of how order-dependent extensions might be used/specified?

I've not come up with a compelling one, @jbms. My intuition is that there would be a chance for one member of the pipeline (extA) to be able to update some state (the metadata?) before a later one (extB). Practically, though, I don't see how extA could know enough about extB to inject itself into the list at the right point. So other than "high-priority" extensions which add themselves at the beginning and "low-priority" ones which add themselves at the end, I still don't have a concrete example.

But, in general, 👍 on a "SequenceExtension" style that others can adopt. Perhaps this speaks to a "generic extensions conventions" (or "idioms") section.
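A toy illustration of the ordering problem, under assumptions that are not part of the ZEP: each extension is modelled as a function that takes node metadata and returns an updated copy, and extensions run in list order. `ext_a` and `ext_b` stand in for extA and extB; `add_high_priority` and `add_low_priority` are hypothetical helpers for the only two insertion points an extension can pick without knowing anything about the others.

```python
from typing import Callable, Dict, List

# Hypothetical model: an extension maps node metadata to updated metadata.
Extension = Callable[[Dict], Dict]


def add_high_priority(pipeline: List[Extension], ext: Extension) -> None:
    """Run ext before every extension already registered."""
    pipeline.insert(0, ext)


def add_low_priority(pipeline: List[Extension], ext: Extension) -> None:
    """Run ext after every extension already registered."""
    pipeline.append(ext)


def run_pipeline(pipeline: List[Extension], metadata: Dict) -> Dict:
    """Apply extensions in order; each sees the output of the previous one."""
    for ext in pipeline:
        metadata = ext(metadata)
    return metadata


def ext_a(meta: Dict) -> Dict:
    # extA updates some state...
    return {**meta, "units": "m"}


def ext_b(meta: Dict) -> Dict:
    # ...that extB later depends on.
    return {**meta, "label": f"depth ({meta.get('units', '?')})"}


early: List[Extension] = [ext_b]
add_high_priority(early, ext_a)  # ext_a runs first
late: List[Extension] = [ext_b]
add_low_priority(late, ext_a)    # ext_a runs too late for ext_b

print(run_pipeline(early, {})["label"])  # depth (m)
print(run_pipeline(late, {})["label"])   # depth (?)
```

The result only comes out right when extA happens to be placed before extB, which is exactly the knowledge a single extension cannot be expected to have.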

jbms · 2025-05-27T18:07:34Z

I agree with what @LDeakin said about must_understand --- must understand for writing should always implicitly be true and must_understand applies only for reading. That simplifies things nicely.
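A minimal sketch of that rule: an unknown extension always blocks writing, while for reading it blocks only when `must_understand` is true (the default). It assumes, for illustration, that `extensions` is a list of objects with a `name` and an optional `must_understand` flag; `check_extensions`, `UnsupportedExtensionError`, and the extension name `example.optional` are hypothetical.

```python
from typing import Dict, List, Set


class UnsupportedExtensionError(Exception):
    """Raised when a node uses an extension this implementation cannot handle."""


def check_extensions(metadata: Dict, supported: Set[str], mode: str) -> List[str]:
    """Return names of unknown extensions that are safe to ignore in this mode.

    mode is "r" (read) or "w" (write).
    """
    if mode not in ("r", "w"):
        raise ValueError(f"mode must be 'r' or 'w', got {mode!r}")
    ignorable = []
    for ext in metadata.get("extensions", []):
        name = ext["name"]
        if name in supported:
            continue
        # Writing: any unknown extension blocks, whatever must_understand says.
        if mode == "w":
            raise UnsupportedExtensionError(f"cannot write node: unknown extension {name!r}")
        # Reading: only must_understand=true (assumed default) blocks.
        if ext.get("must_understand", True):
            raise UnsupportedExtensionError(f"cannot read node: unknown extension {name!r}")
        ignorable.append(name)
    return ignorable


meta = {"extensions": [{"name": "example.optional", "must_understand": False}]}
print(check_extensions(meta, supported=set(), mode="r"))  # ['example.optional']
```

With this split, a single boolean per extension is enough: writers never need to consult it, and readers only use it to decide whether an unknown entry can be skipped.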

I am not in favor of using extensions as an attribute namespace for things like ome-zarr that are logically layered on top of zarr itself and don't require changes/deep integration with the zarr implementation, for several reasons:

- zarr implementations intended for general use would have to provide an API for users to directly read and write extensions metadata, very similar to attributes;
- we would need to complicate must_understand to indicate reading and writing separately;
- it muddies the distinction between things that change the core zarr model itself and things purely layered on top.

Instead we should add an attribute section to the zarr-extensions repo (related to the zarr conventions proposal). While technically this is a breaking change in that currently no part of the attribute namespace is reserved for registered names, in practice we could use some prefix or other naming convention for registered attributes such that conflicts with existing uses are very unlikely.

As I see it, extensions do have a high cost as far as fragmenting the ecosystem and therefore should be introduced with care, mostly for things that could reasonably be added to the core spec also.
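To illustrate the attributes-based alternative: registered names would remain ordinary keys in `attributes`, so a zarr implementation needs no special support and the higher layer simply looks them up. In this sketch the registry contents, the `split_attributes` helper, and the attribute values are hypothetical; `tensorstore.dimension_units` is borrowed from the example name given earlier in the thread.

```python
from typing import Dict, Set, Tuple

# Hypothetical registry contents; no such list exists yet.
REGISTERED_ATTRIBUTES: Set[str] = {"tensorstore.dimension_units"}


def split_attributes(attributes: Dict, registry: Set[str]) -> Tuple[Dict, Dict]:
    """Separate registered (reserved-name) attributes from free-form user ones."""
    registered = {k: v for k, v in attributes.items() if k in registry}
    free_form = {k: v for k, v in attributes.items() if k not in registry}
    return registered, free_form


# Plain zarr.json "attributes" content; no "extensions" entry is involved.
attrs = {"tensorstore.dimension_units": ["4 nm", "4 nm"], "description": "free-form note"}
registered, free_form = split_attributes(attrs, REGISTERED_ATTRIBUTES)
print(registered)  # {'tensorstore.dimension_units': ['4 nm', '4 nm']}
print(free_form)   # {'description': 'free-form note'}
```

Because the lookup is plain dictionary processing over `attributes`, it can live entirely in user code or a library like ome-zarr, which is the layering argued for above.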

draft/ZEP0010.md

Co-authored-by: Sanket Verma

joshmoore added 8 commits May 12, 2025 21:53

- Copy of ZEP9 (e83d6b4)
- Laptop WIP (7176586)
- Desktop WIP (ba35fd6)
- Prepare individual examples (e3dbe93)
- Explanations and cleanup (d41b161)
- Wrap up first draft (c255d3f)
- Apply Norman's feedback on the examples (db7ae37)
- Introduce 'generic extensions' nomenclature (48c319c)

joshmoore mentioned this pull request May 16, 2025: ZEP10: Generic extensions (v3.2 spec changes), zarr-developers/zarr-specs#344 (Open)

Update spec PR number (ffda708)

jbms reviewed May 16, 2025

d-v-b reviewed May 16, 2025 (7 review threads on draft/ZEP0010.md, outdated and resolved)

joshmoore and others added 2 commits May 16, 2025 16:07

- Update ZEP0010.md (f4dea6f); Co-authored-by: Davis Bennett
- Update draft/ZEP0010.md (f4b3b2a); Co-authored-by: Davis Bennett

LDeakin mentioned this pull request May 16, 2025: Tracking issue for Zarr spec, ZEPs, and extensions support, zarrs/zarrs#191 (Open)

normanrz reviewed May 22, 2025 (review thread on draft/ZEP0010.md, outdated and resolved)

joshmoore added 6 commits May 24, 2025 07:40

- Intro clarification (b82d3d4)
- Remove opengl ref and clarify processing (a1495bb)
- Update multiscales example (dcc2246)
- Replace dataset with node (547c501)
- Clarify sub-nodes (dfe741b)
- Move extensions v attributes to the top (35087ba)

maxrjones reviewed May 26, 2025

maxrjones mentioned this pull request May 27, 2025: RFC: Reformat GeoZarr as a registration of Zarr translations of well-supported open standards and extensions, zarr-developers/geozarr-spec#67 (Draft, 9 tasks)

sanketverma1704 reviewed May 28, 2025 (review thread on draft/ZEP0010.md, outdated and resolved)

Update draft/ZEP0010.md (0cad1d1); Co-authored-by: Sanket Verma

		(the default) and the implementation does not support it, the dataset must not
		be loaded and an appropriate error should be raised. For extensions with


		## Proposal

		To provide for more flexible, immediate, and de-centralized use cases, we

		By adding a new field, the specification can assert restrictions that, if added
		to ``attributes``, would amount to a breaking change. If present, the


		### Alternatives for the `extensions` extension point

		The current design allows having the same
