RFC: Reformat GeoZarr as a registration of Zarr translations of well-supported open standards and extensions #67

Draft: wants to merge 7 commits into base: main

maxrjones
Member

@maxrjones maxrjones commented May 26, 2025

This is a large and half-developed PR, but I wanted to open it anyway to share my preferred vision for GeoZarr. In particular, this PR represents the start of two steps I proposed in #63 (comment):

  • Translate NetCDF CF conventions v1.12 to match Zarr's data/metadata structure and parlance
  • Define a lightweight standard for declaring, in the root-level metadata of the structure, which conformance class is used.

The current version of the CF translation is available at https://maxrjones.github.io/geozarr-spec/cf-conventions.html, and the definition of standard conformance is available at https://github.com/maxrjones/geozarr-spec/blob/simple-translation/geozarr-spec.md. The location of the extension definition may need to be in attributes rather than extensions, depending on the outcome of zarr-developers/zeps#67.

Background

As I mentioned in #63 (comment) and probably elsewhere, I do not agree with the path of developing OGC GeoZarr as a harmonization of OGC and CF. I instead encourage developing GeoZarr as:

  1. A translation of existing standards for compatibility with the Zarr specification V2 and V3, including Climate and Forecast Conventions and the Open Geospatial Consortium GeoTIFF and COG standards.
  2. A registration point for defining which primary convention and version is used by the Zarr store.
  3. A lightweight extension point for defining extensions that conform to the primary convention.
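
To make the three roles above concrete, here is a minimal sketch of what a root-level registration could look like. The key names (`geozarr`, `convention`, `version`, `extensions`) and the `KNOWN_CONVENTIONS` registry are hypothetical, chosen only for illustration; they are not the schema defined in this PR, and the final location (attributes versus extensions) is still open.

```python
import json

# Hypothetical root-level attributes. The key names below are illustrative
# assumptions, not the schema proposed in the PR.
root_attributes = {
    "geozarr": {
        "convention": "CF",           # primary convention used by the store
        "version": "1.12",            # version of that convention
        "extensions": ["overviews"],  # additions layered on the primary convention
    }
}

# Assumed registry of translated standards, for this sketch only.
KNOWN_CONVENTIONS = {"CF", "GeoTIFF"}


def primary_convention(attrs):
    """Return (convention, version) declared at the root, or raise ValueError."""
    entry = attrs.get("geozarr")
    if entry is None:
        raise ValueError("no GeoZarr registration found in root attributes")
    name = entry["convention"]
    if name not in KNOWN_CONVENTIONS:
        raise ValueError(f"unregistered primary convention: {name!r}")
    return name, entry["version"]


print(json.dumps(root_attributes, indent=2))
print(primary_convention(root_attributes))
```

With a single declared primary convention, a reader only has to look in one place to decide which standard's terminology and rules apply to the store.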

My primary reasons are as follows:

  • Defining a new data model under-utilizes the hard work that has gone into existing standards and risks delaying progress towards GeoZarr 1.0. Given ongoing EOPF development and active interest from other projects, I do not think that a v1 release in 2026 at the earliest is soon enough for success (xref 1.0 release date? #66). In contrast, defining GeoZarr as a translation of existing standards requires a lot of grunt work up front but fewer new decisions, which will likely make the overall process faster.
  • Defining a new "harmonized" data model would significantly increase the amount of work required by implementations.
  • The differences in basic concepts between existing standards would make a GeoZarr harmonization more ambiguous and error-prone. For example, the GeoTIFF standard is predicated on the definition of a "cell" as "Rectangular area in raster space, in which a single pixel value is filled.", whereas CF has a much broader definition: "A region in one or more dimensions whose boundary can be described by a set of vertices recorded in boundary variables". The mix-and-match approach would make it much more difficult for users and implementations to quickly determine the meaning of important concepts such as a "cell".
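
The gap between the two "cell" definitions can be sketched in a few lines. This is only an illustration under simplifying assumptions: the function names are hypothetical, and the `cf_bounds` dictionary stands in for a CF boundary variable rather than reproducing its actual array layout.

```python
# GeoTIFF-style cell: a rectangle in raster space, fully determined by the
# pixel's (row, col) index. No extra data needs to be stored.
def raster_cell_corners(row, col):
    """Corners of the unit rectangle occupied by pixel (row, col)."""
    return [(col, row), (col + 1, row), (col + 1, row + 1), (col, row + 1)]


# CF-style cell: an arbitrary region whose vertices are recorded explicitly.
# This dict is a simplified stand-in for a boundary variable; the values are
# made up for illustration.
cf_bounds = {
    (0, 0): [(0.0, 0.0), (1.2, 0.1), (1.0, 1.1), (-0.1, 0.9)],
}


def cf_cell_vertices(bounds, index):
    """Vertices of the cell at `index`, read from the stored boundary data."""
    return bounds[index]


print(raster_cell_corners(0, 0))          # always an axis-aligned rectangle
print(cf_cell_vertices(cf_bounds, (0, 0)))  # any polygon the data describes
```

In the first case the cell shape is implied by the index; in the second it is data. A reader needs to know which definition is in force before interpreting a value's footprint.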

Status

cc OGC SWG chairs @christophenoel @briannapagan

@maxrjones
Member Author

@csaybar @rprinceley @mdsumner @rabernat I'd be especially grateful for your feedback on this idea as key participants in the discussion over in OSGeo/gdal#11824. I would be glad to answer any questions or schedule a chat if my PR description isn't clear or a more synchronous discussion would help.

Re-reading the discussion in OSGeo/gdal#11824, I think my proposal aligns with Even's first idea in OSGeo/gdal#11824 (comment):

> you want to be able to fully replace netCDF usages with a 1-1 mapping. Then, easy, GeoZarr = adopting CF conventions unmodified (and potentially work with the group maintaining CF conventions to add support for "implicit coordinates"). At least this will make the life of some implementers easier (probably just the software doing the conversion)

with the addition that you could instead replace GeoTIFF/COG with a 1-1 mapping.

@christophenoel

Hi @maxrjones

There is actually no major difference. What is proposed as a “harmonisation” of NetCDF, CF, and GeoTIFF is, in essence, a meta-model—precisely a translation of existing standards for compatibility, independent of source formats.

The key distinction with your approach lies in this meta-model’s ability to describe how different pieces of information relate, regardless of origin. For instance, it allows conversion from NetCDF while adding overviews.

This recent shift in direction is disappointing, as it continues to delay progress towards GeoZarr 1.0. Efforts would be better focused on progressing after merge of PR #65, by clearly defining the unified abstract model (e.g. overviews, affine transforms), followed by specifying the Zarr encodings of this model.

As for CF, the translation is straightforward. Since CF is defined around NetCDF/CDM constructs, a whole copy of the original convention adapted to Zarr is useless, since only a limited mapping (see the encoding chapter) is needed (if you map CDM to Zarr, you have mostly mapped CF as well).

I acknowledge the efforts and work already invested, but yet another shift in direction makes me seriously consider stepping away from the process.

@maxrjones
Member Author

> There is actually no major difference. What is proposed as a “harmonisation” of NetCDF, CF, and GeoTIFF is, in essence, a meta-model—precisely a translation of existing standards for compatibility, independent of source formats.

This is great to hear!

> This recent shift in direction is disappointing, as it continues to delay progress towards GeoZarr 1.0. Efforts would be better focused on progressing after merge of PR #65, by clearly defining the unified abstract model (e.g. overviews, affine transforms), followed by specifying the Zarr encodings of this model.

I wouldn't say this is a shift in direction, partly because it's just a proposal and a request for community input, and partly because of your comment above that there's no major difference.

> I acknowledge the efforts and work already invested, but yet another shift in direction makes me seriously consider stepping away from the process.

I'm sad to hear this. My preference would absolutely be to find a compromise in direction that leaves everyone satisfied that their concerns have been appropriately considered, keeps everyone engaged in the process, and leads to commitments in adopting GeoZarr.

@maxrjones
Member Author

maxrjones commented May 27, 2025

> The key distinction with your approach lies in this meta-model’s ability to describe how different pieces of information relate, regardless of origin. For instance, it allows conversion from NetCDF while adding overviews.

This proposed structure also allows for conversion from NetCDF while adding overviews. The primary difference is the registration of a primary standard while supporting additions like overviews via the extension mechanism. I proposed this approach for the following reasons:

  • By leveraging existing standards in their entirety, GeoZarr does not need to define compatibility across conformance classes, which I worry is an exploding problem. I.e., it provides a simple solution for the problem you mentioned in Call for Prototype/Implementation Owners for Different GeoZarr Conformance Classes #63 (comment) that "To maintain interoperability and simplicity, the number of combinations must remain limited. Excessive flexibility would increase complexity for applications and hinder standardisation efforts."
  • We would immediately support all features provided by CF and GeoTIFF (e.g., curvilinear grids, RCPs).
  • It makes it obvious which of the conflicting terminologies between CF and OGC GeoZarr subscribes to (e.g., the definition of a cell), which can be ambiguous in the UDM approach.

@christophenoel

> GeoZarr does not need to define compatibility across conformance classes

From my point of view, conformance classes are just an optional bonus for identifying specific constructs (they could also be defined for NetCDF), e.g. specifying a standard structure for products with multiple resolutions. I see the interest, but it's probably not required in a first version.

To describe the information supported by GeoZarr, we just need to define:

  • an abstract (meta) model, i.e., how the other specs can plug into the base CDM structure of groups, variables, and attributes: CF (obvious, since it is based on the NetCDF model, itself a tailoring of CDM), overviews, and affine transforms
  • the encoding of this information in Zarr (or in other formats).

Everything is already there at https://zarr.dev/geozarr-spec/documents/standard/template/geozarr-spec.htm, but we need progress towards a consensus on:

  • how to describe the meta model correctly
  • how to host/define overviews (OGC tile matrix based, or COG based, ...)
  • how to host/define affine transforms (CF based, as proposed by Ethan, GeoTIFF based, ...)
  • how to integrate STAC
  • how to integrate other interesting features
