
Spec initial draft #65


Open: christophenoel wants to merge 4 commits into main from spec_initial_draft

Conversation

christophenoel

To help move forward quickly and divide the work by topic, a draft structure was created using three main sections, following the typical OGC format:

  • Clause 7 - Unified Data Model explains the unified data model for GeoZarr (will need schemas and more information about the extensions)
  • Clause 8 - GeoZarr Conformance Classes sets out rules (conformance classes) for standardising different dataset types or profiles (raster, time series, SAR, etc.) (this section could be replaced by examples/guidelines instead of conformance classes)
  • Clause 9 - Zarr Encoding describes how to encode the data using Zarr version 2 or version 3 (aims to be generic).

Each section is divided into multiple files (some of which have not yet been created) to ensure that separate pull requests (PRs) can be made for each future modification.

We may discuss in the next meeting whether to accept this PR as is (for the structure only) before starting to create additional PRs, or whether to first conduct an initial review.

The latest Editor's Draft version of the OGC GeoZarr Specification is found here in HTML or PDF. This document is generated automatically from the repository.

@christophenoel christophenoel changed the title Initial draft of unified data model specification Spec initial draft Apr 24, 2025
…t, Scope, Conventions, Terminology, Requirements Classes, Overview, and References

- Added Preface and Abstract aligned with GeoZarr charter purpose and value proposition.
- Defined Scope section highlighting model-based architecture and interoperability objectives.
- Introduced modular Requirements Classes with URIs (core, time, crs, geotransform, etc.).
- Provided formal definitions in Terms and Abbreviated Terms aligned with data model terminology.
- Adapted Conventions section for Zarr metadata, identifiers, link relations, and encoding usage.
- Added Overview section summarising modular structure, encoding support, and metadata design.
- Populated References section with normative citations (Zarr, CDM, NetCDF, CF, GDAL, STAC, etc.) following LNCS style.

@rabernat rabernat left a comment


This is great progress @christophenoel! Thanks for your work!

I read through and left numerous comments. Hopefully this feedback is helpful.


The *Core* requirements class defines the minimal compliance necessary to claim conformance with the GeoZarr Unified Data Model. It is intentionally open and permissive, supporting incremental adoption and broad compatibility with existing Zarr tools and data models based on the Unidata Common Data Model (CDM).


👍 well said


[example]
Here's an example of an example term.
A one-dimensional array whose values define the coordinate system for a dimension of one or more data variables. Typical examples include latitude, longitude, time, or vertical levels.


Would it be useful to link to CDM / CF conventions and / or state whether this definition is identical to those other definitions?

Author


I agree.


A container for datasets, variables, dimensions, and metadata in Zarr. Groups may be nested to represent a logical hierarchy (e.g., for resolutions or collections).

==== metadata


Suggested change
==== metadata
==== attributes

This is what it's called in Zarr at least

Author


From my point of view, metadata is a broader and more formal term (also used in NetCDF, GDAL, etc.) that encompasses all descriptive information about data, covering CF attributes, geospatial reference systems, temporal context, and links to external resources (e.g. STAC).

Attributes typically refer to key-value pairs attached to variables or groups in formats like NetCDF or Zarr. They are a subset of metadata, and more implementation-specific.

Concretely, while metadata relies heavily on attributes, it may also be conveyed by the structure of groups, and the metadata might be stored in a single attribute (in JSON format), etc.


==== metadata

Structured information describing the content, context, and semantics of datasets, variables, and attributes. GeoZarr metadata includes CF attributes, geotransform definitions, and links to STAC metadata where applicable.


Suggested change
Structured information describing the content, context, and semantics of datasets, variables, and attributes. GeoZarr metadata includes CF attributes, geotransform definitions, and links to STAC metadata where applicable.
Attributes are structured information describing the content, context, and semantics of datasets, variables, and attributes, i.e. metadata. GeoZarr attributes include CF attributes, geotransform definitions, and links to STAC metadata where applicable.

Comment on lines +48 to +52
=== Coordinate Variables

Coordinate variables (excluding GeoTransform Coordinates) define the geospatial or temporal context of data. They are represented as named arrays with metadata attributes.

Coordinate variables are represented as named 1D arrays aligned with corresponding dimensions.


We should distinguish that coordinate variables are not part of the Zarr spec itself. It's from CDM.

Author


From my perspective it's clear, as there is a mapping section for each component of the data model to Zarr.
You will find the same sections for NetCDF, GeoTIFF, etc.

"title": "Example Dataset",
"summary": "Multidimensional Earth Observation data",
"institution": "Example Space Agency",
"Conventions": "CF-1.10"


Do we want to include a GeoZarr convention flag?
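For illustration only, one way such a flag might sit alongside the CF declaration; the `geozarr_conventions` key and its value below are hypothetical, not part of any adopted draft:

    # Hypothetical group attributes advertising both CF and a GeoZarr convention flag.
    # The "geozarr_conventions" key is an assumption made up for this discussion.
    attrs = {
        "title": "Example Dataset",
        "summary": "Multidimensional Earth Observation data",
        "institution": "Example Space Agency",
        "Conventions": "CF-1.10",
        "geozarr_conventions": "GeoZarr-1.0",
    }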

Comment on lines +12 to +21
[cols="1,2,2"]
|===
|Structure |Zarr v2 |Zarr v3

|Zoom level groups | Subdirectories with `.zgroup` and `.zattrs` | Subdirectories with `zarr.json`, `node_type: group`

|Variables at each level | Zarr arrays (`.zarray`, `.zattrs`) in each group | Zarr arrays (`zarr.json`, `node_type: array`) in each group

|Global metadata | `multiscales` defined in parent `.zattrs` | `multiscales` defined in parent group `zarr.json` under `attributes`
|===


Unnecessary to refer explicitly to Zarr format details here. Let's just talk about Groups and Arrays, which is universal to both format versions.

|Global metadata | `multiscales` defined in parent `.zattrs` | `multiscales` defined in parent group `zarr.json` under `attributes`
|===

Each multiscale group MUST define chunking (tiling) along the spatial dimensions (`X`, `Y`, or `lon`, `lat`). Recommended chunk sizes are 256×256 or 512×512.


Why?


- Chunks MUST be aligned with the tile grid (1:1 mapping between chunks and tiles)
- Chunk sizes MUST match the `tileWidth` and `tileHeight` declared in the TileMatrix
- Spatial dimensions MUST be clearly identified using `dimension_names` (v3) or `_ARRAY_DIMENSIONS` (v2)


Suggested change
- Spatial dimensions MUST be clearly identified using `dimension_names` (v3) or `_ARRAY_DIMENSIONS` (v2)
- Spatial dimensions MUST be clearly identified using the Array dimensions
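For context, a minimal sketch of where spatial dimension names live in each format version, assuming the `dimension_names` field from the Zarr v3 array metadata and the `_ARRAY_DIMENSIONS` attribute convention popularised by xarray for Zarr v2. Shapes and chunk sizes are illustrative, and the v3 metadata is abridged:

    # Zarr v3: dimension names are part of the array metadata (zarr.json), abridged here.
    zarr_v3_array_metadata = {
        "zarr_format": 3,
        "node_type": "array",
        "shape": [2048, 2048],
        "chunk_grid": {"name": "regular", "configuration": {"chunk_shape": [256, 256]}},
        "dimension_names": ["y", "x"],  # spatial dimensions named explicitly
    }

    # Zarr v2: the array metadata itself carries no dimension names; the xarray
    # convention stores them in the .zattrs document instead.
    zarr_v2_zattrs = {
        "_ARRAY_DIMENSIONS": ["y", "x"],
    }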

@christophenoel christophenoel deleted the spec_initial_draft branch April 29, 2025 08:00
@christophenoel christophenoel restored the spec_initial_draft branch April 29, 2025 09:03
@christophenoel
Author

Don't know why it was closed

@christophenoel christophenoel reopened this May 6, 2025

@briannapagan briannapagan left a comment


Thank you for kicking this off, Christophe! Looking forward to discussing more.

|Supports encoding of data in projected coordinate systems and association with spatial reference metadata.
|`http://www.opengis.net/spec/geozarr/1.0/conf/projected`

|Spectral Bands


Are we becoming too specific by starting to spell out attributes for spectral bands?

Member


I think it is too specific. Naming conventions for spectral bands is an issue across communities rather than just for Zarr. I think it would be better to define the standard outside GeoZarr and then provide references.


==== tile matrix set

A spatial tiling scheme defined by a hierarchy of zoom levels and consistent grid parameters (e.g., scale, CRS). Tile Matrix Sets enable spatial indexing and tiling of gridded data.


wondering if @abarciauskas-bgse or @maxrjones have any reflections on this.

|Specifies multiscale tiled layout using zoom levels and Tile Matrix Sets as per OGC API – Tiles.
|`http://www.opengis.net/spec/geozarr/1.0/conf/overviews`

|STAC Metadata Integration


Is someone currently owning the STAC integration?

Comment on lines +26 to +28
|GeoTransform Metadata
|Enables affine spatial referencing via GDAL-compatible `GeoTransform` metadata and optional interpolation hints.
|`http://www.opengis.net/spec/geozarr/1.0/conf/geotransform`
Member


I think we should start with a translation of the OGC GeoTIFF standard rather than basing GeoZarr v1.0 on GDAL's internal data structure (i.e., ModelTiepoint, ModelPixelScale, ModelTransformation).

Author


I disagree on this point, and I think you're not referring to the GDAL data model but to the encoding in GeoTIFF (ModelTiepoint, ModelPixelScale, ModelTransformation), which complicates the six coefficients of the affine transform.

I think the current approach with the greatest support is the extension/adaptation of CF to also support affine transformations.



This is indeed complexifying on GDAL; when it's not just a bounding box, it is sparse coordinates. Besides the special case of skew/rotation with the six-coefficient transform, Max has explained well how this is getting out of hand. Don't reinvent the warper API here 🙏

Member


I don't understand why we would adopt an in-memory representation that only supports a subset of use-cases when we're dealing with serializing metadata and there is a long-established and widely used standard for serializing this information that works for a broader set of use cases. I'm arguing for GeoTIFF tags over GDAL.



I said something similar here before I understood the landscape much: https://discourse.pangeo.io/t/example-which-highlights-the-limitations-of-netcdf-style-coordinates-for-large-geospatial-rasters/4140/16?u=michael_sumner I'll try to put together examples. Maybe I'm off base, but tie points are a small case of non-redundant coordinate arrays; it's not a raster, so it's very cleanly out of scope imo.


@geospatial-jeff geospatial-jeff May 7, 2025


I disagree on this point, and I think you're not referring to the GDAL data model but to the encoding in GeoTIFF (ModelTiepoint, ModelPixelScale, ModelTransformation), which complicates the six coefficients of the affine transform.

This is not true from my practical experience. I've been using the following code to convert GeoTIFF tags into a GDAL compliant affine transformation for 5+ years and it hasn't broken once (and no issues/bugs reported either).

        # affine.Affine(a, b, c, d, e, f) maps pixel (col, row) to model space:
        #   x = a*col + b*row + c,   y = d*col + e*row + f
        gt = affine.Affine(
            self.ifds[0].ModelPixelScaleTag[0],   # a: pixel width (ScaleX)
            0.0,                                  # b: no rotation/skew
            self.ifds[0].ModelTiepointTag[3],     # c: top-left x (tie point ModelX)
            0.0,                                  # d: no rotation/skew
            -self.ifds[0].ModelPixelScaleTag[1],  # e: negative pixel height (-ScaleY)
            self.ifds[0].ModelTiepointTag[4],     # f: top-left y (tie point ModelY)
        )

That's not to say that this will never break; there are definitely cases where this wouldn't work, some of which have already been covered in the thread:

since the relationship between the Raster space and the model space will often be an exact, affine transformation, this relationship can be defined using one set of tiepoints and the "ModelPixelScaleTag", described below, which gives the vertical and horizontal raster grid cell size, specified in model units. If possible, the first tiepoint placed in this tag shall be the one establishing the location of the point (0,0) in raster space. However, if this is not possible (for example, if (0,0) goes to a part of model space in which the projection is ill-defined), then there is no particular order in which the tiepoints need be listed. For orthorectification or mosaicking applications a large number of tiepoints may be specified on a mesh over the raster image. However, the definition of associated grid interpolation methods is not in the scope of the current GeoTIFF spec.

The key here is that the definition and interpretation of multiple tie points is not defined by the GeoTIFF spec. GDAL does have its own logic to interpret images with multiple (more than 6) tie points, I'm sure other libraries have slightly different logic. I believe we should be building GeoZarr to cover the highest surface area possible; not to cover all potential use cases. Because it is impossible to provide a representation of an affine transform that satisfies all potential edge cases that may appear across CF / GDAL / GeoTiff (and w/e other data formats people load into GeoZarr).

I think we should start with a translation of the OGC GeoTIFF standard rather than basing GeoZarr v1.0 on GDAL's internal data structure (i.e., ModelTiepoint, ModelPixelScale, ModelTransformation).

Back to the original question; I don't think it does anyone justice to debate whether or not we should be in compliance with GeoTIFF or GDAL's internal data model. The reality is the concept of an "affine transform" was developed over the course of the 18th-19th century and far pre-dates the notion of raster data, geotiff, or the GDAL data model. Let's just call it what it is! Translating to GeoZarr from both GeoTIFF and GDAL is valuable to see how well they map over.

I think the current approach with the greatest support is the extension/adaptation of CF to also support affine transformations.

Why do we have to extend CF to add support for affine transforms to GeoZarr? Why can't we just say "geozarr should use affine transforms, and this is how the affine transform should be structured".

Author


This is not true from my practical experience. I've been using the following code to convert GeoTIFF tags into a GDAL compliant affine transformation

You just confirmed that ModelTiepointTag and ModelPixelScaleTag are GeoTIFF, not GDAL.

Yes basically the logic is:

  • If ModelTransformationTag exists → derive affine from matrix.

  • Else:

    • Use ModelTiepointTag for origin.
    • Use ModelPixelScaleTag for pixel size.
    • Adjust origin if PixelIsPoint (shift by ½ pixel).

What I'm arguing for is that the GeoZarr encoding should be formatted closer to GDAL syntax (below) than using GeoTiff attributes.

Affine = [a0, a1, a2, a3, a4, a5]

Where:
a0 = top left x (origin X)
a1 = pixel width (scale X)
a2 = rotation (typically 0)
a3 = top left y (origin Y)
a4 = rotation (typically 0)
a5 = pixel height (scale Y, usually negative)

I'm in favor of the draft proposed by CF recently: https://github.com/orgs/cf-convention/discussions/411
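
As a worked illustration of that six-element form (coefficient order as in GDAL's GetGeoTransform; the helper function below is mine, not from any spec), applying it to a pixel index gives the model-space coordinate:

    def apply_geotransform(gt, col, row):
        """Map a pixel index (col, row) to model-space (x, y) using a
        GDAL-ordered geotransform [a0, a1, a2, a3, a4, a5]."""
        x = gt[0] + col * gt[1] + row * gt[2]  # origin X + column scale + row rotation
        y = gt[3] + col * gt[4] + row * gt[5]  # origin Y + column rotation + row scale
        return x, y

    # Example: 10 m pixels, origin at (600000, 5400000), no rotation.
    gt = [600000.0, 10.0, 0.0, 5400000.0, 0.0, -10.0]
    print(apply_geotransform(gt, 0, 0))      # (600000.0, 5400000.0)
    print(apply_geotransform(gt, 100, 200))  # (601000.0, 5398000.0)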


Maybe this is just a semantics issue then. a0 a1 a2 (as you mention above) are not related to GDAL (or GeoTIFF). That is the definition of an affine transform, which GDAL happens to implement. While it's true to say this aligns with the GDAL data model, it's more accurate to say that "this is an affine transform".

To state that differently; GDAL did not come up with the idea of an affine transformation. The concept of an affine transform exists outside of GDAL/CF/GeoTIFF, or any of the other implementations that have been discussed here. Why not just call it what it is?

As I said in my earlier comments, I don't think it does the community any justice to spend our time debating whether an affine transform is more like GeoTIFF or more like GDAL when there is a lower-level definition of an "affine transform" which existed long before either of these two implementations.

Comment on lines +30 to +32
|Multiscale Overviews
|Specifies multiscale tiled layout using zoom levels and Tile Matrix Sets as per OGC API – Tiles.
|`http://www.opengis.net/spec/geozarr/1.0/conf/overviews`
Member


I do not think a conformance class for OGC API - Tiles should be included in GeoZarr v1.0 because it may be more complicated than necessary. The community is requesting support for simple factor of two downsampling, which would be provided by a Zarr translation of the OGC COG standard, but not storing OGC TMS (e.g., https://cloudnativegeo.slack.com/archives/C06HCP0KAA2/p1746035679500189). So I think we should start with the COG-translated conformance class (cc @vincentsarago @geospatial-jeff)

Author


@maxrjones from my point of view, this only means not reinventing the wheel and staying in line with the OGC spec. This has already been mostly covered in the draft clause_7d_format_pyramiding.adoc (and discussions such as #30)


I do not think a conformance class for OGC API - Tiles should be included in GeoZarr v1.0 because it may be more complicated than necessary.

I strongly agree with this. While overviews are used mostly in geospatial use cases for visualization purposes; there is nothing inherently geospatial about overviews. There is not a specification for overviews as far as I'm aware, but I'd argue (geo)TIFF is the reference example for overviews/tiling. GeoTIFF only stores geospatial information (geotiff tags) on the IFD holding native resolution data. GeoTIFF does not store any spatial information on IFDs that contain reduced resolution overviews.

I believe the community simply needs more time to figure out how overviews (in the geospatial sense) apply more generally to the zarr data model, and pushing this OGC API - Tiles conformance class into GeoZarr before it's well thought through is not beneficial to anyone. Overviews are also not important enough to block the release of GeoZarr v1.0.

I'd much rather give the community something to build off of by releasing an initial GeoZarr v1.0 spec without overview support. Then see how the tooling evolves to support overviews against the broader zarr visualization use-case (through projects like https://github.com/carbonplan/ndpyramid). It's much easier to circle back to add a tiling conformance class after the fact than refactor an existing conformance class that wasn't well thought through to begin with 😄

Author


A conformance class is only the advertisement (conformTo: overview) that overviews are provided in the dataset. The abstract representation of overviews/multiscales should be defined in the abstract data model, and the concrete representation in the Zarr encoding of the model.

The overviews/multiscaling would indeed probably reuse an existing specification (GeoTIFF or Tile Matrix Set; it doesn't matter which one wins). A few drafts and demonstrations have been proposed, so I don't see why the community would need more time: as with any other topic, we should progress on it during the year, provide examples, and see what suits best. The initial TMS-based draft looks great to me: #44
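
To make the "advertisement" point concrete, here is a hypothetical sketch of a root group declaring conformance classes in its attributes. The `conformsTo` attribute name is borrowed from OGC API usage, and its placement here is an assumption, not adopted GeoZarr syntax:

    # Hypothetical root-group attributes advertising the conformance classes met.
    root_attrs = {
        "conformsTo": [
            "http://www.opengis.net/spec/geozarr/1.0/conf/core",
            "http://www.opengis.net/spec/geozarr/1.0/conf/overviews",
        ]
    }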

@maxrjones
Member

This is great progress, thank you @christophenoel! I left some comments and general feedback about minimizing the initial scope in #63 (comment).


@pzaborowski pzaborowski left a comment


great work


Each dataset node comprises the following core components, aligned with the Unidata Common Data Model (CDM) and Climate and Forecast (CF) Conventions:

- **Dimensions** – Named, integer-valued axes defining the extent of data variables. Examples include `time`, `x`, `y`, and `band`.


Is it the extent or just the length of the dimension? What is described here as a coordinate variable is what defines the extent (coverage domain).


- **Dimensions** – Named, integer-valued axes defining the extent of data variables. Examples include `time`, `x`, `y`, and `band`.
- **Coordinate Variables** – Arrays that supply coordinate values along dimensions, providing spatial, temporal, or contextual referencing. These may be scalar or higher-dimensional, depending on the referencing scheme.
- **Data Variables** – Multidimensional arrays representing physical measurements or derived products. Defined over one or more dimensions, these variables are associated with coordinate variables and annotated with metadata.


Maybe they are not part of the model below itself, but it is OK to distinguish them. That said, this may be missing auxiliary variables like grid_mapping/crs, which do not have a reference to a dimension.

Each dataset node comprises the following core components, aligned with the Unidata Common Data Model (CDM) and Climate and Forecast (CF) Conventions:

- **Dimensions** – Named, integer-valued axes defining the extent of data variables. Examples include `time`, `x`, `y`, and `band`.
- **Coordinate Variables** – Arrays that supply coordinate values along dimensions, providing spatial, temporal, or contextual referencing. These may be scalar or higher-dimensional, depending on the referencing scheme.


  • each 'coordinate variable' is associated with one dimension

+ String name
+ int length
+ boolean isUnlimited
+ boolean isShared


Does it make sense to distinguish optional parameters? I think this is one of them.


All metadata attributes (for groups, coordinates variables and data variables) are recommended to conform to CF naming and typing conventions. Supported attributes include:

- `standard_name`, `units`, `axis`, `grid_mapping` (CF)


if grid_mapping is expected, auxiliary variables shall be as well
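
For reference, a minimal sketch of the CF pattern being discussed: a data variable references a dimensionless auxiliary variable by name via `grid_mapping`, and that variable carries only CRS attributes. Names and values below follow CF conventions and are illustrative, not a GeoZarr decision:

    # Attributes of a data variable: point at the auxiliary CRS variable by name.
    temperature_attrs = {
        "standard_name": "air_temperature",
        "units": "K",
        "grid_mapping": "crs",  # name of the auxiliary variable below
    }

    # Attributes of the auxiliary (scalar, dimensionless) variable named "crs".
    crs_attrs = {
        "grid_mapping_name": "latitude_longitude",
        "semi_major_axis": 6378137.0,
        "inverse_flattening": 298.257223563,
    }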
