
Reading Notes

Emma Simpson edited this page Sep 10, 2025 · 3 revisions

[This might get expanded out into different pages at a later date.]

This is a document to record notes and resources relating to the Climate-Adapt4EOSC Data Service (WP2), technical and cross-domain interoperability.

Metadata

T2.3 Climate Data Refinery using common models: UNIMAN will explore the use of RO-Crate for packaging enriched climate data, building on specifications such as EarthCube GeoCODES, Science on Schema and RELIANCE Earth Observation Data Cubes, to build a flexible EOSC Climate Metadata profile.

Earth System Data Cubes

'Earth system data cubes: Avenues for advancing earth system research', Montero et al., 2024, Env. Data Sci.

Our RO-Crate profile should facilitate the creation of ESDCs.

Metadata required for an ESDC

Data Descriptors: name, units, resolution, measurement methods, equipment
Data Transformations: resampling, interpolation
Metadata transformations: date, reason, responsible entity
Responsible Producers: creator entity, data provider

This could be translated into three crate profiles: Data Description, Data Transformations and Responsible Producers. The Data Transformations profile could be built upon the RO-Crate Workflow Run Crate profile. This would require domain knowledge.
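To make the three groups concrete, here is a minimal sketch of how they might sit together in a single RO-Crate 1.1 metadata document. The mapping is an assumption made for illustration, not an agreed profile and not taken from the paper: data descriptors become a schema.org `PropertyValue`, a transformation becomes a `CreateAction` (the kind of entity that Workflow Run Crate builds on), and responsible producers use `author` and `publisher`. All names, dates, files and the licence are placeholders, and resolution and equipment are left out because no obvious schema.org property was identified for them. `build_example_crate` and `dangling_references` are hypothetical helper names.

```python
import json


def build_example_crate():
    """Build an illustrative RO-Crate 1.1 metadata document as a dict."""
    graph = [
        {  # Metadata descriptor that an RO-Crate 1.1 crate carries
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {  # Root dataset; author/publisher stand in for Responsible Producers
            "@id": "./",
            "@type": "Dataset",
            "name": "Example air temperature cube (placeholder)",
            "description": "Hypothetical cube used to illustrate the notes.",
            "datePublished": "2025-09-10",
            "license": {"@id": "https://creativecommons.org/licenses/by/4.0/"},
            "author": {"@id": "#creator-entity"},
            "publisher": {"@id": "#data-provider"},
            "variableMeasured": [{"@id": "#air-temperature"}],
            "hasPart": [{"@id": "source.nc"}, {"@id": "cube.nc"}],
        },
        {  # Data Descriptors (assumed mapping): name, units, method
            "@id": "#air-temperature",
            "@type": "PropertyValue",
            "name": "air temperature",
            "unitText": "K",
            "measurementTechnique": "placeholder measurement method",
        },
        {  # Data Transformations (assumed mapping): date, reason, agent
            "@id": "#resampling-step",
            "@type": "CreateAction",
            "name": "resampling",
            "description": "Reason: align the source grid with the cube grid.",
            "endTime": "2025-09-01",
            "agent": {"@id": "#creator-entity"},
            "object": {"@id": "source.nc"},
            "result": {"@id": "cube.nc"},
        },
        {"@id": "source.nc", "@type": "File", "name": "Source data"},
        {"@id": "cube.nc", "@type": "File", "name": "Resampled cube"},
        {"@id": "#creator-entity", "@type": "Organization",
         "name": "Example creator entity"},
        {"@id": "#data-provider", "@type": "Organization",
         "name": "Example data provider"},
    ]
    return {"@context": "https://w3id.org/ro/crate/1.1/context",
            "@graph": graph}


def dangling_references(crate):
    """List non-web {"@id": ...} references with no entity in @graph."""
    known = {entity["@id"] for entity in crate["@graph"]}
    missing = set()

    def walk(value):
        if isinstance(value, dict):
            ref = value.get("@id", "")
            is_reference = len(value) == 1 and "@id" in value
            if is_reference and not ref.startswith("http") and ref not in known:
                missing.add(ref)
            for child in value.values():
                walk(child)
        elif isinstance(value, list):
            for child in value:
                walk(child)

    walk(crate["@graph"])
    return sorted(missing)


if __name__ == "__main__":
    crate = build_example_crate()
    print(dangling_references(crate))
    print(json.dumps(crate, indent=2)[:200])
```

Keeping everything in one crate is only for brevity. Splitting it into three profiles would mean deciding which of these entity types each profile requires.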

I am struggling to find a list of variable names in a controlled vocabulary that can be used as an ID (URI/URL) in a crate profile.

There are many tools for processing Earth system data within the ESDC lifecycle, mostly in Python, R and Julia, with Python being the most used language for ESDC management. The tools described in the paper are not specific to ESDCs; they are libraries such as xarray, Satpy and EOReader, which are generally useful for working with EO data.

GeoCODES

An NSF EarthCube program effort to better enable cross-domain discovery of, and access to, geoscience data and research tools. GeoCODES is made up of three components: an evolving standard (Science on Schema), a set of tools (a prototype portal for querying data that have adopted Science on Schema), and a resource registry.

Sensor data

At least one of the use cases uses sensor data, so our metadata schema should accommodate this.

'Digital ecosystem for FAIR time series data management in environmental system science', Bumberger et al., 2025, SoftwareX

Includes a Sensor Management System (SMS) for detailed metadata registration and management. This might be useful for any real-time data from sensors, for example in the use case (UC) with a digital twin.

Climate and Forecasting Data Model

A data model of the Climate and Forecast metadata conventions (CF-1.6) with a software implementation (cf-python v2.1)

Code: cf-python

A metadata convention for climate and forecast data, used when working with netCDF files.

Prepare data for AI/ML

Our metadata schema should facilitate the use of AI/ML and be machine-actionable.

Croissant

For datasets that will be used as input for machine learning, the Croissant vocabulary could be used. This was mentioned by Slava at the T2.2 kick-off meeting 05/09/25. The schema is an extension of schema.org and includes key attributes and properties of datasets, as well as the information required to load these datasets in ML tools.
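As a rough illustration of what "information required to load these datasets" looks like, the sketch below builds a pared-down Croissant-style description in Python. It is based on a reading of the Croissant 1.0 format and should be checked against the specification before use: the `@context` is abbreviated, the dataset name, URL, checksum and column names are placeholders, and `build_croissant_sketch`, `column_field` and `unresolved_sources` are hypothetical helper names.

```python
import json


def column_field(record_set, column, data_type, file_id):
    """Describe one typed field read from a named column of a file."""
    return {
        "@type": "cr:Field",
        "@id": f"{record_set}/{column}",
        "name": column,
        "dataType": data_type,
        "source": {
            "fileObject": {"@id": file_id},
            "extract": {"column": column},
        },
    }


def build_croissant_sketch():
    """Return a pared-down Croissant-style dataset description as a dict."""
    file_id = "observations.csv"
    return {
        # Abbreviated context (assumption); published files use a fuller one
        "@context": {
            "@vocab": "https://schema.org/",
            "sc": "https://schema.org/",
            "cr": "http://mlcommons.org/croissant/",
        },
        "@type": "sc:Dataset",
        "conformsTo": "http://mlcommons.org/croissant/1.0",
        "name": "example_station_observations",
        "description": "Hypothetical sensor observations (placeholder).",
        "distribution": [
            {  # Where the bytes live and how they are encoded
                "@type": "cr:FileObject",
                "@id": file_id,
                "name": file_id,
                "contentUrl": "https://example.org/observations.csv",
                "encodingFormat": "text/csv",
                "sha256": "placeholder-checksum",
            }
        ],
        "recordSet": [
            {  # How a loader should turn the file into typed records
                "@type": "cr:RecordSet",
                "@id": "observations",
                "name": "observations",
                "field": [
                    column_field("observations", "date", "sc:Date", file_id),
                    column_field("observations", "air_temperature",
                                 "sc:Float", file_id),
                ],
            }
        ],
    }


def unresolved_sources(dataset):
    """Return names of fields whose source file is not in `distribution`."""
    files = {item["@id"] for item in dataset["distribution"]}
    return [
        field["name"]
        for record_set in dataset["recordSet"]
        for field in record_set["field"]
        if field["source"]["fileObject"]["@id"] not in files
    ]


if __name__ == "__main__":
    sketch = build_croissant_sketch()
    print(unresolved_sources(sketch))
    print(json.dumps(sketch, indent=2)[:200])
```

The dataset-level part is ordinary schema.org, which is what makes it a candidate to sit alongside an RO-Crate profile. The `distribution` and `recordSet` parts are the ML-specific additions.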

DeepESDL

ESA’s Deep Earth System Data Laboratory, a platform providing analysis-ready data cubes in a powerful virtual environment for the Earth Science research community. DeepESDL offers a full suite of services to facilitate data exploitation, share data and source code, and publish results. Special emphasis is placed on supporting machine learning and artificial intelligence approaches, including preparation of AI-ready datasets, integrated programming environments, and scalable processing resources.

We should be thinking about how metadata can best be structured for querying by AI tools, e.g. LLMs.

Science on Schema

I think we should base our profiles on the Science on Schema Dataset type, since RO-Crate is based on schema.org and Dataset appears to cover many of the attributes we need.

INSPIRE

Metadata requirements for spatial data, built upon the ISO 19115 and 19119 standards. There is a legal obligation on public bodies to publish particular datasets (see gov.uk).

FAIR principles

'Suggestions for extending the FAIR Principles based on a linguistic perspective on semantic interoperability', Vogt et al., 2025, Nature
