Reading Notes
[This might get expanded out into different pages at a later date.]
This document records notes and resources relating to the Climate-Adapt4EOSC Data Service (WP2): technical and cross-domain interoperability.
T2.3 Climate Data Refinery using common models: UNIMAN will explore the use of RO-Crate for packaging enriched climate data, building on specifications like EarthCube GeoCODES, Science on Schema and RELIANCE Earth Observation Data Cubes to build a flexible EOSC Climate Metadata profile.
Our RO-Crate profile should facilitate the creation of ESDCs.
Metadata required for an ESDC
| Category | Field |
| --- | --- |
| Data Descriptors | name |
| | units |
| | resolution |
| | measurement methods |
| | equipment |
| Data Transformations | resampling |
| | interpolation |
| Metadata transformations | date |
| | reason |
| | responsible entity |
| Responsible Producers | creator entity |
| | data provider |
This could be translated into three crate profiles: Data Description, Data Transformations and Responsible Producers. The Data Transformations profile could be built upon the RO-Crate Workflow Run Crate profile. This would require domain knowledge.
Struggling to find a list of variable names in a controlled vocabulary that can be used as an ID (URI/URL) in a crate profile.
There are many tools for processing Earth system data within the ESDC lifecycle, mostly in Python, R and Julia, with Python being the most used language for ESDC management. The tools described in the paper are not specific to ESDCs; they are things like xarray, satpy and EOReader, which are generally useful for dealing with EO data.
GeoCODES is an NSF EarthCube program effort to better enable cross-domain discovery of and access to geoscience data and research tools. It is made up of three components: an evolving standard (Science on Schema), a set of tools (a prototype portal to query data that have adopted Science on Schema), and a resource registry.
At least one of the use cases uses sensor data, therefore our metadata schema should accommodate this.
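To make the three-profile idea more concrete, here is a minimal sketch of a single RO-Crate 1.1 metadata document that carries one entry from each grouping: a measured variable (Data Description), a `CreateAction` recording a resampling step (Data Transformations, in the style used by Workflow Run Crate) and `creator`/`provider` organisations (Responsible Producers). All names, dates, identifiers such as `#air-temperature` and the two helper functions are placeholders invented for illustration, not an agreed profile.

```python
import json


def build_crate_metadata():
    """Return a minimal RO-Crate 1.1 metadata document as a Python dict.

    Every value below is a hypothetical placeholder.
    """
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {   # Metadata descriptor required by RO-Crate 1.1
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # Root dataset: links out to the three groupings
                "@id": "./",
                "@type": "Dataset",
                "name": "Example data cube (placeholder)",
                "description": "Placeholder description of an ESDC.",
                "datePublished": "2025-01-01",
                "license": {"@id": "https://creativecommons.org/licenses/by/4.0/"},
                "variableMeasured": [{"@id": "#air-temperature"}],
                "creator": {"@id": "#creator-org"},
                "provider": {"@id": "#provider-org"},
                "mentions": {"@id": "#resampling"},
            },
            {   # Data Description
                "@id": "#air-temperature",
                "@type": "PropertyValue",
                "name": "air temperature",
                "unitText": "K",
                "measurementTechnique": "placeholder method",
            },
            {   # Data Transformations
                "@id": "#resampling",
                "@type": "CreateAction",
                "name": "Resampling to a common grid (placeholder)",
                "endTime": "2025-01-01",
                "agent": {"@id": "#creator-org"},
                "result": {"@id": "./"},
            },
            # Responsible Producers
            {"@id": "#creator-org", "@type": "Organization", "name": "Creator (placeholder)"},
            {"@id": "#provider-org", "@type": "Organization", "name": "Provider (placeholder)"},
        ],
    }


def local_references(node):
    """Yield local '@id' references ('#...' or './...') found inside property values."""
    if isinstance(node, dict):
        if set(node) == {"@id"}:
            if node["@id"].startswith(("#", "./")):
                yield node["@id"]
        else:
            for value in node.values():
                yield from local_references(value)
    elif isinstance(node, list):
        for item in node:
            yield from local_references(item)


def dangling_references(crate):
    """Return local references that point at no entity in the @graph."""
    graph = crate["@graph"]
    defined = {entity["@id"] for entity in graph}
    referenced = {ref for entity in graph for ref in local_references(entity)}
    return sorted(referenced - defined)


crate = build_crate_metadata()
print(json.dumps(crate, indent=2))
print("dangling references:", dangling_references(crate))
```

Keeping everything in one flattened `@graph` means the three groupings can later be split into separate profiles without changing how the entities reference each other.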
Digital ecosystem for FAIR time series data management in environmental system science, Bumberger et al. (2025) SoftwareX
Includes a Sensor Management System (SMS) for detailed metadata registration and management. This might be useful for any real-time data from sensors, for example in the use case with a digital twin.
Code: cf-python
CF (Climate and Forecast) is a metadata convention for climate and forecast data, used when working with NetCDF files.
Our metadata schema should facilitate the use of AI/ML and be machine-actionable.
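As a rough illustration of what CF-style metadata looks like on a NetCDF variable, the sketch below checks plain Python dictionaries of attributes for the global `Conventions` attribute and for the per-variable `units`, `standard_name` and `long_name` attributes. It deliberately does not use cf-python or read a real file; the function name, the example variables and the simplified rules (for instance, it flags dimensionless variables that legitimately have no `units`) are assumptions made for this note.

```python
def cf_attribute_issues(global_attrs, variables):
    """Return a list of human-readable issues for CF-style attributes.

    global_attrs: dict of global attributes of the file.
    variables: dict mapping variable name -> dict of that variable's attributes.
    This is a simplified check, not a CF compliance checker.
    """
    issues = []
    if "CF-" not in str(global_attrs.get("Conventions", "")):
        issues.append("global attribute 'Conventions' does not mention CF")
    for name, attrs in variables.items():
        if "units" not in attrs:
            issues.append(f"{name}: missing 'units'")
        if "standard_name" not in attrs and "long_name" not in attrs:
            issues.append(f"{name}: missing 'standard_name' and 'long_name'")
    return issues


# Hypothetical attributes, as they might appear after opening a NetCDF file
global_attrs = {"Conventions": "CF-1.8"}
variables = {
    "tas": {"standard_name": "air_temperature", "units": "K"},
    "mask": {},  # no descriptive attributes at all
}

for issue in cf_attribute_issues(global_attrs, variables):
    print(issue)
```

The same attribute names could be carried into a crate profile so that the NetCDF file and its packaging metadata describe variables consistently.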
Croissant
For datasets that will be used as input for Machine Learning, the Croissant vocabulary could be used. This was mentioned by Slava at the T2.2 kick-off meeting 05/09/25. The schema is an extension of schema.org and includes key attributes and properties of datasets, as well as information required to load these datasets in ML tools.
DeepESDL
ESA’s Deep Earth System Data Laboratory, a platform providing analysis-ready data cubes in a powerful virtual environment for the Earth Science research community. DeepESDL offers a full suite of services to facilitate data exploitation, share data and source code, and publish results. Special emphasis is placed on supporting machine learning and artificial intelligence approaches, including preparation of AI-ready datasets, integrated programming environments, and scalable processing resources.
We should be thinking about how metadata is best suited to being queried by AI tools, e.g. LLMs.
I think we should base our profiles on the Science on Schema Dataset, as RO-Crate is based on schema.org and it looks like Dataset covers a lot of the attributes we need.
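One way to test how much of the ESDC metadata table a schema.org Dataset covers is to write the mapping down and see what is left over. The sketch below is a first guess only: the chosen schema.org terms are my assumption, and `None` means I have not picked a term yet, not that no suitable term exists.

```python
# Tentative mapping from the ESDC metadata table to schema.org terms.
# None = no term chosen yet (an open question, not a confirmed gap).
TABLE_FIELD_TO_TERM = {
    "name": "PropertyValue.name",
    "units": "PropertyValue.unitText",
    "resolution": None,
    "measurement methods": "PropertyValue.measurementTechnique",
    "equipment": None,
    "resampling": None,
    "interpolation": None,
    "metadata transformation date": None,
    "metadata transformation reason": None,
    "metadata transformation responsible entity": None,
    "creator entity": "Dataset.creator",
    "data provider": "Dataset.provider",
}


def split_by_coverage(mapping):
    """Split the mapping into fields with a chosen term and fields without one."""
    covered = {field: term for field, term in mapping.items() if term is not None}
    uncovered = [field for field, term in mapping.items() if term is None]
    return covered, uncovered


covered, uncovered = split_by_coverage(TABLE_FIELD_TO_TERM)
print(f"{len(covered)} of {len(TABLE_FIELD_TO_TERM)} fields have a candidate term")
for field in uncovered:
    print("still open:", field)
```

The fields that stay open are mostly the transformation ones, which is where a profile built on run provenance rather than on Dataset alone might be needed.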
Metadata requirements for spatial data built upon ISO 19115 and 19119 standards. Legal obligation on public bodies to publish particular datasets, gov.uk.
Suggestions for extending the FAIR Principles based on a linguistic perspective on semantic interoperability, Vogt et al. (2025) Nature