Conversation
Just expect it to be provided in the datasets.json
All datasets are "example" datasets
With placeholder code. Will fix in another PR
for more information, see https://pre-commit.ci
Not all the benchmarks are running. Once this is merged I'll fix the rest in #40. Let me know what you think of this @fluidnumerics-joe
Oh, and since Parcels is now a submodule I think you'll need to do
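The command in the comment above was cut off in the export. A typical way to initialize a newly added git submodule (an assumption about the intended command, not taken from the PR) is:

```shell
# Fetch and check out the submodule contents after cloning or pulling.
# --recursive also initializes any submodules nested inside the submodule.
git submodule update --init --recursive
```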
Is it intentional to not have the FESOM and ICON datasets in the catalog? I'm confused about where they went.
I should have mentioned this in the PR description: I was planning on having it in a future PR (wanted to avoid conflicts with the other reworking of the ingestion code).
Also, I need to figure out exactly how intake integrates with Uxarray. The fact that Uxarray doesn't initialise from an xarray dataset (i.e. it has uxr.open_mfdataset as the main entry point) slightly complicates things.
What's the difference between the catalogues in parcels-benchmarks and the parcels-examples? They seem to be the same now?
Yes, to be updated in a future PR (mainly focusing on the actual downloading of the datasets; will fix the catalogs and ingestion at the same time).
- Refactor path variables and move catalogue definitions to separate file - Since the downloading script now relies on the
Using the `pyproject.toml` file to specify the project dependencies (including the local dependency on Parcels). This helps ensure that the dependencies are also available to ASV. Maybe this ASV <-> pixi interaction needs to be investigated further... ASV's own environment management is something I find confusing.
Looks like stripping the environments out of ASV is on the roadmap (airspeed-velocity/asv#1581), which will mean that we can fully manage them with pixi (which would be great).
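As a rough sketch of what this could look like (the dependency list and the pixi table are assumptions for illustration, not taken from the PR), a `pyproject.toml` declaring the in-tree Parcels submodule as a local dependency might be:

```toml
[project]
name = "parcels-benchmarks"
version = "0.0.0"
dependencies = [
    "asv",
    "intake-xarray",
]

# Hypothetical: point pixi at the Parcels git submodule checked out locally,
# so that ASV and pixi resolve the same in-tree version of Parcels.
[tool.pixi.pypi-dependencies]
parcels = { path = "./parcels", editable = true }
```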
097ab16 to b1d3bc5
Co-authored-by: Erik van Sebille <erikvansebille@gmail.com>
This PR reworks parcels-benchmarks in a way that (I hope) is much easier to work with. Follow the README and let me know what you think.
Changes:
- Replaces the `parcels_benchmarks` internal package (which provided the CLI tool for adding dataset hashes etc.). Now instead:
  - An `intake-xarray` catalog is defined in `catalogs/parcels-benchmarks/catalog.yml`. The top of the file has a comment which contains the link to the ZIP to be downloaded.
  - A download script (`scripts/download-catalog.py`) downloads the data for a `catalog` and takes an `output_dir` (both via CLI args). It uses curl to download the dataset, and then unzips all nested zip files (deleting the original zips). The script also copies the catalog file into the `output_dir` (which is good, since the datasets in the catalog are defined relative to this catalog file). Using `curl` here makes this approach quite transparent: one can easily see download speeds and decide to cancel.
  - A `setup-data` task, to download all the datasets.
- Requires a `PARCELS_BENCHMARKS_DATA_FOLDER` environment variable to be explicitly set, which then acts as the working space for the data. This environment variable is used in the download and benchmarking code.

We needed the following things to ease development:
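As a hedged sketch of the catalog layout described above (entry names, paths, and the URL are invented for illustration), an `intake-xarray` catalog could look like:

```yaml
# Data for this catalog is downloaded from the ZIP linked here, e.g.:
# https://example.org/parcels-benchmarks-data.zip  (placeholder URL)
sources:
  example_dataset:
    description: Example benchmark dataset (hypothetical entry)
    driver: netcdf            # provided by intake-xarray
    args:
      # Paths are relative to this catalog file, which is why the
      # download script copies the catalog into the output directory.
      urlpath: "{{ CATALOG_DIR }}/example_dataset/*.nc"
      xarray_kwargs:
        combine: by_coords
```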
- Download all datasets before running benchmarks
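A minimal sketch (the function name is invented, not from the PR) of how the benchmarking code might read the required environment variable and fail fast when it is unset:

```python
import os
from pathlib import Path


def get_data_folder() -> Path:
    """Return the benchmark data directory, failing fast if unset."""
    folder = os.environ.get("PARCELS_BENCHMARKS_DATA_FOLDER")
    if folder is None:
        raise RuntimeError(
            "Set PARCELS_BENCHMARKS_DATA_FOLDER to the directory "
            "where the benchmark datasets were downloaded."
        )
    return Path(folder)


# Demo only: normally the variable is set in the shell or a pixi task.
os.environ["PARCELS_BENCHMARKS_DATA_FOLDER"] = "/tmp/parcels-benchmarks-data"
print(get_data_folder())
```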
Footnotes
Given we are the sole owners of our data sources, I don't think this is a concern. ↩