Description
Dataverse provides full file listings for a dataset (version) that also include MD5 sums (and other checksums). It would therefore be fairly simple to support pulling in a file tree without going through the full complexity of supporting git-annex's importtree. datalad-ebrains pretty much has the blueprint for that.
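To make the starting point concrete, here is a minimal sketch of reading such a listing. It assumes the Dataverse native API endpoint `/api/datasets/:persistentId/versions/{version}/files` and its usual JSON layout (`data[].dataFile` with `id`, `filesize` and `checksum`). The names `FileRecord`, `listing_url`, `parse_listing` and `fetch_listing` are made up for illustration and are not part of datalad-dataverse.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterator, List, NamedTuple


class FileRecord(NamedTuple):
    """One file of a dataset version (hypothetical helper type)."""
    path: str           # path inside the dataset, including directoryLabel
    file_id: int        # Dataverse database ID of the file
    size: int           # size in bytes
    checksum_type: str  # e.g. "MD5", as reported by Dataverse
    checksum: str       # hex digest


def listing_url(base_url: str, pid: str, version: str = ":latest") -> str:
    """Build the file-listing URL for a dataset identified by its PID."""
    query = urllib.parse.urlencode({"persistentId": pid})
    return (
        f"{base_url.rstrip('/')}/api/datasets/:persistentId"
        f"/versions/{version}/files?{query}"
    )


def parse_listing(payload: dict) -> Iterator[FileRecord]:
    """Turn the decoded JSON response into FileRecord items."""
    for entry in payload.get("data", []):
        datafile = entry["dataFile"]
        checksum = datafile.get("checksum", {})
        # directoryLabel is absent for files at the dataset root
        directory = entry.get("directoryLabel", "")
        name = entry.get("label") or datafile["filename"]
        yield FileRecord(
            path=f"{directory}/{name}" if directory else name,
            file_id=datafile["id"],
            size=datafile["filesize"],
            checksum_type=checksum.get("type", ""),
            checksum=checksum.get("value", ""),
        )


def fetch_listing(base_url: str, pid: str, version: str = ":latest") -> List[FileRecord]:
    """Fetch and parse the listing (no auth, so public datasets only)."""
    with urllib.request.urlopen(listing_url(base_url, pid, version)) as response:
        return list(parse_listing(json.load(response)))
```

Restricted or draft datasets would additionally need an API token on the request, which this sketch leaves out.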
It is unclear to me whether such a starting point could be coupled with an export/filetree-only setup provided by
datalad add-sibling-dataverse --mode filetree-only URL PID
but the immediate answer is no. Git-annex refuses to try, because it has no export location on record.
Faking an export (or performing an empty one) does not work either, because a datalad dataset will contain files that are not on dataverse (and possibly cannot be, i.e. the importing agent has no write permissions).
A different approach would be to populate a dataset with keys that have attached URLs pointing to the data access API of the respective dataverse instance. The uncurl special remote would then be able to take care of them. A dedicated handler that performs the auth correctly may need to be implemented. Such a handler can be configured in the dataset, specifically for the dataverse instance in question.
Here is a sketch
git annex initremote uncurl type=external externaltype=uncurl encryption=none
git annex registerurl SHA256E-s26309--6ba60e2f73d403beecd5e50afa8affa824e21150558f0b333e209dc4427604c8.tsv https://data.fz-juelich.de/api/access/datafile/2694
git annex fromkey SHA256E-s26309--6ba60e2f73d403beecd5e50afa8affa824e21150558f0b333e209dc4427604c8.tsv sub-042/eeg/sub-042_task-extstim_events.tsv --force
For public datasets (no auth), uncurl is not even needed. web does things alright.
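The registerurl/fromkey pair above could be generated per file from the listing metadata. The following is a rough sketch, not an implementation: `annex_key`, `access_url` and `annex_commands` are hypothetical helper names, and it uses the plain `MD5` backend (key format `MD5-s<size>--<digest>`) rather than an `E` variant such as the `SHA256E` key shown above, so that git-annex's file-extension rules do not have to be reproduced.

```python
import shlex
from typing import List


def annex_key(size: int, md5: str) -> str:
    """Compose a git-annex key for the MD5 backend from size and digest."""
    return f"MD5-s{size}--{md5.lower()}"


def access_url(base_url: str, file_id: int) -> str:
    """URL of a file in the Dataverse data access API."""
    return f"{base_url.rstrip('/')}/api/access/datafile/{file_id}"


def annex_commands(base_url: str, path: str, file_id: int,
                   size: int, md5: str) -> List[str]:
    """Return the two shell commands that register one file by key and URL."""
    key = annex_key(size, md5)
    url = access_url(base_url, file_id)
    return [
        # attach the download URL to the key
        shlex.join(["git", "annex", "registerurl", key, url]),
        # create the annexed file; --force because the content is not present locally
        shlex.join(["git", "annex", "fromkey", key, path, "--force"]),
    ]
```

Emitting command strings keeps the sketch free of side effects; a real implementation would more likely feed git-annex in batch mode than spawn two processes per file.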