
Implement dataverse-import-files #292

@mih

Dataverse provides full file listings for a dataset (version), and these include MD5 checksums (among other properties). It would therefore be fairly simple to support sucking in a file tree without going through the full complexity of supporting git-annex's importtree. datalad-ebrains pretty much has the blueprint for that.
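To illustrate how little is involved, here is a minimal Python sketch that turns such a listing into (path, key, file id) records. It assumes the listing JSON from the native API carries `label`, `directoryLabel`, and a `dataFile` object with `id`, `filesize`, and `md5`. These field names should be verified against the target instance, and an instance configured for a different checksum type will not provide `md5`. The sample entry reuses the path, size, and file id from the git-annex sketch further down, but its MD5 value is a made-up placeholder. Keys use git-annex's MD5E backend; the extension handling in the hypothetical helper `annex_extension` is a simplification of git-annex's own rules.

```python
import json

# Hypothetical excerpt of a Dataverse file listing, trimmed to the fields used here.
# Field names are assumed from the Dataverse native API; the md5 is a placeholder.
SAMPLE_LISTING = json.loads("""
{"status": "OK",
 "data": [
  {"label": "sub-042_task-extstim_events.tsv",
   "directoryLabel": "sub-042/eeg",
   "dataFile": {"id": 2694,
                "filename": "sub-042_task-extstim_events.tsv",
                "filesize": 26309,
                "md5": "0123456789abcdef0123456789abcdef"}}
 ]}
""")


def annex_extension(filename, maxlen=4):
    """Return the extension part of an E-backend key, e.g. '.tsv'.

    Simplified: keeps only the last suffix if it is short and alphanumeric.
    git-annex's real rules (annex.maxextensionlength, multi-part extensions)
    are more involved.
    """
    _, dot, ext = filename.rpartition(".")
    if dot and ext and len(ext) <= maxlen and ext.isalnum():
        return "." + ext
    return ""


def listing_to_keys(listing):
    """Map each file in a Dataverse listing to (path, MD5E key, datafile id)."""
    records = []
    for entry in listing["data"]:
        df = entry["dataFile"]
        directory = entry.get("directoryLabel")
        # The path inside the dataset is directoryLabel/label (or just label).
        path = f"{directory}/{entry['label']}" if directory else entry["label"]
        # MD5E key layout: MD5E-s<size>--<md5><extension>
        key = f"MD5E-s{df['filesize']}--{df['md5']}{annex_extension(entry['label'])}"
        records.append((path, key, df["id"]))
    return records


for path, key, file_id in listing_to_keys(SAMPLE_LISTING):
    print(file_id, key, path)
```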

It is unclear to me whether such a starting point could be coupled with an export/filetree-only setup provided by

datalad add-sibling-dataverse --mode filetree-only URL PID

but the immediate answer is no: git-annex refuses to try, because it has no export location on record.

Faking an export (or performing an empty one) does not work either, because a DataLad dataset will contain files that are not on Dataverse (and possibly cannot be, i.e. the importing agent has no write permissions).


A different approach would be to populate a dataset with keys that have URLs attached, pointing to the data access API of the respective Dataverse instance. The uncurl special remote would then be able to take care of them. Possibly, a dedicated handler needs to be implemented that performs the authentication correctly. Such a handler could be configured in the dataset, specifically for the respective Dataverse instance.

Here is a sketch:

git annex initremote uncurl type=external externaltype=uncurl encryption=none

git annex registerurl SHA256E-s26309--6ba60e2f73d403beecd5e50afa8affa824e21150558f0b333e209dc4427604c8.tsv https://data.fz-juelich.de/api/access/datafile/2694

git annex fromkey SHA256E-s26309--6ba60e2f73d403beecd5e50afa8affa824e21150558f0b333e209dc4427604c8.tsv sub-042/eeg/sub-042_task-extstim_events.tsv --force
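For more than a handful of files, the per-file commands above could be fed in bulk. The following Python sketch builds stdin input for `git annex registerurl --batch` and `git annex fromkey --batch`, which read one space-separated `KEY URL` or `KEY FILE` pair per line. The record list is hypothetical (it holds just the example file from above), and the URL pattern `<instance>/api/access/datafile/<id>` is taken from the registerurl command.

```python
# Hypothetical records: (path in dataset, git-annex key, Dataverse datafile id)
RECORDS = [
    ("sub-042/eeg/sub-042_task-extstim_events.tsv",
     "SHA256E-s26309--6ba60e2f73d403beecd5e50afa8affa824e21150558f0b333e209dc4427604c8.tsv",
     2694),
]


def access_url(base_url, file_id):
    """Data access API URL for one datafile, e.g. <base>/api/access/datafile/2694."""
    return f"{base_url.rstrip('/')}/api/access/datafile/{file_id}"


def batch_lines(records, base_url):
    """Build stdin input for `registerurl --batch` ("KEY URL" per line)
    and `fromkey --batch` ("KEY FILE" per line)."""
    register = [f"{key} {access_url(base_url, fid)}" for _, key, fid in records]
    fromkey = [f"{key} {path}" for path, key, _ in records]
    return "\n".join(register) + "\n", "\n".join(fromkey) + "\n"


reg_input, fromkey_input = batch_lines(RECORDS, "https://data.fz-juelich.de")
print(reg_input, end="")
print(fromkey_input, end="")
```

The two strings could then be passed on, e.g. `subprocess.run(["git", "annex", "registerurl", "--batch"], input=reg_input, text=True, check=True)`, followed by the equivalent `fromkey` call with `--force` as in the sketch (assuming the installed git-annex accepts `--force` together with `--batch`).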

For public datasets (no authentication), uncurl is not even needed; git-annex's built-in web special remote handles these URLs just fine.
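For restricted files, the handler mentioned above would have to authenticate. A minimal sketch, assuming the Dataverse API token is accepted in the `X-Dataverse-key` request header as described in Dataverse's API guides: the hypothetical helper `datafile_request` builds, but does not send, a GET request for one datafile, adding the header only when a token is given. The token value is a placeholder.

```python
import urllib.request


def datafile_request(base_url, file_id, api_token=None):
    """Build (but do not send) a GET request for one datafile via the access API.

    Public files need no credentials; for restricted files we assume the
    Dataverse API token goes into the X-Dataverse-key header.
    """
    url = f"{base_url.rstrip('/')}/api/access/datafile/{file_id}"
    headers = {"X-Dataverse-key": api_token} if api_token else {}
    return urllib.request.Request(url, headers=headers)


public = datafile_request("https://data.fz-juelich.de", 2694)
restricted = datafile_request("https://data.fz-juelich.de", 2694,
                              api_token="xxxxxxxx-placeholder")
# urllib normalizes header names via str.capitalize(), hence "X-dataverse-key".
print(public.full_url, public.header_items())
print(restricted.header_items())
```

Passing either request to `urllib.request.urlopen()` would perform the actual download.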
