Distribute datasets as artifacts #59

Open
johnnychen94 opened this issue Mar 30, 2021 · 0 comments

@johnnychen94
Member

CRef: #57 (comment)

We currently use DataDeps as an interface to download datasets from their original websites. While this makes the license and source of each dataset clear, it hurts reproducibility: users in some parts of the world might have difficulty connecting to the original sites, and those sites might also go offline for various reasons, e.g., #57.

To avoid issues like #57 in the future and to speed up dataset downloads, we could take advantage of Julia's Artifacts system and let the Pkg/Storage servers hold and distribute the datasets. MLDatasets doesn't hold large datasets, so this would add little stress to the Julia ecosystem.
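
As a rough sketch of what this could look like, an `Artifacts.toml` entry binds an artifact name to a content hash and one or more download URLs. The artifact name `MNIST`, the URL, and the hash placeholders below are hypothetical and only illustrate the shape of an entry; real values would have to be computed from the actual tarball.

```toml
# Hypothetical Artifacts.toml entry; name, URL, and hashes are placeholders.
[MNIST]
# Hash of the unpacked directory tree; this identifies the artifact's content.
git-tree-sha1 = "<git-tree-sha1 of the unpacked dataset>"
# Download on first use instead of at package install time.
lazy = true

    [[MNIST.download]]
    # Fallback location of the tarball.
    url = "https://example.com/mnist.tar.gz"
    sha256 = "<sha256 of the tarball>"
```

With an entry like this, package code could presumably locate the dataset directory through the `artifact"MNIST"` string macro from the `Artifacts` standard library. Lazy artifacts additionally require loading `LazyArtifacts` on Julia 1.6 and later.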
