Description
We would like to enhance the registry.json to support additional dataset types by adding a prefix for HTTP-based datasets and including support for S3 URLs.
Proposed Changes:
- Add a Prefix for HTTP Datasets (http_dataset)
  - Add a new prefix for HTTP datasets to the registry.json. This will allow for downloading datasets without relying on the census API.
  - Example value: "http_dataset": "https://datasets.cellxgene.cziscience.com/xyz.h5ad"
- Add Support for S3 Datasets
  - Include S3 dataset URLs in the registry.json for handling datasets stored in S3.
  - Example value: "s3": "s3://bucket_name/path/to/dataset.h5ad"
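To make the two proposed keys concrete, here is a minimal Python sketch of how a registry entry could be checked. It assumes an entry is a flat dictionary holding the `http_dataset` and `s3` keys shown above; the actual registry.json schema is not specified in this issue, and `ALLOWED_SCHEMES` and `validate_entry` are hypothetical names used only for illustration.

```python
from urllib.parse import urlparse

# Hypothetical mapping from the proposed registry keys to the URL schemes
# each one would accept. Not part of any existing schema.
ALLOWED_SCHEMES = {
    "http_dataset": {"http", "https"},
    "s3": {"s3"},
}


def validate_entry(entry: dict) -> list:
    """Return a list of problems found in the dataset keys of one entry."""
    problems = []
    for key, schemes in ALLOWED_SCHEMES.items():
        if key not in entry:
            continue  # both keys are treated as optional in this sketch
        parsed = urlparse(entry[key])
        if parsed.scheme not in schemes:
            problems.append(f"{key}: unexpected scheme {parsed.scheme!r}")
        elif not parsed.netloc:
            problems.append(f"{key}: missing host or bucket name")
    return problems
```

With the example values from this issue, `validate_entry` returns an empty list for both keys, while an `s3` key holding an `https://` URL is reported as a problem.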
Why this is needed:
- The http_dataset prefix will allow for a more structured and modular handling of datasets that are retrieved via HTTP.
- Adding support for S3 datasets ensures that datasets hosted on AWS S3 can be seamlessly integrated into the registry, aligning with current data storage practices.
Additional Context:
- This change will ensure consistent handling across different downloaders (e.g., CxGDownloader, HTTPDownloader, S3Downloader in cas-tools) and make it easier to manage datasets in the registry.
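One way the registry keys could drive downloader selection is sketched below in Python. The downloader names come from this issue, but their interfaces in cas-tools are not shown here, so the function only returns a name. The precedence (S3 first, then HTTP, then the census-based downloader as a fallback) and the function name `pick_downloader` are assumptions for illustration, not the cas-tools behaviour.

```python
def pick_downloader(entry: dict) -> str:
    """Return the name of the downloader that would handle a registry entry.

    Assumed precedence: "s3" key, then "http_dataset" key, then fall back
    to the census-based CxGDownloader when neither key is present.
    """
    if "s3" in entry:
        return "S3Downloader"
    if "http_dataset" in entry:
        return "HTTPDownloader"
    return "CxGDownloader"
```

For example, an entry containing only `"s3": "s3://bucket_name/path/to/dataset.h5ad"` would be routed to `S3Downloader` under these assumptions.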
cc @dosumis , @hkir-dev