
Add http_dataset Prefix and s3 Support to registry.json #135

@ubyndr

Description

We would like to extend registry.json to support additional dataset sources by adding a prefix for HTTP-based datasets and by adding support for S3 URLs.

Proposed Changes:

  1. Add a Prefix for HTTP Datasets (http_dataset)

    • Add a new http_dataset prefix to registry.json. This allows datasets to be downloaded directly over HTTP without relying on the Census API.
    • Example value:
      "http_dataset": "https://datasets.cellxgene.cziscience.com/xyz.h5ad"
  2. Add Support for S3 Datasets

    • Include S3 dataset URLs in the registry.json for handling datasets stored in S3.
    • Example value:
      "s3": "s3://bucket_name/path/to/dataset.h5ad"
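A registry entry carrying both proposed keys could be checked as follows. This is a minimal sketch under stated assumptions: only the http_dataset and s3 keys come from this proposal. The surrounding structure (a JSON list of entries with a name field) and the helper validate_entry are hypothetical, because the current registry.json schema is not shown in this issue.

```python
import json
from urllib.parse import urlparse

# Hypothetical registry content: only "http_dataset" and "s3" come from the
# proposal; the list layout and the "name" field are illustrative assumptions.
REGISTRY_JSON = """
[
  {
    "name": "example_dataset",
    "http_dataset": "https://datasets.cellxgene.cziscience.com/xyz.h5ad",
    "s3": "s3://bucket_name/path/to/dataset.h5ad"
  }
]
"""

# URL schemes accepted for each proposed key.
EXPECTED_SCHEMES = {
    "http_dataset": {"http", "https"},
    "s3": {"s3"},
}


def validate_entry(entry: dict) -> list[str]:
    """Return the problems found in one registry entry (empty list if valid)."""
    problems = []
    for key, schemes in EXPECTED_SCHEMES.items():
        if key not in entry:
            continue  # both keys are treated as optional in this sketch
        parsed = urlparse(entry[key])
        if parsed.scheme not in schemes:
            problems.append(f"{key}: unexpected scheme {parsed.scheme!r}")
        if not parsed.netloc:
            # For s3:// URLs the netloc is the bucket name.
            problems.append(f"{key}: missing host or bucket name")
    return problems


registry = json.loads(REGISTRY_JSON)
for entry in registry:
    print(entry["name"], validate_entry(entry) or "ok")
```

Checking the URL scheme per key keeps a mistyped value (for example an https:// URL placed under s3) from reaching the wrong downloader.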

Why this is needed:

  • The http_dataset prefix gives datasets retrieved via HTTP their own explicit registry key, allowing them to be handled in a more structured and modular way.
  • Adding support for S3 datasets ensures that datasets hosted on AWS S3 can be seamlessly integrated into the registry, aligning with current data storage practices.

Additional Context:

  • This change will ensure consistent handling across different downloaders (e.g., CxGDownloader, HTTPDownloader, S3Downloader in cas-tools) and make it easier to manage datasets in the registry.
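One way to get consistent handling is to pick the downloader from whichever key an entry carries. The sketch below is illustrative only. The three classes are hypothetical stand-ins named after the cas-tools downloaders mentioned above, and their fields and describe methods are assumptions, not the real cas-tools API. The precedence (s3, then http_dataset, then the Census path) is also an assumption.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the cas-tools downloaders named in this issue;
# the real classes and their signatures may differ.
@dataclass
class CxGDownloader:
    entry: dict

    def describe(self) -> str:
        return "Census lookup"


@dataclass
class HTTPDownloader:
    url: str

    def describe(self) -> str:
        return f"HTTP GET {self.url}"


@dataclass
class S3Downloader:
    uri: str

    def describe(self) -> str:
        return f"S3 fetch {self.uri}"


def select_downloader(entry: dict):
    """Choose a downloader from the keys present in one registry entry.

    Assumed precedence: "s3" first, then "http_dataset", and the Census
    (CxG) path when neither new key is present.
    """
    if "s3" in entry:
        return S3Downloader(entry["s3"])
    if "http_dataset" in entry:
        return HTTPDownloader(entry["http_dataset"])
    return CxGDownloader(entry)


entry = {"http_dataset": "https://datasets.cellxgene.cziscience.com/xyz.h5ad"}
print(select_downloader(entry).describe())
```

Keeping the dispatch in a single function means a future source type only needs a new key check and a new downloader class.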

cc @dosumis , @hkir-dev
