diff --git a/README.md b/README.md
index 221d42a3..17a39ad0 100644
--- a/README.md
+++ b/README.md
@@ -36,28 +36,56 @@ should convince readers of the significance and relevance of your task.
 
 ``` mermaid
 flowchart LR
-  file_singlecell("SC Dataset")
+  comp_data_loader_sc[/"SC Data Loader"/]
   file_common_singlecell("Raw SC Dataset")
-  file_common_spatialdata("Raw Spatial Dataset")
+  comp_data_preprocessor[/"Data preprocessor"/]
+  file_singlecell("SC Dataset")
   file_spatialdata("Spatial Dataset")
-  comp_data_loader_sc[/"SC Data Loader"/]
+  file_common_spatialdata("Raw Spatial Dataset")
   comp_data_loader_sp[/"iST Data Loader"/]
-  comp_data_preprocessor[/"Data preprocessor"/]
   comp_data_loader_sc-->file_common_singlecell
+  file_common_singlecell---comp_data_preprocessor
+  comp_data_preprocessor-->file_singlecell
+  comp_data_preprocessor-->file_spatialdata
+  file_common_spatialdata---comp_data_preprocessor
   comp_data_loader_sp-->file_common_spatialdata
 ```
 
-## File format: SC Dataset
+## Component type: SC Data Loader
 
-A single-cell reference dataset, preprocessed for this benchmark.
+A component to download and store single-cell data.
+
+Arguments:
+
+
+
+| Name | Type | Description |
+|:---|:---|:---|
+| `--output` | `file` | (*Output*) An unprocessed dataset as output by a dataset loader. |
+| `--dataset_id` | `string` | NA. |
+| `--dataset_name` | `string` | NA. |
+| `--dataset_url` | `string` | (*Optional*) NA. |
+| `--dataset_reference` | `string` | (*Optional*) NA. |
+| `--dataset_summary` | `string` | NA. |
+| `--dataset_description` | `string` | NA. |
+| `--dataset_organism` | `string` | (*Optional*) NA. |
+
+
+
+## File format: Raw SC Dataset
+
+An unprocessed dataset as output by a dataset loader.
 
 Example file:
-`resources_test/preprocessing_imagingbased_st/2023_yao_mouse_brain_scrnaseq_10xv2/dataset.h5ad`
+`resources_test/common/2023_yao_mouse_brain_scrnaseq_10xv2/dataset.h5ad`
 
 Description:
 
-This dataset contains preprocessed counts and metadata for single-cell
-RNA-seq data.
+This dataset contains raw counts and metadata as output by a dataset
+loader.
+
+The format of this file is mainly derived from the [CELLxGENE schema
+v4.0.0](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/4.0.0/schema.md).
 
 Format:
 
@@ -118,20 +146,34 @@ Data structure:
 
 
 
-## File format: Raw SC Dataset
+## Component type: Data preprocessor
 
-An unprocessed dataset as output by a dataset loader.
+Preprocess a common dataset for the benchmark.
+
+Arguments:
+
+
+
+| Name | Type | Description |
+|:---|:---|:---|
+| `--input_sp` | `file` | An unprocessed spatial imaging dataset stored as a zarr file. |
+| `--input_sc` | `file` | An unprocessed dataset as output by a dataset loader. |
+| `--output_sp` | `file` | (*Output*) A spatial transcriptomics dataset, preprocessed for this benchmark. |
+| `--output_sc` | `file` | (*Output*) A single-cell reference dataset, preprocessed for this benchmark. |
+
+
+
+## File format: SC Dataset
+
+A single-cell reference dataset, preprocessed for this benchmark.
 
 Example file:
-`resources_test/common/2023_yao_mouse_brain_scrnaseq_10xv2/dataset.h5ad`
+`resources_test/preprocessing_imagingbased_st/2023_yao_mouse_brain_scrnaseq_10xv2/dataset.h5ad`
 
 Description:
 
-This dataset contains raw counts and metadata as output by a dataset
-loader.
-
-The format of this file is mainly derived from the [CELLxGENE schema
-v4.0.0](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/4.0.0/schema.md).
+This dataset contains preprocessed counts and metadata for single-cell
+RNA-seq data.
 
 Format:
 
@@ -192,17 +234,17 @@ Data structure:
 
 
 
-## File format: Raw Spatial Dataset
+## File format: Spatial Dataset
 
-An unprocessed spatial imaging dataset stored as a zarr file.
+A spatial transcriptomics dataset, preprocessed for this benchmark.
 
 Example file:
-`resources_test/common/2023_10x_mouse_brain_xenium/dataset.zarr`
+`resources_test/preprocessing_imagingbased_st/2023_10x_mouse_brain_xenium/dataset.zarr`
 
 Description:
 
-This dataset contains raw images, labels, points, shapes, and tables as
-output by a dataset loader.
+This dataset contains preprocessed images, labels, points, shapes, and
+tables for spatial transcriptomics data.
 
 Format:
 
@@ -216,17 +258,17 @@ Data structure:
 
 
 
-## File format: Spatial Dataset
+## File format: Raw Spatial Dataset
 
-A spatial transcriptomics dataset, preprocessed for this benchmark.
+An unprocessed spatial imaging dataset stored as a zarr file.
 
 Example file:
-`resources_test/preprocessing_imagingbased_st/2023_10x_mouse_brain_xenium/dataset.zarr`
+`resources_test/common/2023_10x_mouse_brain_xenium/dataset.zarr`
 
 Description:
 
-This dataset contains preprocessed images, labels, points, shapes, and
-tables for spatial transcriptomics data.
+This dataset contains raw images, labels, points, shapes, and tables as
+output by a dataset loader.
 
 Format:
 
@@ -240,27 +282,6 @@ Data structure:
 
 
 
-## Component type: SC Data Loader
-
-A component to download and store single-cell data.
-
-Arguments:
-
-
-
-| Name | Type | Description |
-|:---|:---|:---|
-| `--output` | `file` | (*Output*) An unprocessed dataset as output by a dataset loader. |
-| `--dataset_id` | `string` | NA. |
-| `--dataset_name` | `string` | NA. |
-| `--dataset_url` | `string` | (*Optional*) NA. |
-| `--dataset_reference` | `string` | (*Optional*) NA. |
-| `--dataset_summary` | `string` | NA. |
-| `--dataset_description` | `string` | NA. |
-| `--dataset_organism` | `string` | (*Optional*) NA. |
-
-
-
 ## Component type: iST Data Loader
 
 A component to download and store iST data.
@@ -282,13 +303,3 @@ Arguments:
 
 
 
-## Component type: Data preprocessor
-
-Preprocess a common dataset for the benchmark.
-
-Arguments:
-
-
-
-
-
diff --git a/src/api/comp_data_preprocessor.yaml b/src/api/comp_data_preprocessor.yaml
index 22abb60f..34ad6c41 100644
--- a/src/api/comp_data_preprocessor.yaml
+++ b/src/api/comp_data_preprocessor.yaml
@@ -6,7 +6,7 @@ info:
   description: |
     This component processes a common single-cell and a common spatial transcriptomics dataset for the benchmark.
-  arguments:
+arguments:
   - name: "--input_sp"
     __merge__: file_common_spatialdata.yaml
     direction: input