diff --git a/README.md b/README.md
index 221d42a3..17a39ad0 100644
--- a/README.md
+++ b/README.md
@@ -36,28 +36,56 @@ should convince readers of the significance and relevance of your task.
``` mermaid
flowchart LR
- file_singlecell("SC Dataset")
+ comp_data_loader_sc[/"SC Data Loader"/]
file_common_singlecell("Raw SC Dataset")
- file_common_spatialdata("Raw Spatial Dataset")
+ comp_data_preprocessor[/"Data preprocessor"/]
+ file_singlecell("SC Dataset")
file_spatialdata("Spatial Dataset")
- comp_data_loader_sc[/"SC Data Loader"/]
+ file_common_spatialdata("Raw Spatial Dataset")
comp_data_loader_sp[/"iST Data Loader"/]
- comp_data_preprocessor[/"Data preprocessor"/]
comp_data_loader_sc-->file_common_singlecell
+ file_common_singlecell---comp_data_preprocessor
+ comp_data_preprocessor-->file_singlecell
+ comp_data_preprocessor-->file_spatialdata
+ file_common_spatialdata---comp_data_preprocessor
comp_data_loader_sp-->file_common_spatialdata
```
-## File format: SC Dataset
+## Component type: SC Data Loader
-A single-cell reference dataset, preprocessed for this benchmark.
+A component to download and store single-cell data.
+
+Arguments:
+
+
+
+| Name | Type | Description |
+|:---|:---|:---|
+| `--output` | `file` | (*Output*) An unprocessed dataset as output by a dataset loader. |
+| `--dataset_id` | `string` | NA. |
+| `--dataset_name` | `string` | NA. |
+| `--dataset_url` | `string` | (*Optional*) NA. |
+| `--dataset_reference` | `string` | (*Optional*) NA. |
+| `--dataset_summary` | `string` | NA. |
+| `--dataset_description` | `string` | NA. |
+| `--dataset_organism` | `string` | (*Optional*) NA. |
+
+
+
+## File format: Raw SC Dataset
+
+An unprocessed dataset as output by a dataset loader.
Example file:
-`resources_test/preprocessing_imagingbased_st/2023_yao_mouse_brain_scrnaseq_10xv2/dataset.h5ad`
+`resources_test/common/2023_yao_mouse_brain_scrnaseq_10xv2/dataset.h5ad`
Description:
-This dataset contains preprocessed counts and metadata for single-cell
-RNA-seq data.
+This dataset contains raw counts and metadata as output by a dataset
+loader.
+
+The format of this file is mainly derived from the [CELLxGENE schema
+v4.0.0](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/4.0.0/schema.md).
Format:
@@ -118,20 +146,34 @@ Data structure:
-## File format: Raw SC Dataset
+## Component type: Data preprocessor
-An unprocessed dataset as output by a dataset loader.
+Preprocess a common dataset for the benchmark.
+
+Arguments:
+
+
+
+| Name | Type | Description |
+|:---|:---|:---|
+| `--input_sp` | `file` | An unprocessed spatial imaging dataset stored as a zarr file. |
+| `--input_sc` | `file` | An unprocessed dataset as output by a dataset loader. |
+| `--output_sp` | `file` | (*Output*) A spatial transcriptomics dataset, preprocessed for this benchmark. |
+| `--output_sc` | `file` | (*Output*) A single-cell reference dataset, preprocessed for this benchmark. |
+
+
+
+## File format: SC Dataset
+
+A single-cell reference dataset, preprocessed for this benchmark.
Example file:
-`resources_test/common/2023_yao_mouse_brain_scrnaseq_10xv2/dataset.h5ad`
+`resources_test/preprocessing_imagingbased_st/2023_yao_mouse_brain_scrnaseq_10xv2/dataset.h5ad`
Description:
-This dataset contains raw counts and metadata as output by a dataset
-loader.
-
-The format of this file is mainly derived from the [CELLxGENE schema
-v4.0.0](https://github.com/chanzuckerberg/single-cell-curation/blob/main/schema/4.0.0/schema.md).
+This dataset contains preprocessed counts and metadata for single-cell
+RNA-seq data.
Format:
@@ -192,17 +234,17 @@ Data structure:
-## File format: Raw Spatial Dataset
+## File format: Spatial Dataset
-An unprocessed spatial imaging dataset stored as a zarr file.
+A spatial transcriptomics dataset, preprocessed for this benchmark.
Example file:
-`resources_test/common/2023_10x_mouse_brain_xenium/dataset.zarr`
+`resources_test/preprocessing_imagingbased_st/2023_10x_mouse_brain_xenium/dataset.zarr`
Description:
-This dataset contains raw images, labels, points, shapes, and tables as
-output by a dataset loader.
+This dataset contains preprocessed images, labels, points, shapes, and
+tables for spatial transcriptomics data.
Format:
@@ -216,17 +258,17 @@ Data structure:
-## File format: Spatial Dataset
+## File format: Raw Spatial Dataset
-A spatial transcriptomics dataset, preprocessed for this benchmark.
+An unprocessed spatial imaging dataset stored as a zarr file.
Example file:
-`resources_test/preprocessing_imagingbased_st/2023_10x_mouse_brain_xenium/dataset.zarr`
+`resources_test/common/2023_10x_mouse_brain_xenium/dataset.zarr`
Description:
-This dataset contains preprocessed images, labels, points, shapes, and
-tables for spatial transcriptomics data.
+This dataset contains raw images, labels, points, shapes, and tables as
+output by a dataset loader.
Format:
@@ -240,27 +282,6 @@ Data structure:
-## Component type: SC Data Loader
-
-A component to download and store single-cell data.
-
-Arguments:
-
-
-
-| Name | Type | Description |
-|:---|:---|:---|
-| `--output` | `file` | (*Output*) An unprocessed dataset as output by a dataset loader. |
-| `--dataset_id` | `string` | NA. |
-| `--dataset_name` | `string` | NA. |
-| `--dataset_url` | `string` | (*Optional*) NA. |
-| `--dataset_reference` | `string` | (*Optional*) NA. |
-| `--dataset_summary` | `string` | NA. |
-| `--dataset_description` | `string` | NA. |
-| `--dataset_organism` | `string` | (*Optional*) NA. |
-
-
-
## Component type: iST Data Loader
A component to download and store iST data.
@@ -282,13 +303,3 @@ Arguments:
-## Component type: Data preprocessor
-
-Preprocess a common dataset for the benchmark.
-
-Arguments:
-
-
-
-
-
diff --git a/src/api/comp_data_preprocessor.yaml b/src/api/comp_data_preprocessor.yaml
index 22abb60f..34ad6c41 100644
--- a/src/api/comp_data_preprocessor.yaml
+++ b/src/api/comp_data_preprocessor.yaml
@@ -6,7 +6,7 @@ info:
description: |
This component processes a common single-cell and a common spatial transcriptomics
dataset for the benchmark.
- arguments:
+arguments:
- name: "--input_sp"
__merge__: file_common_spatialdata.yaml
direction: input