Commit

fixes and instructions
bio-la committed Feb 23, 2024
1 parent 17c26f6 commit 7f4321d
Showing 2 changed files with 129 additions and 28 deletions.
71 changes: 43 additions & 28 deletions docs/yaml_docs/pipeline_integration_yml.md
@@ -50,7 +50,7 @@ the workflow.
### Data format


<span class="parameter">sample_prefix</span> `String`, Default: test<br>
<span class="parameter">sample_prefix</span> `String`, Mandatory parameter, Default: test<br>
Prefix for the sample that comes out of the filtering/ preprocessing steps of the workflow.

<span class="parameter">preprocessed_obj</span> `String`, Mandatory parameter<br>
@@ -64,19 +64,20 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th
## RNA modality

<span class="parameter">rna:</span>
Batch correction for the rna modality is specified by the following parameters:
Batch correction for the RNA modality is specified by the following parameters:


- <span class="parameter">run</span> `Boolean`, Default: True<br>
Defines if you want the batch correction to run
Defines if you want the batch correction to run. If set to `False`, `PCA` with default parameters is calculated.

- <span class="parameter">tools</span> `String` (comma-separated), Default: harmony,bbknn,scanorama,scvi<br>
Defines the method used to run batch correction, multiple can be selected.
choices: harmony, bbknn, scanorama, scvi
- <span class="parameter">tools</span> `String` (comma-separated), Default: `harmony,bbknn,scanorama,scvi`<br>
Defines the method used to run batch correction, multiple can be selected and run simultaneously.

Choices: `harmony`, `bbknn`, `scanorama`, `scvi`

- <span class="parameter">column</span> `String` (comma-separated), Default: sample_id<br>

The column you want to batch correct on, if a comma-separated list is specified then all will be used simultaneously
The column name of the covariate you want to batch correct on. If a comma-separated list is specified, all columns will be used simultaneously.
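
The `tools` and `column` entries are both written as comma-separated strings. As a rough illustration of how such a value can be split and checked against the documented choices, here is a minimal sketch; `parse_comma_separated` and `RNA_TOOL_CHOICES` are hypothetical names introduced for this example and are not taken from the `panpipes` code base.

```python
# Hypothetical helper, not part of panpipes: split a comma-separated
# option string and check every entry against the documented choices.
RNA_TOOL_CHOICES = ("harmony", "bbknn", "scanorama", "scvi")


def parse_comma_separated(value, choices=None):
    """Return the stripped, non-empty items of a comma-separated string."""
    items = [item.strip() for item in value.split(",") if item.strip()]
    if choices is not None:
        unknown = [item for item in items if item not in choices]
        if unknown:
            raise ValueError("unsupported value(s): %s" % ", ".join(unknown))
    return items


tools = parse_comma_separated("harmony,bbknn,scanorama,scvi", RNA_TOOL_CHOICES)
columns = parse_comma_separated("sample_id")
print(tools)    # ['harmony', 'bbknn', 'scanorama', 'scvi']
print(columns)  # ['sample_id']
```

A value outside the listed choices raises a `ValueError` in this sketch; how the pipeline itself reacts to an unsupported tool is not covered here.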

### Harmony arguments

@@ -87,12 +88,15 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th
- <span class="parameter">theta</span> `Float`, Default: 1.0<br>
- <span class="parameter">npcs</span> `Integer`, Default: 30<br>

For more information on harmony check https://portals.broadinstitute.org/harmony/reference/RunHarmony.html
For more information on `harmony` check the [harmony documentation](https://portals.broadinstitute.org/harmony/reference/RunHarmony.html)

### BBKNN arguments
Check https://bbknn.readthedocs.io/en/latest/ for more information

- <span class="parameter">bbknn:</span>
- <span class="parameter">neighbors_within_batch:</span> `Integer`, Default: 3<br>

For more information on `bbknn` check the [bbknn documentation](https://bbknn.readthedocs.io/en/latest/)

### SCVI arguments
- <span class="parameter">scvi</span>: SCVI parameters are specified as
- <span class="parameter">exclude_mt_genes:</span> `Boolean`, Default: True<br>
@@ -127,10 +131,12 @@ Check https://bbknn.readthedocs.io/en/latest/ for more information
- <span class="parameter">lr_factor</span> `Float`, Default: 0.1<br>
For more information check https://docs.scvi-tools.org/en/stable/api/reference/scvi.model.SCVI.html
For more information on `scvi` check the [scvi documentation](https://docs.scvi-tools.org/en/stable/api/reference/scvi.model.SCVI.html)

### Find neighbour parameters
- <span class="parameter">neighbors:</span> `String`, Default: &rna_neighbors<br>
Parameters to compute the connectivity graph on RNA

- <span class="parameter">neighbors:</span> `String`<br>

- <span class="parameter">npcs</span> `Integer`, Default: 30<br>
Number of principal components to calculate for neighbors and Umap
@@ -141,7 +147,7 @@ Check https://bbknn.readthedocs.io/en/latest/ for more information
- <span class="parameter">metric</span> `String`, Default: euclidean<br>
Metric can be either euclidean or cosine

- <span class="parameter">methof</span> `String`, Default: scanpy<br>
- <span class="parameter">method</span> `String`, Default: scanpy<br>
The method can either be scanpy or hnsw


@@ -151,7 +157,7 @@ Check https://bbknn.readthedocs.io/en/latest/ for more information


- <span class="parameter">run</span> `Boolean`, Default: True<br>
Defines if you want the batch correction to run
Defines if you want the batch correction to run on the Protein modality. If set to `False`, `PCA` with default parameters is calculated.

- <span class="parameter">tools</span> `String` (comma-separated), Default: harmony<br>
Defines the method used to run batch correction, multiple can be selected.
@@ -170,15 +176,21 @@ Check https://bbknn.readthedocs.io/en/latest/ for more information
- <span class="parameter">theta</span> `Float`, Default: 1.0<br>
- <span class="parameter">npcs</span> `Integer`, Default: 30<br>

For more information on harmony check https://portals.broadinstitute.org/harmony/reference/RunHarmony.html
For more information on `harmony` check the [harmony documentation](https://portals.broadinstitute.org/harmony/reference/RunHarmony.html)


### BBKNN arguments

Check https://bbknn.readthedocs.io/en/latest/ for more information

- <span class="parameter">bbknn:</span>
- <span class="parameter">neighbors_within_batch:</span> `Integer`, Default: 3<br>

For more information on `bbknn` check the [bbknn documentation](https://bbknn.readthedocs.io/en/latest/)

### Find neighbour parameters

Parameters to compute the connectivity graph on Protein

- <span class="parameter">neighbors:</span> `String`, Default: &prot_neighbors<br>

- <span class="parameter">npcs</span> `Integer`, Default: 30<br>
@@ -197,13 +209,13 @@ Check https://bbknn.readthedocs.io/en/latest/ for more information
## ATAC modality

<span class="parameter">atac:</span>
Batch correction for the protein modality is specified by the following parameters:
Batch correction for the ATAC modality is specified by the following parameters:

- <span class="parameter">run</span> `Boolean`, Default: False<br>
Defines if you want the batch correction to run
Defines if you want the batch correction to run. If set to `False`, `PCA` with default parameters is calculated.

- <span class="parameter">dimred</span> `String`, Default: PCA<br>
Defines if you which dimensionality reduction to use, PCA or LSI
Defines which dimensionality reduction to use, PCA or LSI

- <span class="parameter">tools</span> `String` (comma-separated), Default: harmony<br>
Defines the method used to run batch correction, multiple can be selected.
@@ -222,20 +234,23 @@ Check https://bbknn.readthedocs.io/en/latest/ for more information
- <span class="parameter">theta</span> `Float`, Default: 1.0<br>
- <span class="parameter">npcs</span> `Integer`, Default: 30<br>

For more information on harmony check https://portals.broadinstitute.org/harmony/reference/RunHarmony.html
For more information on `harmony` check the [harmony documentation](https://portals.broadinstitute.org/harmony/reference/RunHarmony.html)


### BBKNN arguments

Check https://bbknn.readthedocs.io/en/latest/ for more information


- <span class="parameter">bbknn:</span>

- <span class="parameter">neighbors_within_batch:</span> `Integer`, Default: 3<br>

For more information on `bbknn` check the [bbknn documentation](https://bbknn.readthedocs.io/en/latest/)


### Find neighbour parameters

- <span class="parameter">neighbors:</span> `String`, Default: &atac_neighbors<br>
- <span class="parameter">neighbors:</span> `String` <br>

- <span class="parameter">npcs</span> `Integer`, Default: 30<br>
Number of principal components to calculate for neighbors and Umap
@@ -246,7 +261,7 @@ Check https://bbknn.readthedocs.io/en/latest/ for more information
- <span class="parameter">metric</span> `String`, Default: euclidean<br>
Metric can be either euclidean or cosine

- <span class="parameter">methof</span> `String`, Default: scanpy<br>
- <span class="parameter">method</span> `String`, Default: scanpy<br>
The method can either be scanpy or hnsw


@@ -257,7 +272,7 @@ Check https://bbknn.readthedocs.io/en/latest/ for more information
Leave False if you don't want to run multimodal integration

- <span class="parameter">tools</span> `String`(Comma separated), Default: "WNN"<br>
Method you want to use to run batch correction. Options include: WNN, totalvi and multiVI. You can specify mutiple.
Method you want to use to run batch correction. Options include: WNN, totalvi and multiVI. You can specify multiple methods and they will be run simultaneously.

- <span class="parameter">column_categorical</span> `String`(Comma separated), Default: sample_id<br>
This is the column you want to run a batch correction on, multiple can be selected simultaneously.
@@ -292,11 +307,11 @@ Check https://bbknn.readthedocs.io/en/latest/ for more information

### MultiVI arguments

**totalvi has to run on both rna and atac data**
**MultiVI has to run on both rna and atac data**

These are the basic multivi parameters required, you can add more if it fits your analysis better.

By setting lowmen to True it will subset the atac to the top 25k HVF which is recommended to deal with concatenation of atac,rna on large datasets which at the moment is suboptimally required by scvitool. Note that >100GB of RAM are required to concatenate atac,rna with 15k cells and 120k total features (union rna,atac)
Setting lowmen to True subsets the atac data to the top 25k HVF. This is recommended for handling the concatenation of atac and rna on large datasets, which is currently required by `scvi-tools`. Note that more than 100GB of RAM is required to concatenate atac and rna with 15k cells and 120k total features (union of rna and atac).

- <span class="parameter">MultiVI:</span>

@@ -332,7 +347,7 @@ Check https://bbknn.readthedocs.io/en/latest/ for more information

### Mofa

**Requires at least two modalities, however can run with all three**
**Requires at least two modalities, can run with three**

These are the basic mofa parameters required, you can add more if it fits your analysis better.

@@ -349,7 +364,7 @@ Check https://bbknn.readthedocs.io/en/latest/ for more information

### WNN

**Requires at least two modalities, however can run with all three**
**Requires at least two modalities, can run with three**

These are the basic WNN parameters required, you can add more if it fits your analysis better.

@@ -402,7 +417,7 @@ Grouping must be a categorical variable
- <span class="parameter">grouping_var</span> `String`, Default: sample_id<br>
- <span class="parameter">all</span> `String`, Default: rep:receptor_subtype<br>

Any metrics you may want to plot on all modality umaps should go under all
Any metrics you want to plot on the UMAPs of every modality should be listed under `all`.

- <span class="parameter">rna</span> `String`, Default: rna:total_counts<br>
- <span class="parameter">prot</span> `String`, Default: prot:total_counts<br>
86 changes: 86 additions & 0 deletions docs/yaml_docs/useful_info_on_yml.md
@@ -0,0 +1,86 @@
# Useful information on formatting and reading yaml files

`panpipes` workflows can be executed by specifying their parameters in their configuration files. Each workflow has its own configuration file, which is generated by `panpipes NAME_OF_WORKFLOW config`.

The configuration files are YAML files, which are the backbone of the system's setup and operation, defining parameters, settings, and structures in a human-readable format. These files allow for easy modification and sharing of configurations, promoting a clear and efficient way to manage the `panpipes` behavior and its various components.


If you're not familiar with the format please check out these useful links:

- [YAML basics](https://www.tutorialspoint.com/yaml/yaml_basics.htm)
- [YAML beginner's guide](https://www.redhat.com/sysadmin/yaml-beginners)
- [Reading and writing YAML with Python](https://python.land/data-processing/python-yaml#What_is_YAML)


## How we use YAML files to configure panpipes actions


`panpipes` reads the whole `pipeline.yml` as `PARAMS` at the beginning of each pipeline execution:

```python
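import os

from cgatcore import pipeline as P

# Look for pipeline.yml in the directory named after this script,
# then in the current working directory.
PARAMS = P.get_parameters(
    ["%s/pipeline.yml" % os.path.splitext(__file__)[0],
     "pipeline.yml"])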
```

### 1. Understanding indentations

YAML works with mapping blocks, which are started and closed by a new indentation level.
Therefore, the indentations in the files are essential for the pipeline to understand which are the blocks that it needs to parse correctly.

Here is an example of a mapping block from an excerpt of the `integration` `pipeline.yml` file.

```yaml
prot:

  run: True
  tools: harmony
  column: sample_id
  #----------------------------
  # Harmony args
  #-----------------------------
  harmony:
    # sigma value, used by Harmony
    sigma: 0.1
    # theta value used by Harmony, default is 1
    theta: 1.0
    # number of pcs, used by Harmony
    npcs: 30
  #----------------------------
  # BBKNN args # https://bbknn.readthedocs.io/en/latest/
  #-----------------------------
  bbknn:
    neighbors_within_batch:
  #----------------------------
  # find neighbour parameters
  #-----------------------------
  neighbors: &prot_neighbors
    # number of Principal Components to calculate for neighbours and umap:
    # -if no correction is applied, PCA will be calculated and used to run UMAP and clustering on
    # -if Harmony is the method of choice, it will use these components to create a corrected dim red.
    # note: scvelo default is 30
    npcs: 30
    # number of neighbours
    k: 30
    # metric: euclidean | cosine
    metric: euclidean
    # scanpy | hnsw (from scvelo)
    method: scanpy
```
`panpipes` reads the whole `pipeline.yml` as `PARAMS`, so the parameters of this block are available as `prot_params = PARAMS['prot']`.
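
To make the link between indentation and lookups concrete, the sketch below uses a hand-written dictionary that mirrors the `prot` excerpt above. It is only a stand-in, assuming the parsed block behaves like a nested Python dictionary; the real `PARAMS` object is built by `cgatcore` and may expose the values differently.

```python
# Hand-written stand-in for the parsed excerpt above; in panpipes the real
# PARAMS object comes from cgatcore, this dict only mimics its nesting.
PARAMS = {
    "prot": {
        "run": True,
        "tools": "harmony",
        "column": "sample_id",
        "harmony": {"sigma": 0.1, "theta": 1.0, "npcs": 30},
        "bbknn": {"neighbors_within_batch": None},  # left empty in the excerpt
        "neighbors": {
            "npcs": 30,
            "k": 30,
            "metric": "euclidean",
            "method": "scanpy",
        },
    }
}

prot_params = PARAMS["prot"]
# Each indentation level in the YAML becomes one more key lookup.
print(prot_params["harmony"]["theta"])     # 1.0
print(prot_params["neighbors"]["metric"])  # euclidean
```

If a key such as `theta` were indented one level too shallow in the YAML, it would land directly under `prot` in this picture rather than under `harmony`, which is why the indentation has to be preserved when editing the file.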








To reuse these parameters elsewhere (for example for WNN), use YAML anchors (`&`) and aliases (`*`) in the relevant places.

For example, a block anchored as `neighbors: &rna_neighbors` is reused wherever `*rna_neighbors` is referenced.
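
The effect of an anchor and its alias can be pictured in Python terms as two keys pointing at the same mapping. The sketch below is an analogy only: it uses the values of the `prot` neighbors block from the excerpt above, and the `wnn_knn` key is a hypothetical placeholder rather than the actual layout of the WNN section.

```python
# Stand-in for a loaded config: the mapping tagged with an anchor
# (neighbors: &prot_neighbors) and every alias (*prot_neighbors) carry
# the same content. The "wnn_knn" key is hypothetical.
prot_neighbors = {"npcs": 30, "k": 30, "metric": "euclidean", "method": "scanpy"}

config = {
    "prot": {"neighbors": prot_neighbors},  # neighbors: &prot_neighbors
    "wnn_knn": {"prot": prot_neighbors},    # prot: *prot_neighbors
}

# Changing the anchored block once is reflected wherever it is referenced.
config["prot"]["neighbors"]["k"] = 15
print(config["wnn_knn"]["prot"]["k"])  # 15
```

The practical consequence is that the neighbors settings only need to be edited in one place in `pipeline.yml`.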
