Showing 2 changed files with 129 additions and 28 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,86 @@ | ||
# Useful information on formatting and reading yaml files | ||
|
||
`panpipes` workflows are executed according to the parameters specified in their configuration files. Each workflow has its own configuration file, which is generated by `panpipes NAME_OF_WORKFLOW config`. | ||
|
||
The configuration files are YAML files, which define parameters, settings, and structures in a human-readable format. They are easy to modify and share, providing a clear way to control the behavior of `panpipes` and its components. | ||
|
||
|
||
If you're not familiar with the format, please check out these useful links: | ||
|
||
- [YAML basics](https://www.tutorialspoint.com/yaml/yaml_basics.htm) | ||
- [YAML beginner's guide](https://www.redhat.com/sysadmin/yaml-beginners) | ||
- [Reading and writing YAML with Python](https://python.land/data-processing/python-yaml#What_is_YAML) | ||
|
||
|
||
## How we use YAML files to configure panpipes actions | ||
|
||
|
||
`panpipes` reads the whole `pipeline.yml` as `PARAMS` at the beginning of each pipeline execution: | ||
|
||
```python | ||
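import os | ||
from cgatcore import pipeline as P | ||
|
||
# Candidate config files: one in the directory named after this script, | ||
# and one in the current working directory. | ||
PARAMS = P.get_parameters( | ||
    ["%s/pipeline.yml" % os.path.splitext(__file__)[0], | ||
     "pipeline.yml"]) | ||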
|
||
``` | ||
|
||
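To make the two entries in that list concrete, here is a minimal sketch of how the candidate paths are built. The helper name `candidate_config_paths` and the script path used below are hypothetical and serve only as illustration; the sketch relies on nothing beyond the standard behavior of `os.path.splitext`.

```python
import os


def candidate_config_paths(script_path):
    """Return the two pipeline.yml locations built from a script path.

    Hypothetical helper: it mirrors the list passed to P.get_parameters,
    but it is not part of panpipes or cgatcore.
    """
    # splitext drops the ".py" extension, leaving a directory-like prefix.
    base = os.path.splitext(script_path)[0]
    # The second entry is relative, so it resolves against the working directory.
    return ["%s/pipeline.yml" % base, "pipeline.yml"]


# Hypothetical script location, used only for illustration.
paths = candidate_config_paths("/opt/panpipes/pipeline_integration.py")
for path in paths:
    print(path)
```

The first path points inside a directory named after the pipeline script, while the second is the `pipeline.yml` in the directory from which the pipeline is launched.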
### 1. Understanding indentations | ||
|
||
YAML represents nested settings as mapping blocks: a block opens when the indentation level increases and closes when the indentation returns to a previous level. | ||
Indentation is therefore essential, because it tells the pipeline which keys belong to which block. | ||
|
||
Here is an example of a mapping block from an excerpt of the `integration` `pipeline.yml` file. | ||
|
||
```yaml | ||
prot: | ||
|
||
  run: True | ||
  tools: harmony | ||
  column: sample_id | ||
  #---------------------------- | ||
  # Harmony args | ||
  #----------------------------- | ||
  harmony: | ||
    # sigma value, used by Harmony | ||
    sigma: 0.1 | ||
    # theta value used by Harmony, default is 1 | ||
    theta: 1.0 | ||
    # number of pcs, used by Harmony | ||
    npcs: 30 | ||
  #---------------------------- | ||
  # BBKNN args # https://bbknn.readthedocs.io/en/latest/ | ||
  #----------------------------- | ||
  bbknn: | ||
    neighbors_within_batch: | ||
  #---------------------------- | ||
  # find neighbour parameters | ||
  #----------------------------- | ||
  neighbors: &prot_neighbors | ||
    # number of Principal Components to calculate for neighbours and umap: | ||
    # -if no correction is applied, PCA will be calculated and used to run UMAP and clustering on | ||
    # -if Harmony is the method of choice, it will use these components to create a corrected dim red.) | ||
    # note: scvelo default is 30 | ||
    npcs: 30 | ||
    # number of neighbours | ||
    k: 30 | ||
    # metric: euclidean | cosine | ||
    metric: euclidean | ||
    # scanpy | hnsw (from scvelo) | ||
    method: scanpy | ||
``` | ||
`panpipes` reads the whole `pipeline.yml` into `PARAMS`, so the parameters in this block are available as `prot_params = PARAMS['prot']`. | ||
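To illustrate how indentation alone decides which keys end up under `PARAMS['prot']`, here is a deliberately small sketch. It is not how `panpipes`, `cgatcore`, or PyYAML parse files: `parse_nested_mapping` is a hypothetical helper that understands only `key: value` lines, keeps every value as a string, and ignores anchors, lists, and empty values.

```python
def parse_nested_mapping(text):
    """Parse a tiny YAML-like subset: 'key: value' lines nested by indentation.

    Illustration only (hypothetical helper); use a real YAML parser in practice.
    """
    root = {}
    # Stack of (indent, mapping) pairs; the last entry is the innermost open block.
    stack = [(-1, root)]
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments and trailing spaces
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        key, _, value = line.strip().partition(":")
        value = value.strip()
        # Returning to a shallower (or equal) indent closes the deeper blocks.
        while stack[-1][0] >= indent:
            stack.pop()
        parent = stack[-1][1]
        if value:
            parent[key] = value
        else:
            # A key without a value opens a new mapping block.
            parent[key] = {}
            stack.append((indent, parent[key]))
    return root


EXCERPT = """
prot:
  run: True
  tools: harmony
  harmony:
    sigma: 0.1
    theta: 1.0
  neighbors:
    npcs: 30
"""

# Same keys, but 'sigma' has lost one indentation level.
MISINDENTED = """
prot:
  harmony:
  sigma: 0.1
"""

params = parse_nested_mapping(EXCERPT)
prot_params = params["prot"]
broken = parse_nested_mapping(MISINDENTED)

print(prot_params["harmony"]["sigma"])
print(broken["prot"])
```

In the second input, `sigma` sits at the same level as `harmony`, so it is attached to `prot` directly and the `harmony` block is left empty. This is the kind of silent misconfiguration that careless indentation can produce.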
|
||
|
||
|
||
|
||
|
||
|
||
|
||
|
||
To reuse these parameters (for example, for WNN), use YAML anchors (`&`) and aliases (`*`) in the relevant places. | ||
|
||
For example, a block anchored with `&rna_neighbors` is reused wherever `*rna_neighbors` is referenced: | ||
neighbors: &rna_neighbors |