Release v2.1.0 (#24)
* Add rule and sample info to slurm log files

* Make naming convention of log output dir consistent

* Add pipeline tests (#14)

* Add test dataset

* Add info on test dataset

* Automate setting up to run test dataset

* Speedups (#15)

* Use full parabricks germline pipeline + other standalone tools - provides speedups

* Thread parabricks rules

* Remove snakemake wrapper and thread fastqc

* Add fastqc conda env now that snakemake wrapper has been removed

* Fixes

* fix error due to file target that isn't created

* forgot to add parabricks rule to local rule list

* remove flag that causes error

* allow dynamic inclusion of recal resources - also stop need for user … (#17)

* allow dynamic inclusion of recal resources - also stop need for user to manually write the flags

* clarify you can directly pass adapters to trim galore

* move existing helper functions to one place in snakefile

* simplify flags for WES settings

* account for when someone doesn't use WES settings

* Simplify code (#19)

* Functionize code (#20)

Move dynamic stuff (like if-else statements) into functions to avoid having global variables

* Docs (#22)

* separate docs for running in different situations

* add images

* fix links to images

* add section about getting data on nesi

* remove incomplete docs for running pipeline on NeSi for now

* fix fastqc/multiqc error

* improve documentation

* fix file path that makes download from google cloud bucket not work

* improve docs

* discourage using home dir in docs

* add more information about pipeline

* fix sample wildcard error for rules without sample wildcard

* clarify output files

* add link to discussions

* remove g.vcf that causes error in vcf_annotation_pipeline (#25)
leahkemp authored Apr 19, 2022
1 parent 9b5c8a0 commit 1a6f3d0
Showing 5 changed files with 6 additions and 5 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -72,7 +72,7 @@ Cohort samples:
- `results/mapped/sample1_recalibrated.bam`
- `results/mapped/sample2_recalibrated.bam`
- `results/mapped/sample3_recalibrated.bam`
-- `results/called/proband1_raw_snps_indels.g.vcf`
+- `results/called/proband1_raw_snps_indels.vcf`

## Prerequisites

@@ -94,6 +94,7 @@ See the docs for a walkthrough guide for running [human_genomics_pipeline](https

- Raise issues in [the issues page](https://github.com/ESR-NZ/human_genomics_pipeline/issues)
- Create feature requests in [the issues page](https://github.com/ESR-NZ/human_genomics_pipeline/issues)
+- Start a discussion in [the discussion page](https://github.com/ESR-NZ/human_genomics_pipeline/discussions)
- Contribute your code! Create your own branch from the [development branch](https://github.com/ESR-NZ/human_genomics_pipeline/tree/dev) and create a pull request to the [development branch](https://github.com/ESR-NZ/human_genomics_pipeline/tree/dev) once the code is on point!

Contributions and feedback are always welcome! :blush:
2 changes: 1 addition & 1 deletion docs/running_on_a_hpc.md
@@ -209,7 +209,7 @@ Set the maximum number of GPUs to be used per rule/sample for gpu-accelerated r
GPU: 1
```

-It is a good idea to consider the number of samples that you are processing. For example, if you set `THREADS: "8"` and set the maximum number of cores to be used by the pipeline in the run script to `-j 32` (see step 6), a maximum of 3 samples will be able to run at one time for these rules (if they are deployed at the same time), but each sample will complete faster. In contrast, if you set `THREADS: "1"` and `-j 32`, a maximum of 32 samples could be run at one time, but each sample will take longer to complete. This also needs to be considered when setting `MAXMEMORY` + `--resources mem_mb` and `GPU` + `--resources gpu`.
+It is a good idea to consider the number of samples that you are processing. For example, if you set `THREADS: "8"` and set the maximum number of cores to be used by the pipeline in the run script to `-j/--cores 32` (see [step 8](#8-modify-the-run-scripts)), a maximum of 3 samples will be able to run at one time for these rules (if they are deployed at the same time), but each sample will complete faster. In contrast, if you set `THREADS: "1"` and `-j/--cores 32`, a maximum of 32 samples could be run at one time, but each sample will take longer to complete. This also needs to be considered when setting `MAXMEMORY` + `--resources mem_mb` and `GPU` + `--resources gpu`.
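To make the interaction concrete, here is a minimal Python sketch of how these settings might be assembled into a snakemake invocation; the concrete values (32 cores, 150000 MB, 2 GPUs) are illustrative assumptions, not recommended defaults:

```python
# Hypothetical run-script settings combining the limits discussed above.
# All concrete values here are illustrative assumptions.
config_threads = "8"  # THREADS: "8" in config.yaml (per-rule/per-sample threads)

run_script_args = [
    "snakemake",
    "--cores", "32",                 # -j/--cores: total core budget
    "--resources", "mem_mb=150000",  # paired with MAXMEMORY in config.yaml
    "gpu=2",                         # paired with GPU in config.yaml
]
command = " ".join(run_script_args)
```

With 8 threads per rule inside a 32-core budget, several samples run side by side; halving `THREADS` roughly doubles how many threaded rules fit at once, at the cost of per-sample speed.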

#### Trimming

2 changes: 1 addition & 1 deletion docs/running_on_a_single_machine.md
@@ -208,7 +208,7 @@ Set the maximum number of GPUs to be used per rule/sample for gpu-accelerated r
GPU: 1
```

-It is a good idea to consider the number of samples that you are processing. For example, if you set `THREADS: "8"` and set the maximum number of cores to be used by the pipeline in the run script to `-j 32` (see step 6), a maximum of 3 samples will be able to run at one time for these rules (if they are deployed at the same time), but each sample will complete faster. In contrast, if you set `THREADS: "1"` and `-j 32`, a maximum of 32 samples could be run at one time, but each sample will take longer to complete. This also needs to be considered when setting `MAXMEMORY` + `--resources mem_mb` and `GPU` + `--resources gpu`.
+It is a good idea to consider the number of samples that you are processing. For example, if you set `THREADS: "8"` and set the maximum number of cores to be used by the pipeline in the run script to `-j/--cores 32` (see [step 7](#7-modify-the-run-scripts)), a maximum of 3 samples will be able to run at one time for these rules (if they are deployed at the same time), but each sample will complete faster. In contrast, if you set `THREADS: "1"` and `-j/--cores 32`, a maximum of 32 samples could be run at one time, but each sample will take longer to complete. This also needs to be considered when setting `MAXMEMORY` + `--resources mem_mb` and `GPU` + `--resources gpu`.

#### Trimming

2 changes: 1 addition & 1 deletion workflow/Snakefile
@@ -163,7 +163,7 @@ if config['DATA'] == "Cohort" or config['DATA'] == 'cohort':
input:
"../results/qc/multiqc_report.html",
expand("../results/mapped/{sample}_recalibrated.bam", sample = SAMPLES),
-expand("../results/called/{family}_raw_snps_indels.g.vcf", family = FAMILIES)
+expand("../results/called/{family}_raw_snps_indels.vcf", family = FAMILIES)

##### Load rules #####

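The renamed target from the hunk above can be sketched in plain Python; `proband1` is a family name assumed from the README example, not a pipeline default:

```python
# Plain-Python equivalent of the corrected expand() call in the Snakefile.
# "proband1" is an assumed family name for illustration.
FAMILIES = ["proband1"]

targets = [
    f"../results/called/{family}_raw_snps_indels.vcf"  # .vcf now, not .g.vcf
    for family in FAMILIES
]
```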
2 changes: 1 addition & 1 deletion workflow/rules/gatk_GenotypeGVCFs.smk
@@ -3,7 +3,7 @@ rule gatk_GenotypeGVCFs:
gvcf = "../results/called/{family}_raw_snps_indels_tmp_combined.g.vcf",
refgenome = expand("{refgenome}", refgenome = config['REFGENOME'])
output:
-protected("../results/called/{family}_raw_snps_indels.g.vcf")
+protected("../results/called/{family}_raw_snps_indels.vcf")
params:
maxmemory = expand('"-Xmx{maxmemory}"', maxmemory = config['MAXMEMORY']),
tdir = config['TEMPDIR'],
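The extension change matches what the tool produces: GATK's GenotypeGVCFs consumes a combined GVCF and emits a genotyped VCF, so the rule's output should not carry a `.g.vcf` extension. As a sketch only, the file naming this rule implies can be traced like this, with `proband1` and `reference.fasta` as assumed placeholder values:

```python
# Sketch of the file naming rule gatk_GenotypeGVCFs implies for one family.
# "proband1" and "reference.fasta" are assumed placeholders, not pipeline values.
family = "proband1"
gvcf = f"../results/called/{family}_raw_snps_indels_tmp_combined.g.vcf"  # rule input
vcf = f"../results/called/{family}_raw_snps_indels.vcf"                  # rule output

# Shape of the underlying call (arguments assumed, not the rule's exact shell line):
cmd = ["gatk", "GenotypeGVCFs", "-R", "reference.fasta", "-V", gvcf, "-O", vcf]
```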
