variant calling workflow + testing #371

Open
wants to merge 37 commits into
base: master

@fridells51 fridells51 commented Jun 1, 2023

This pull request adds an end-to-end Snakemake variant calling workflow to lcdb-wf. The Snakefile handles references, read mapping, and QC, and includes a GATK best practices pipeline for germline and somatic variant calling. The workflow supports whole genome sequencing (WGS) and targeted sequencing inputs and returns analysis-ready, annotated VCFs.
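To illustrate how germline versus somatic calling and WGS versus targeted inputs might map onto GATK invocations, here is a minimal sketch. It assumes the germline branch uses `HaplotypeCaller` and the somatic branch uses `Mutect2`, which the description above does not spell out. The function names and file paths are hypothetical and are not taken from the Snakefile in this PR.

```python
from typing import List, Optional


def haplotypecaller_cmd(reference: str, bam: str, out_vcf: str,
                        intervals: Optional[str] = None) -> List[str]:
    """Build a germline calling command (assumed tool: GATK HaplotypeCaller)."""
    cmd = ["gatk", "HaplotypeCaller", "-R", reference, "-I", bam, "-O", out_vcf]
    if intervals is not None:
        # Targeted sequencing: restrict calling to the capture intervals.
        cmd += ["-L", intervals]
    return cmd


def mutect2_cmd(reference: str, tumor_bam: str, out_vcf: str,
                normal_bam: Optional[str] = None,
                normal_sample: Optional[str] = None,
                intervals: Optional[str] = None) -> List[str]:
    """Build a somatic calling command (assumed tool: GATK Mutect2)."""
    cmd = ["gatk", "Mutect2", "-R", reference, "-I", tumor_bam]
    if normal_bam is not None:
        if normal_sample is None:
            raise ValueError("normal_sample is required when normal_bam is given")
        # Matched normal: pass its BAM plus the sample name from its read group.
        cmd += ["-I", normal_bam, "-normal", normal_sample]
    cmd += ["-O", out_vcf]
    if intervals is not None:
        cmd += ["-L", intervals]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths, printed rather than executed.
    print(" ".join(haplotypecaller_cmd("ref.fa", "sample.bam", "sample.vcf.gz")))
    print(" ".join(mutect2_cmd("ref.fa", "tumor.bam", "somatic.vcf.gz",
                               normal_bam="normal.bam", normal_sample="NORMAL",
                               intervals="targets.bed")))
```

Returning argument lists rather than shell strings keeps the sketch easy to test and avoids quoting problems; in a Snakemake rule the same arguments would normally appear in a `shell:` block.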

This PR also updates the conda environment to include the packages needed for variant calling. The lcdb-wf docs are updated with a comprehensive overview of the workflow and descriptions of the configuration options users can adjust to fit their analysis. The workflow is not organism-specific, and the docs explain how to call variants on non-human organisms. References can be supplied to the workflow externally, but this PR also expands the existing lcdb-wf references workflow to automatically build the new reference types needed for variant calling.

The VCF annotation portion of the workflow supports attaching annotations from databases like dbNSFP using SnpEff.
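As a rough sketch of what this annotation step can look like on the command line: in the SnpEff suite, functional annotation is done by `snpEff ann`, and dbNSFP fields are attached by the companion tool `SnpSift dbnsfp`. The helper names, the genome database name, and the file paths below are placeholders for illustration, and the workflow's actual rules may be organized differently.

```python
import subprocess
from typing import List, Optional, Sequence


def snpeff_cmd(genome: str, in_vcf: str) -> List[str]:
    """Functional annotation; snpEff writes the annotated VCF to stdout."""
    return ["snpEff", "ann", genome, in_vcf]


def dbnsfp_cmd(db: str, in_vcf: str,
               fields: Optional[Sequence[str]] = None) -> List[str]:
    """Attach dbNSFP annotations; SnpSift also writes to stdout."""
    cmd = ["SnpSift", "dbnsfp", "-db", db]
    if fields:
        # -f takes a comma-separated list of dbNSFP column names.
        cmd += ["-f", ",".join(fields)]
    cmd.append(in_vcf)
    return cmd


def run_to_file(cmd: List[str], out_path: str) -> None:
    """Run a command and capture its stdout in out_path."""
    with open(out_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)


def annotate(genome: str, in_vcf: str, out_vcf: str,
             dbnsfp: Optional[str] = None) -> None:
    """Chain the two steps through an intermediate file (hypothetical layout)."""
    if dbnsfp is None:
        run_to_file(snpeff_cmd(genome, in_vcf), out_vcf)
        return
    intermediate = out_vcf + ".snpeff.tmp.vcf"
    run_to_file(snpeff_cmd(genome, in_vcf), intermediate)
    run_to_file(dbnsfp_cmd(dbnsfp, intermediate), out_vcf)
```

Making the dbNSFP step optional reflects that such databases are human-centric, so a non-human analysis would presumably stop after the SnpEff step.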

The workflow also runs MultiQC to aggregate QC checks on the input FASTQ data, variant calling metrics, and annotation summary files.

Test data for variant calling have been generated and are hosted at https://github.com/lcdb/lcdb-wf-variant-calling-test-data. CircleCI runs the workflow on these data to test the conda environments and workflow execution whenever the workflow changes. This guards against deprecations and against bugs introduced by future updates.


3 participants