Welcome to the Lumpy Skin Disease Virus (LSDV) pangenomics repository. This project contains a complete bioinformatics workflow designed to perform read mapping, variant calling, and downstream analysis of LSDV sequencing data using both standard linear references and advanced pangenome variation graphs (PVGs).
This repository is organised into the following main directories, each with a specific purpose:
- MAPPING_SCRIPTS/: This is the starting point of the workflow. This directory contains the shell scripts needed to align raw sequencing reads (FASTQ files) and call genetic variants (SNPs and indels). The main script, run_main.sh, can execute workflows using Minimap2, VG Giraffe, and VG-MAP. For full installation, setup, and usage instructions, please refer to the detailed MAPPING_SCRIPTS/README.md.
- scripts/: This directory contains a collection of R, Perl, and Python scripts for downstream analysis, which are typically run after mapping and variant calling are complete. These scripts are used to:
  - Calculate and summarise quality metrics.
  - Analyse SNP characteristics (e.g., heterozygosity, Ti/Tv ratios).
  - Visualise results, such as SNP overlap and coverage.
  - Generate publication-ready figures.

  For a detailed description of what each script does and how to run it, please see scripts/README.md.
- Variant_Effect_Predictor/: This module contains Python tools to annotate the variants discovered in the mapping step. It analyses VCF files to predict the functional effects of SNPs on viral proteins, classifying them as synonymous, nonsynonymous, or other types of changes. For instructions, please see Variant_Effect_Predictor/README.md.
- FIGURES/: This folder contains the final figures generated by the analysis pipeline. These include plots for diversity, phylogenies, SNP rates, and other key results from the study.
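To make the synonymous/nonsynonymous classification mentioned for Variant_Effect_Predictor/ concrete, here is a minimal Python sketch. It is not the code in vep.py: the function name classify_snp, its arguments, and the stop_gained/stop_lost labels are hypothetical, and it assumes a forward-strand, in-frame coding sequence translated with the standard genetic code.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(cds, pos, alt):
    """Classify a single-base substitution in a coding sequence.

    cds: in-frame, forward-strand coding sequence (assumption).
    pos: 0-based position of the SNP within cds.
    alt: the alternative base at that position.
    """
    cds, alt = cds.upper(), alt.upper()
    start = (pos // 3) * 3          # first base of the affected codon
    ref_codon = cds[start:start + 3]
    if len(ref_codon) != 3:
        raise ValueError("SNP falls in an incomplete trailing codon")

    offset = pos - start
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]

    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "stop_gained"        # illustrative label for an "other" change
    if ref_aa == "*":
        return "stop_lost"          # illustrative label for an "other" change
    return "nonsynonymous"


print(classify_snp("ATGGCTTAA", 5, "C"))  # GCT -> GCC, both alanine
print(classify_snp("ATGGCTTAA", 3, "A"))  # GCT -> ACT, alanine to threonine
```

A real annotator also has to handle reverse-strand genes, overlapping features, and multi-base variants, which this sketch deliberately ignores.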
For a new user, the analysis proceeds in the following logical order:
- Start in MAPPING_SCRIPTS/: Use the run_main.sh script to align your raw sequencing data and generate VCF (Variant Call Format) files. See the README in that folder for prerequisites and detailed commands.
- Move to Variant_Effect_Predictor/: Use the vep.py or batch.py scripts to take the VCF files from the previous step and predict the functional impact of the detected variants.
- Use scripts/ for Analysis: Run the various R and Perl scripts in this folder to analyse the output from the first two steps. This is where you will generate summary statistics and plots.
- Check FIGURES/ for Results: The final, high-quality plots generated by the scripts will be saved in this directory.
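As an illustration of the kind of summary statistic produced in the scripts/ step, the sketch below computes a transition/transversion (Ti/Tv) ratio from VCF records. It is not one of the repository's scripts: the function names are hypothetical, it counts each single-base ALT allele separately, and it skips indels and symbolic alleles, all of which are assumptions made for this example.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
NUCLEOTIDES = PURINES | PYRIMIDINES


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    pair = {ref.upper(), alt.upper()}
    return len(pair) == 2 and (pair <= PURINES or pair <= PYRIMIDINES)


def titv_ratio(vcf_lines):
    """Return Ti/Tv over biallelic-style SNP alleles, or None if Tv is zero.

    vcf_lines: iterable of VCF text lines (header lines start with '#').
    """
    ti = tv = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        ref = fields[3].upper()            # column 4 of a VCF record is REF
        for alt in fields[4].upper().split(","):  # column 5 is ALT
            # Keep only single-base substitutions between A, C, G and T.
            if ref not in NUCLEOTIDES or alt not in NUCLEOTIDES or ref == alt:
                continue
            if is_transition(ref, alt):
                ti += 1
            else:
                tv += 1
    return ti / tv if tv else None


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t10\t.\tA\tG",      # transition
    "chr1\t20\t.\tC\tT",      # transition
    "chr1\t30\t.\tA\tC",      # transversion
    "chr1\t40\t.\tAT\tA",     # deletion, skipped
    "chr1\t50\t.\tG\tA,T",    # one transition, one transversion
]
print(titv_ratio(example))
```

Counting per ALT allele is only one convention; counting per site, or per genotype call, would give different values on multi-allelic data.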
The best place to begin is with the read mapping workflow.
- Navigate to the MAPPING_SCRIPTS directory.
- Read the MAPPING_SCRIPTS/README.md file for full installation and setup instructions.
- Once set up, a typical command to start an analysis looks like this:

      # Example: Map a sample using the 3-genome reference with Minimap2
      ./run_main.sh SRR10394925 3 minimap2
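When several samples need the same treatment, the command above can be generated programmatically. The following Python sketch only builds and prints the command lines as a dry run; it does not execute anything. The helper name build_mapping_commands is hypothetical, and the reading of the three arguments (sample accession, number of genomes in the reference, mapper) is inferred from the example comment rather than taken from MAPPING_SCRIPTS/README.md, which remains the authority.

```python
import shlex


def build_mapping_commands(accessions, n_genomes=3, mapper="minimap2",
                           script="./run_main.sh"):
    """Return one argv list per sample, mirroring the example invocation.

    The argument order (accession, genome count, mapper) is an assumption
    based on: ./run_main.sh SRR10394925 3 minimap2
    """
    return [[script, acc, str(n_genomes), mapper] for acc in accessions]


# Dry run: print the commands instead of executing them.
for argv in build_mapping_commands(["SRR10394925"]):
    print(shlex.join(argv))
```

Once the printed commands look right, each argv list could be passed to subprocess.run(argv, check=True) from inside the MAPPING_SCRIPTS directory.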
If you use this workflow or its components in your research, please cite this repository. Additionally, please cite the underlying tools used in each part of the pipeline (e.g., VG, Minimap2, FreeBayes), as detailed in the README files of the respective sub-directories.
Caroline Wright, Chandana Tennakoon, Lidia Dykes, Tim Downing. Using pangenome variation graphs to improve mutation detection in a large DNA virus. 2025. https://github.com/downingtim/LSDV_pangenomics/
This project is licensed under the MIT License.