LSDV Pangenome Variation Graph (PVG) Analysis Workflow


Overview

Welcome to the Lumpy Skin Disease Virus (LSDV) pangenomics repository. This project contains a bioinformatics workflow for read mapping, variant calling, and downstream analysis of LSDV sequencing data, using both standard linear references and pangenome variation graphs (PVGs).

This repository is organised into several key modules, each contained within its own directory.

Repository Structure

The main directories and their purposes are:

  • MAPPING_SCRIPTS/

    This is the starting point of the workflow. This directory contains the shell scripts needed to align raw sequencing reads (FASTQ files) and call genetic variants (SNPs and indels). The main script, run_main.sh, can execute workflows using Minimap2, VG Giraffe, and VG-MAP. For full installation, setup, and usage instructions, please refer to the detailed MAPPING_SCRIPTS/README.md.

  • scripts/

    This directory contains a collection of R, Perl, and Python scripts for downstream analysis, which are typically run after mapping and variant calling are complete. These scripts are used to:

    • Calculate and summarise quality metrics.
    • Analyse SNP characteristics (e.g., heterozygosity, Ti/Tv ratios).
    • Visualise results, such as SNP overlap and coverage.
    • Generate publication-ready figures.

    For a detailed description of what each script does and how to run it, please see scripts/README.md.

  • Variant_Effect_Predictor/

    This module contains Python tools to annotate the variants discovered in the mapping step. It analyses VCF files to predict the functional effects of SNPs on viral proteins, classifying them as synonymous, nonsynonymous, or other types of changes. For instructions, please see the Variant_Effect_Predictor/README.md.

  • FIGURES/

    This folder contains the final figures generated by the analysis pipeline. These include plots for diversity, phylogenies, SNP rates, and other key results from the study.
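
To illustrate one of the SNP metrics listed for scripts/, the Ti/Tv ratio is the number of transitions (A↔G, C↔T) divided by the number of transversions (all other single-base changes). The sketch below is not one of the repository's scripts. It is a minimal, hypothetical helper (`titv`) that assumes an uncompressed, tab-separated VCF with REF in column 4 and ALT in column 5, and it counts biallelic SNPs only.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository: compute Ti/Tv from a VCF.
# Assumes an uncompressed VCF; multi-allelic sites and indels are ignored.
titv() {
    awk -F'\t' '
        /^#/ { next }                                  # skip header lines
        $4 ~ /^[ACGTacgt]$/ && $5 ~ /^[ACGTacgt]$/ {   # single-base REF and ALT
            pair = toupper($4 $5)
            if (pair == "AG" || pair == "GA" || pair == "CT" || pair == "TC")
                ti++                                   # purine<->purine or pyrimidine<->pyrimidine
            else if (substr(pair, 1, 1) != substr(pair, 2, 1))
                tv++                                   # any other base change
        }
        END {
            if (tv > 0) printf "Ti=%d Tv=%d Ti/Tv=%.2f\n", ti, tv, ti / tv
            else        printf "Ti=%d Tv=%d Ti/Tv=NA\n", ti, tv
        }
    ' "$1"
}
```

Calling `titv sample.vcf` (where `sample.vcf` is a placeholder file name) prints the two counts and their ratio, or `NA` when no transversions are present.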

Workflow Overview

For a new user, the analysis proceeds in the following logical order:

  1. Start in MAPPING_SCRIPTS/: Use the run_main.sh script to align your raw sequencing data and generate VCF (Variant Call Format) files. See the README in that folder for prerequisites and detailed commands.
  2. Move to Variant_Effect_Predictor/: Use the vep.py or batch.py scripts to take the VCF files from the previous step and predict the functional impact of the detected variants.
  3. Use scripts/ for Analysis: Run the R, Perl, and Python scripts in this folder to analyse the output from the first two steps. This is where you will generate summary statistics and plots.
  4. Check FIGURES/ for Results: The final, high-quality plots generated by the scripts will be saved in this directory.
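
Because each step above refers to a directory by name, it can help to confirm that a checkout contains all four directories before starting. The following is a minimal sketch under that assumption. The function name `check_layout` is hypothetical and not part of the repository; only the directory names come from this README.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the repository: report which of the
# four directories named in this README are missing under a given root.
check_layout() {
    local root="${1:-.}" dir missing=0
    for dir in MAPPING_SCRIPTS Variant_Effect_Predictor scripts FIGURES; do
        if [ ! -d "${root}/${dir}" ]; then
            echo "missing: ${dir}"
            missing=1
        fi
    done
    return "${missing}"   # 0 when every directory is present, 1 otherwise
}
```

Running `check_layout .` from the repository root prints nothing and returns 0 when the layout matches the description above.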

Getting Started

The best place to begin is with the read mapping workflow.

  1. Navigate to the MAPPING_SCRIPTS directory.

  2. Read the MAPPING_SCRIPTS/README.md file for full installation and setup instructions.

  3. Once set up, a typical command to start an analysis looks like this:

    # Example: Map a sample using the 3-genome reference with Minimap2
    ./run_main.sh SRR10394925 3 minimap2
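
To map several samples with the same settings, the command above can be wrapped in a loop. The sketch below assumes only the argument order shown in the example (accession, reference size, mapper). The `map_samples` function and the `DRY_RUN` switch are hypothetical additions for illustration. By default it prints the commands rather than running them, so the list can be checked first.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of the repository.
# Only the argument order (accession, reference size, mapper) is taken
# from the example command in this README.
REF_GENOMES=3      # the 3-genome reference used in the example
MAPPER=minimap2    # the mapper name used in the example

# Print one run_main.sh command per accession; set DRY_RUN=0 to execute them.
map_samples() {
    local accession
    for accession in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "./run_main.sh ${accession} ${REF_GENOMES} ${MAPPER}"
        else
            ./run_main.sh "${accession}" "${REF_GENOMES}" "${MAPPER}"
        fi
    done
}
```

For example, `map_samples SRR10394925` prints the single command shown above. Pass further accessions as extra arguments, and run with `DRY_RUN=0` from inside MAPPING_SCRIPTS/ once the printed commands look right.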

Citation

If you use this workflow or its components in your research, please cite this repository. Additionally, please cite the underlying tools used in each part of the pipeline (e.g., VG, Minimap2, FreeBayes), as detailed in the README files of the respective sub-directories.

Caroline Wright, Chandana Tennakoon, Lidia Dykes, Tim Downing. Using pangenome variation graphs to improve mutation detection in a large DNA virus. 2025. https://github.com/downingtim/LSDV_pangenomics/

License

This project is licensed under the MIT License.
