Author: Tien Ly
Updated: March 2025
This repository includes scripts and analyses for investigating local adaptations in American pikas as part of my MS Bioinformatics thesis project at San Jose State University. The project is structured into three main components: variant calling, population genomics analyses, and gene ontology (GO) enrichment analysis.
In this section, I develop a variant calling pipeline. Key steps include:
- Quality Control:
  - fastqc.sh: Assess the quality of the raw sequencing data
  - multiqc.sh: Summarize the quality reports from multiple fastqc.sh runs
- Data Trimming:
  - fastp.sh: Trim low-quality bases and adapter sequences from the reads
  - trimmed-fastqc.sh: Re-assess read quality after trimming
  - trimmed-multiqc.sh: Summarize the quality reports of the trimmed reads
- Read Mapping:
  - bwa-map.sh: Align the reads to the reference genome
- Post-Processing:
  - bam-trans.sh: Convert SAM files to BAM format
  - query-sort.sh: Sort the BAM files by query name
  - read-groups.sh: Add read group information
  - mark-dup.sh: Mark duplicate reads
  - coord-sort.sh: Sort the BAM files by coordinate
- Haplotype Calling:
  - haplotype.sh: Perform haplotype-based variant calling on the coordinate-sorted BAM files to identify variant sites
- Genotyping and Variant Filtering:
  - genotype.sh: Aggregate the per-sample variant data and genotype all samples
  - filter-vcf.sh: Apply quality-based filters to retain high-confidence variants
These scripts can be found in the wgs-pika folder.
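The steps above can be sketched as a single POSIX shell script. This is a minimal illustration, not the contents of the wgs-pika scripts: it assumes GATK4 and its gVCF workflow (HaplotypeCaller, CombineGVCFs, GenotypeGVCFs), paired-end files named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz, placeholder read-group values (lib1, ILLUMINA, unit1), and a subset of GATK's generic SNP hard-filter thresholds rather than whatever filter-vcf.sh applies. The script name call_variants.sh and all output file names are hypothetical.

```shell
#!/bin/sh
# Sketch of the variant calling steps, for paired-end reads named
# <sample>_R1.fastq.gz / <sample>_R2.fastq.gz (hypothetical naming convention).
# Assumes fastqc, multiqc, fastp, bwa, samtools and GATK4 (gatk) are on PATH and
# the reference is already indexed (.fai, .dict, BWA index).
# Usage (hypothetical script name): sh call_variants.sh raw/*_R1.fastq.gz
set -eu

REF=${REF:-GCA_014633375.1_OchPri4.0_genomic.fna}
THREADS=${THREADS:-8}

# Derive the sample name from the R1 file name.
sample_from_r1() {
    basename "$1" _R1.fastq.gz
}

process_sample() {
    r1=$1
    r2=${r1%_R1.fastq.gz}_R2.fastq.gz
    s=$(sample_from_r1 "$r1")
    mkdir -p qc trimmed bam gvcf

    # Quality control, trimming, and QC of the trimmed reads.
    fastqc -t "$THREADS" -o qc "$r1" "$r2"
    fastp -i "$r1" -I "$r2" -o "trimmed/${s}_R1.fastq.gz" -O "trimmed/${s}_R2.fastq.gz" \
        --detect_adapter_for_pe -w "$THREADS" -h "qc/${s}.fastp.html" -j "qc/${s}.fastp.json"
    fastqc -t "$THREADS" -o qc "trimmed/${s}_R1.fastq.gz" "trimmed/${s}_R2.fastq.gz"

    # Read mapping, SAM -> BAM conversion, and sorting by query name.
    bwa mem -t "$THREADS" "$REF" "trimmed/${s}_R1.fastq.gz" "trimmed/${s}_R2.fastq.gz" > "bam/${s}.sam"
    samtools view -b -o "bam/${s}.bam" "bam/${s}.sam"
    gatk SortSam -I "bam/${s}.bam" -O "bam/${s}.qsort.bam" -SO queryname

    # Read groups (placeholder LB/PL/PU values), duplicate marking, coordinate sorting.
    gatk AddOrReplaceReadGroups -I "bam/${s}.qsort.bam" -O "bam/${s}.rg.bam" \
        --RGID "$s" --RGSM "$s" --RGLB lib1 --RGPL ILLUMINA --RGPU unit1
    gatk MarkDuplicates -I "bam/${s}.rg.bam" -O "bam/${s}.md.bam" -M "qc/${s}.dup_metrics.txt"
    samtools sort -@ "$THREADS" -o "bam/${s}.sorted.bam" "bam/${s}.md.bam"
    samtools index "bam/${s}.sorted.bam"

    # Per-sample haplotype-based calling in gVCF mode.
    gatk HaplotypeCaller -R "$REF" -I "bam/${s}.sorted.bam" -O "gvcf/${s}.g.vcf.gz" -ERC GVCF
}

joint_genotype_and_filter() {
    # Pass every per-sample gVCF to CombineGVCFs as a -V argument.
    set --
    for g in gvcf/*.g.vcf.gz; do set -- "$@" -V "$g"; done
    gatk CombineGVCFs -R "$REF" "$@" -O cohort.g.vcf.gz
    gatk GenotypeGVCFs -R "$REF" -V cohort.g.vcf.gz -O cohort.vcf.gz

    # Keep biallelic SNPs, flag them with a subset of GATK's generic hard filters,
    # then drop the flagged sites.
    gatk SelectVariants -V cohort.vcf.gz --select-type-to-include SNP \
        --restrict-alleles-to BIALLELIC -O snps.vcf.gz
    gatk VariantFiltration -V snps.vcf.gz -O snps.flagged.vcf.gz \
        --filter-expression "QD < 2.0" --filter-name QD2 \
        --filter-expression "FS > 60.0" --filter-name FS60 \
        --filter-expression "MQ < 40.0" --filter-name MQ40 \
        --filter-expression "SOR > 3.0" --filter-name SOR3
    gatk SelectVariants -V snps.flagged.vcf.gz --exclude-filtered -O snps.filtered.vcf.gz
}

main() {
    [ "$#" -gt 0 ] || return 0    # nothing to do without input files
    for r1 in "$@"; do process_sample "$r1"; done
    multiqc qc -o qc/multiqc
    joint_genotype_and_filter
}

main "$@"
```

Keeping the per-sample work separate from the cohort-level joint genotyping means a failed sample can be rerun without repeating the joint steps. Query sorting here uses gatk SortSam so that MarkDuplicates receives Picard's own queryname order.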
- Create the .dict and .fai files for the reference genome. The .fai index is generated with:
  samtools faidx GCA_014633375.1_OchPri4.0_genomic.fna
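A sketch of preparing all reference-side files a BWA/GATK pipeline needs, assuming samtools and BWA are installed. samtools dict is used here for the .dict file (gatk CreateSequenceDictionary is an equivalent alternative), and bwa index builds the alignment index. The helper dict_path_for is hypothetical; it only reproduces the naming GATK expects, where the FASTA extension is replaced by .dict.

```shell
#!/bin/sh
# Sketch: build the .fai, .dict and BWA index for the reference genome.
set -eu

REF=${REF:-GCA_014633375.1_OchPri4.0_genomic.fna}

# GATK looks for the dictionary next to the FASTA, with the FASTA extension
# replaced by .dict (e.g. genome.fna -> genome.dict); otherwise append .dict.
dict_path_for() {
    case $1 in
        *.fasta) printf '%s.dict\n' "${1%.fasta}" ;;
        *.fna)   printf '%s.dict\n' "${1%.fna}" ;;
        *.fa)    printf '%s.dict\n' "${1%.fa}" ;;
        *)       printf '%s.dict\n' "$1" ;;
    esac
}

prepare_reference() {
    samtools faidx "$1"                              # writes <ref>.fai
    samtools dict -o "$(dict_path_for "$1")" "$1"    # sequence dictionary for GATK
    bwa index "$1"                                   # index used by bwa mem
}

if [ -f "$REF" ]; then
    prepare_reference "$REF"
fi
```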
In addition to the individual scripts, a Nextflow pipeline is available to streamline the entire variant calling process. You can find it in the wgs-pika-nextflow folder.
Once the variant calling is complete, I perform downstream statistical analyses on the VCF files, including:
- Outlier Detection: Identify candidate loci under selection using pcadapt and BayeScan
- Genotype-Environment Associations: Identify environmental variables that may drive selection and local adaptation using redundancy analysis (RDA) and BayPass
The scripts can be found in the selection-analysis folder.
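BayPass does not read VCF files directly; its allele-count input has one row per SNP and two columns per population (counts of the two alleles). The awk sketch below builds that file from a biallelic VCF and a sample-to-population map. The function name vcf_to_baypass and the two-column POPMAP layout are hypothetical, the REF allele is always counted first, and pcadapt and BayeScan use their own input formats, which this does not cover.

```shell
#!/bin/sh
# Sketch: convert a biallelic SNP VCF into a BayPass allele-count file.
# Usage: vcf_to_baypass POPMAP VCF > genofile
#   POPMAP: one "sample population" pair per line (hypothetical layout)
# Output: one line per SNP, two columns per population (REF count, ALT count),
# with populations in the order they first appear in POPMAP.
set -eu

vcf_to_baypass() {
    awk '
        # First file: sample -> population map, remembering population order.
        NR == FNR {
            pop[$1] = $2
            if (!($2 in seen)) { seen[$2] = 1; order[++npop] = $2 }
            next
        }
        /^##/ { next }
        # Header line: map each genotype column to its population.
        /^#CHROM/ { for (i = 10; i <= NF; i++) col_pop[i] = pop[$i]; next }
        # Skip multiallelic sites (comma in ALT).
        $5 ~ /,/ { next }
        {
            for (p = 1; p <= npop; p++) { ref[order[p]] = 0; alt[order[p]] = 0 }
            for (i = 10; i <= NF; i++) {
                if (col_pop[i] == "") continue          # sample not in POPMAP
                split($i, f, ":"); g = f[1]
                gsub(/\|/, "/", g)                       # treat phased like unphased
                n = split(g, a, "/")
                for (k = 1; k <= n; k++) {
                    if (a[k] == "0") ref[col_pop[i]]++
                    else if (a[k] == "1") alt[col_pop[i]]++  # missing "." is ignored
                }
            }
            line = ""
            for (p = 1; p <= npop; p++) {
                sep = (p > 1) ? " " : ""
                line = line sep ref[order[p]] " " alt[order[p]]
            }
            print line
        }
    ' "$1" "$2"
}

if [ "$#" -eq 2 ]; then
    vcf_to_baypass "$1" "$2"
fi
```

For a bgzipped VCF, decompress it first (for example, gunzip -c input.vcf.gz > input.vcf). Missing genotypes simply lower the counts for that population, which BayPass accommodates because it works with per-population allele counts rather than per-individual genotypes.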
In this section, I investigate the potential functions of genes associated with local adaptation using gene ontology (GO) enrichment analysis. Scripts for this analysis can be found in the go-analysis folder.
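Before GO enrichment, candidate SNPs typically have to be linked to genes. One simple approach, sketched below under stated assumptions, lists GFF3 gene features lying within a fixed window of any candidate SNP. The window size, the tab-separated SNP file layout and the function name genes_near_snps are hypothetical, and this is not necessarily how the go-analysis scripts assign SNPs to genes.

```shell
#!/bin/sh
# Sketch: list GFF3 gene IDs lying within WINDOW bp of any candidate SNP.
# Usage: genes_near_snps SNPS GFF3 WINDOW
#   SNPS: tab-separated "contig<TAB>position" lines (hypothetical layout)
set -eu

genes_near_snps() {
    awk -F '\t' -v w="$3" '
        # First file: store SNP positions per contig.
        NR == FNR { n[$1]++; pos[$1, n[$1]] = $2 + 0; next }
        # Keep only gene features on contigs that carry at least one SNP.
        /^#/ || $3 != "gene" || !($1 in n) { next }
        {
            id = ""
            nattr = split($9, attr, ";")
            for (j = 1; j <= nattr; j++) if (attr[j] ~ /^ID=/) id = substr(attr[j], 4)
            if (id == "" || (id in done)) next
            # Print the gene once if any SNP falls in [start - w, end + w].
            for (k = 1; k <= n[$1]; k++) {
                p = pos[$1, k]
                if (p >= $4 - w && p <= $5 + w) { done[id] = 1; print id; break }
            }
        }
    ' "$1" "$2"
}

if [ "$#" -eq 3 ]; then
    genes_near_snps "$1" "$2" "$3"
fi
```

The nested loop compares every gene with every SNP on its contig, which is fine for a short list of outliers; a larger window captures more candidate genes at the cost of more spurious associations in the enrichment test.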