
Principal Component Analysis

ChaochihL edited this page Sep 28, 2018 · 5 revisions

This method performs a principal component analysis (PCA) using ANGSD, with ngsPopGen handling the PCA calculation. See the ngsPopGen documentation for full details on this method.

Basic Usage

To run this method, use the following command:

angsd-wrapper PCA Principal_Component_Analysis_Config

where Principal_Component_Analysis_Config is the full path to the configuration file for the PCA.

Input files

All inputs should be specified in Principal_Component_Analysis_Config.

Common Variables

This method uses variables from Common_Config; the ones it reads are listed below:

Variable Function
SAMPLE_LIST (named GROUP_SAMPLES on the dev branch) A list of samples to be used in calculations
PROJECT Name given to all outputs in ANGSD-wrapper
SCRATCH Directory where output files are stored; the full path is SCRATCH/PROJECT/PCA
REGIONS Limit the scope of ANGSD-wrapper to certain regions
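As the SCRATCH row notes, the PCA outputs land in SCRATCH/PROJECT/PCA. A minimal sketch of how that directory follows from the config, assuming Common_Config consists of simple shell-style NAME=value assignments. The parser, function names, and example values are hypothetical; it is not part of ANGSD-wrapper and ignores shell features such as variable expansion or command substitution.

```python
import posixpath
import shlex


def parse_config(text):
    """Parse simple NAME=value lines (hypothetical, simplified format).

    Blank lines, lines starting with '#', and lines without '=' are skipped.
    shlex strips quotes and trailing '# comments' from each value.
    """
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        config[name.strip()] = " ".join(shlex.split(value, comments=True))
    return config


def pca_output_dir(config):
    """Return SCRATCH/PROJECT/PCA, the directory the PCA outputs go to."""
    return posixpath.join(config["SCRATCH"], config["PROJECT"], "PCA")


example = """
# Hypothetical excerpt of a Common_Config
PROJECT=my_project
SCRATCH="/scratch/user"   # made-up path
REGIONS=""
"""
cfg = parse_config(example)
print(pca_output_dir(cfg))  # prints /scratch/user/my_project/PCA
```

posixpath is used instead of os.path so the result uses forward slashes regardless of where the sketch runs, matching the cluster-style paths in the config.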

Method-Specific Variables

This method has no method-specific variables.

Method Parameters

The parameters for this method can be adjusted as needed; the defaults are chosen to work well for general use:

Parameter Function
DO_MAF Calculate per-site frequencies
DO_MAJORMINOR Estimate major/minor alleles
DO_GENO Call genotypes and set up the output
DO_POST Calculate the posterior probability using per-site frequencies
N_CORES Number of cores to use; do not set this higher than the number of cores available on your system
CALL Call genotypes from the maximum probability
GT_LIKELIHOOD Estimate genotype likelihoods
N_SITES Set the maximum number of sites to use
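The N_CORES row warns against exceeding the limits of your system. A minimal sketch of such a check, assuming N_CORES is read as a plain integer; the function name is hypothetical and this is not something ANGSD-wrapper itself does.

```python
import os


def checked_n_cores(requested, available=None):
    """Return N_CORES, capped at the number of cores the machine reports.

    `requested` may be an int or a string such as "8" read from a config
    file; `available` defaults to os.cpu_count().
    """
    if available is None:
        available = os.cpu_count() or 1  # os.cpu_count() can return None
    requested = int(requested)
    if requested < 1:
        raise ValueError(f"N_CORES must be at least 1, got {requested}")
    if requested > available:
        print(f"warning: N_CORES={requested} exceeds the {available} "
              f"available cores; using {available}")
        return available
    return requested


print(checked_n_cores("8", available=4))  # prints the warning, then 4
```

On a shared cluster, os.cpu_count() reports every core on the node rather than the cores allocated to your job; on Linux, len(os.sched_getaffinity(0)) is usually the better bound.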

Output files

Naming Scheme Contents
PROJECT_PCA.arg Details of arguments
PROJECT_PCA.covar Results of the principal component analysis
PROJECT_PCA.geno Genotype calls
PROJECT_PCA.mafs.gz Per-site frequencies
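PROJECT_PCA.covar can also be inspected without the graphical interface. A minimal sketch, assuming the file holds a square, whitespace-delimited covariance matrix with one row and column per sample (presumably in the order of the sample list): the principal components are the eigenvectors of that matrix, and each eigenvalue's share of the total is the percentage of variance that component explains. The code uses Jacobi rotations in pure Python so it needs no third-party packages; the function names and the toy matrix are hypothetical, and for real data a library routine such as numpy.linalg.eigh is the faster choice.

```python
import math
import os
import tempfile


def read_covar(path):
    """Read a whitespace-delimited square matrix (assumed .covar layout)."""
    with open(path) as handle:
        rows = [[float(x) for x in line.split()] for line in handle if line.strip()]
    if any(len(row) != len(rows) for row in rows):
        raise ValueError(f"{path}: expected a square matrix")
    return rows


def jacobi_eigen(matrix, tol=1e-12, max_sweeps=100):
    """Eigen-decompose a symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, eigenvectors); eigenvector i is column i of the
    second result. The order is arbitrary.
    """
    n = len(matrix)
    a = [list(row) for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Choose the rotation that makes a[p][q] zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def principal_components(covar, k=2):
    """Return the top-k (eigenvalue, percent of total, eigenvector) triples."""
    values, vectors = jacobi_eigen(covar)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    total = sum(values)  # sum of eigenvalues = trace of the matrix
    return [
        (values[i], 100.0 * values[i] / total, [row[i] for row in vectors])
        for i in order[:k]
    ]


if __name__ == "__main__":
    # Made-up 3-sample matrix standing in for PROJECT_PCA.covar.
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "toy_PCA.covar")
        with open(path, "w") as handle:
            handle.write("3 1 0\n1 3 0\n0 0 1\n")
        pcs = principal_components(read_covar(path))
    for number, (value, pct, vec) in enumerate(pcs, start=1):
        print(f"PC{number}: eigenvalue {value:.3f} ({pct:.1f}%), "
              f"sample coordinates {[round(x, 3) for x in vec]}")
```

Eigenvector signs are arbitrary, so an axis may come out mirrored relative to the Shiny plot or another tool; this does not change how the samples cluster.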

Visualization

PROJECT_PCA.covar can be visualized with the Shiny graphing interface. A web browser with a graphical user interface is required.