Skip to content

Commit

Permalink
Added test options and documentation; cleaned up code
Browse files Browse the repository at this point in the history
  • Loading branch information
corbinq committed Aug 28, 2019
1 parent e27c83b commit 1644b80
Show file tree
Hide file tree
Showing 10 changed files with 243 additions and 215 deletions.
2 changes: 1 addition & 1 deletion Makefile
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
VERSION=0.3

CPPFLAGS= -Ofast -flto -pipe -I$(cdir)/src/htslib -I$(cdir)/src/cdflib -I$(cdir)/src/tabixpp -L$(cdir)/src/htslib -L$(cdir)/src/cdflib -L$(cdir)/src/tabixpp
CXXFLAGS= -std=c++11 -DNDEBUG
CXXFLAGS= -std=c++11
FFLAGS=
LDFLAGS= -lz -lm

Expand Down
13 changes: 6 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,9 +2,8 @@

A C++ tool for Gene-based Analysis with oMniBus, Integrative Tests

- Implements SKAT, burden, and ACAT gene-based test methods using variant- or region-based functional annotations
- Calculates annotation-stratified gene-based tests (e.g., TWAS/PrediXcan tests using eSNPs, gene-based tests using only coding variants, and gene-based tests using enhancer-to-target-gene maps)
- Calculates omnibus gene-based tests by aggregating across annotation classes
- Implements several gene-based test forms (quadratic: weighted sum of Zsq, linear: weighted sum of Z, and maximum Zsq) to aggregate GWAS single-variant summary statistics cross-referenced with variant- or region-based functional annotations
- Calculates annotation-stratified gene-based tests (e.g., TWAS/PrediXcan tests using eSNPs, gene-based tests using only coding variants, and gene-based tests using enhancer-to-target-gene maps), and omnibus tests by combining p-values for each gene
- Inputs: GWAS association summary statistics file (chromosome, position, ref/alt allele, and z-score or beta-hat + se), annotation files, and LD reference panel


Expand Down Expand Up @@ -38,7 +37,7 @@ chr1 769200 769400 Enhancer chr1:769200:769400 C1orf170:3.36|PERM1:3.36

- Association tests for individual regulatory elements is reported in `*.stratified_out.txt` files, and gene-based p-values (aggregating across regulatory elements for each gene) in `*.summary_out.txt` files.

- **Aggregation Methods for Regulatory Elements.** By default, GAMBIT aggregates test statistics across variants in regulatory elements using a weighted sum of single-variant chi-squared statistics (SKAT gene-based test). To instead use weighted ACAT to combine single-variant p-values, specify `--acat`.
- **Aggregation Methods for Regulatory Elements.** By default, GAMBIT aggregates test statistics across variants in regulatory elements using a weighted sum of single-variant chi-squared statistics (SKAT gene-based test). To instead use weighted ACAT or HMP to combine single-variant p-values, specify `--no-skat` and a p-value combination method via `--pcomb`.

#### Gene-Based Analysis with Coding and Other Annotated Variants

Expand All @@ -57,7 +56,7 @@ UTR UTR5 Utr5

- **Gene-Based Test Output.** Test statistics stratified by gene and annotation subclass are provided in `*.stratified_out.txt` files, and gene-based p-values (aggregating across annotation classes for each gene) in `*.summary_out.txt` files.

- **Variant Aggregation Methods.** By default, GAMBIT aggregates test statistics across variants using a weighted sum of single-variant chi-squared statistics (SKAT gene-based test). To instead use weighted ACAT to combine single-variant p-values, specify `--acat`.
- **Variant Aggregation Methods.** By default, GAMBIT aggregates test statistics across variants using a weighted sum of single-variant chi-squared statistics (SKAT gene-based test). To instead use weighted ACAT or HMP to combine single-variant p-values, specify `--no-skat` and a p-value combination method via `--pcomb`.

#### TWAS Analysis
- To compute TWAS/PrediXcan gene-based tests using GAMBIT, specify an eWeight file via `--eweights my_eWeights.txt.gz`, formatted
Expand All @@ -73,11 +72,11 @@ UTR UTR5 Utr5

- The `BETAS` field format is `eGene_A=Weight_A1@Tissue_A1;Weight_A2@Tissue_A2|eGene_B=Weight_B1@Tissue_B1`, and labels for tissue IDs can be specified in the header.
- **Subsetting tissues.** To restrict analysis to a subset of tissues/cell-types, specify a comma-separated list of tissues following the `--tissues` flag. By default, GAMBIT includes all tissues/cell-types present in the eWeight file.
- **Tissue Aggregation for Omnibus tests.** GAMBIT reports both single-tissue TWAS/PrediXcan analysis results, and omnibus tests results aggregating across all specified tissues/cell-types for each eGene. Omnibus p-values for multi-tissue TWAS/PrediXcan analysis can be calculated in GAMBIT using either 1) the maximum single-tissue test statistic based on the joint distribution of single-tissue statistics, 2) the sum of squared single-tissue z-scores (analogous to SKAT), or 3) ACAT [default]. Omnibus test method for multi-tissue analysis can be specified via `--tissue-aggreg` (`ACAT`, `MinP`, `SKAT`, or `ALL`).
- **Tissue Aggregation for Omnibus tests.** GAMBIT reports both single-tissue TWAS/PrediXcan analysis results, and omnibus tests results aggregating across all specified tissues/cell-types for each eGene. Omnibus p-values for multi-tissue TWAS/PrediXcan analysis can be calculated in GAMBIT using either 1) the maximum single-tissue test statistic based on the joint distribution of single-tissue statistics, 2) the sum of squared single-tissue z-scores (analogous to SKAT), or 3) PCOMB for ACAT or HMP [default]. Omnibus test method for multi-tissue analysis can be specified via `--tissue-aggreg` (`PCOMB`, `MinP`, `SKAT`, or `ALL`). P-value combination method can be specified via `--pcomb` (`ACAT` or `HMP`).
- **Single-tissue and omnibus test output.** Gene-based tests and p-values for each eGene-tissue pair are reported in `*.stratified_out.txt` files, and omnibus p-values (aggregating across all tissues for each eGene) in `*.summary_out.txt` files.

#### dTSS-Weighted Gene-Based Tests
- To incorporate un-annotated regulatory variants in gene-based analysis, GAMBIT implements a dTSS (distance to Transcription Start Site) weighted gene-based test, which aggregates all single-variant p-values within a specified window from each gene's TSS using ACAT and assigns higher weight to variants nearer the TSS using an exponential decay function.
- To incorporate un-annotated regulatory variants in gene-based analysis, GAMBIT implements a dTSS (distance to Transcription Start Site) weighted gene-based test, which aggregates all single-variant p-values within a specified window from each gene's TSS using weighted ACAT or HMP and assigns higher weight to variants nearer the TSS using an exponential decay function.
- To compute dTSS-weighted gene-based tests, specify a TSS bed file via `--tss-bed my_tss_bed.bed.gz`, fomatted

```
Expand Down
Binary file modified bin/GAMBIT
Binary file not shown.
22 changes: 15 additions & 7 deletions src/Main.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -20,9 +20,10 @@ void print_usage() {
cerr << " --ldref-only : only retain variants with complete LD information\n";
cerr << " --tissues STR : only use eSNPs from tissues listed in file\n";
cerr << " test statistics\n";
cerr << " --pcomb : p-value combination method (\"ACAT\" or \"HMP\") \n";
cerr << " --tss-alpha : alpha values (comma-separated) for dTSS weights \n";
cerr << " --acat : use ACAT rather than SKAT for regulatory elements & exons\n";
cerr << " --tissue-aggreg [ACAT] : method to aggregate eSNP tests across tissues (\"ACAT\", \"MinP\", \"SKAT\", or \"All\") \n";
cerr << " --no-skat : use ACAT or HMP rather than SKAT for regulatory elements & exons\n";
cerr << " --tissue-aggreg: method to aggregate eSNP tests across tissues (\"PCOMB\" for ACAT/HMP (default), \"MaxZsq\", \"SKAT\", or \"All\") \n";
cerr << " --tissues STR : only use eSNPs from listed tissues (comma-separated list or file)\n";
cerr << " output\n";
cerr << " --region STR : restrict analysis to specified region \n";
Expand Down Expand Up @@ -74,7 +75,9 @@ int main (int argc, char *argv[]) {

char default_test = 'Q'; // default is 'Q' (SKAT)

int cauchy_no_skat = 0;
int pcomb_no_skat = 0;

string PCOMB_METHOD = "HMP";

string TSS_VERBOSITY_PVAL_STR = "";
double TSS_VERBOSITY_PVAL = 1.00;
Expand All @@ -93,6 +96,7 @@ int main (int argc, char *argv[]) {
{"no-memo", no_argument, &no_memo_LD, 1},
{"preload", no_argument, &preload_LD, 1},
{"ldref", required_argument, 0, 'l'},
{"pcomb", required_argument, 0, 'z'},
{"gwas", required_argument, 0, 'g'},
{"anno-defs", required_argument, 0, 'a'},
{"defs", required_argument, 0, 'a'},
Expand All @@ -107,7 +111,7 @@ int main (int argc, char *argv[]) {
{"merge-tissues", no_argument, &tmerge, 1},
{"tissue-aggreg", required_argument, 0, 'm'},
{"debug", no_argument, &debug_mode, 1},
{"acat", no_argument, &cauchy_no_skat, 1},
{"no-skat", no_argument, &pcomb_no_skat, 1},
{"stdout", no_argument, &print_screen, 1},
{"bayes", no_argument, &bayes, 1},
{"region", required_argument, 0, 'r' },
Expand All @@ -118,7 +122,7 @@ int main (int argc, char *argv[]) {
{0, 0, 0, 0}
};
int long_index =0;
while ((opt = getopt_long(argc,argv,"l:g:a:f:s:x:y:e:j:t:b:m:r:p:v:",long_options,&long_index)) != -1) {
while ((opt = getopt_long(argc,argv,"l:g:a:f:s:x:y:z:e:j:t:b:m:r:p:v:",long_options,&long_index)) != -1) {
switch (opt) {
case 'f' : afile = optarg;
break;
Expand All @@ -142,6 +146,8 @@ int main (int argc, char *argv[]) {
break;
case 'y' : TSS_WINDOW_STR = optarg;
break;
case 'z' : PCOMB_METHOD = optarg;
break;
case 'e' : TSS_VERBOSITY_PVAL_STR = optarg;
break;
case 'b' : bfile = optarg;
Expand Down Expand Up @@ -226,6 +232,8 @@ int main (int argc, char *argv[]) {
setPreload(false);
}

setCombPvalMethod(PCOMB_METHOD);

setMultiForm(multi_test_type);

if( JUMP_DIST_STR != "" ){
Expand All @@ -244,8 +252,8 @@ int main (int argc, char *argv[]) {

initCHR();

if( cauchy_no_skat ){
default_test = 'C'; // rather than SKAT, using Cauchy/ACAT test (no LD)
if( pcomb_no_skat ){
default_test = 'C'; // rather than SKAT, using ACAT/HMP test (no LD)
}

if( TSS_VERBOSITY_PVAL_STR != "" ){
Expand Down
32 changes: 32 additions & 0 deletions src/distributionFunctions.cpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
#include "distributionFunctions.hpp"

using namespace std;

double pchisq( double stat, double df) {
double cdf, ccdf;
cumchi ( &stat, &df, &cdf, &ccdf );
return ccdf;
}

double pchisq( double stat, double df, double ncp) {
double cdf, ccdf;
cumchn( &stat, &df, &ncp, &cdf, &ccdf);
return ccdf;
}

double qchisq( double cdf, double df) {
double stat, ccdf, bd;
int which = 2;
ccdf = 1 - cdf;
int status;
cdfchi ( &which, &cdf, &ccdf, &stat, &df, &status, &bd );
return stat;
}

double pcauchy(double x){
return 0.5 + atan( x )/M_PI;
}

double qcauchy(double q){
return tan( M_PI*(q - 0.5) );
}
31 changes: 31 additions & 0 deletions src/distributionFunctions.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
#ifndef DISTRIBUTIONFUNCTIONS_HPP
#define DISTRIBUTIONFUNCTIONS_HPP

#include "cdflib/cdflib.hpp"
#include "eigenmvn/eigenmvn.hpp"
#include "ROOT_Math/Landau.hpp"

#include <iostream>
#include <sstream>
#include <fstream>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <string>
#include <algorithm>
#include <vector>
#include <unordered_map>
#include <unordered_set>
#include <set>
#include <map>

using namespace std;

double pcauchy(double);
double qcauchy(double);

double pchisq(double, double);
double pchisq(double, double, double);
double qchisq(double, double);

#endif
Loading

0 comments on commit 1644b80

Please sign in to comment.