forked from NBISweden/workshop_omicsint_ISMBECCB
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathlab_meta-analayses-v2.Rmd
822 lines (613 loc) · 30.5 KB
/
lab_meta-analayses-v2.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
---
title: "Meta analyses of omics data"
subtitle: "Workshop on Omics Integration"
author: "`r paste0('<b>Ashfaq Ali</b> • ',format(Sys.time(), '%d-%b-%Y'))`"
output:
bookdown::html_document2:
toc: true
toc_float: true
toc_depth: 4
number_sections: true
theme: flatly
highlight: tango
df_print: default
code_folding: "none"
self_contained: false
keep_md: false
encoding: 'UTF-8'
css: "assets/lab.css"
---
```{r,child="assets/header-lab.Rmd"}
```
```{r, eval=FALSE, echo=FALSE}
#save(QCresult, QCresult2, file = "results/QC.rda")
load(file = "./results/QC.rda")
```
## Installation and setup
Before we begin reading this tutorial and running the code, let us set
up our R environment for this session. We will be using "renv" package
in R to install all the necessary packages needed for the sessions.
> *Note: Due to the use of the development versions of R packages that
> are not available through conda repositories we are not able to
> generate a conda environment for this session and an alternative
> solution needs to be used.*
```{r}
setwd("~/Documents/GitHub/workshop_omics_integration/session_meta/")
getwd()
```
```{r, eval=FALSE}
## make sure that the renv.lock file for the session exists and do following.
install.packages("renv")
renv::restore(lockfile = "./Meta_tutorial/tuturial_meta.lock", prompt = TRUE)
```
> If XML lib fails in ubuntu and you get the error "Error installing
> package 'XML':" go to terminal and install the libraries. "sudo
> apt-get update" followed by "sudo apt-get install libxml2-dev" Now,
> try the renv::init() again.
>> If you are trying to install within you conda environmen, you may need to install some dependencies
>> `install.packages(c("curl","remotes"))` and `remotes::install_github("joshuaulrich/TTR")`
>> In bash of your active conda environment run `conda install -c conda-forge r-gert`
> There maybe issues with installtion even after these depending on you architecture,
> in that case you can locate the results for each of the sections in "results.tar.gz"
> file which you can expand by `tar -xvzf results.tar.gz` and explore the results.
# Introduction
Meta analyses are commonly used in clinical studies to assess the effect
of a treatment or a genetic locus on a phenotype. With the advent of
GWAS (Genome wide association studies), various studies report summary
statistics on associated effects of genetic loci on phenotype. Meta
analyses can be useful in determining whether the effect size is
consistent across the body of data
The goal of a synthesis is to understand the results of any study in the
context of all the other studies. First, we need to know whether or not
the effect size is consistent across the body of data. If it is
consistent, then we want to estimate the effect size as accurately as
possible and to report that it is robust across the kinds of studies
included in the synthesis. On the other hand, if it varies substantially
from study to study, we want to quantify the extent of the variance and
consider the implications. Meta-analysis is able to address these issues
whereas the narrative review is not. We start with an example to show
how meta-analysis and narrative review would approach the same question,
and then use this example to highlight the key differences between the
two.
# Meta analyses Methods
1. **p-value**
<!-- -->
a. [**Fisher**](https://link.springer.com/chapter/10.1007%2F978-1-4612-4380-9_6):
Sum of minus log-transformed *P*-values where larger Fisher score
reflects stronger aggregated differential expression evidence.
b. [**Stouffer**](https://cutt.ly/rc53t31): Sum of inverse normal
transformed *P*-values where larger Stouffer score to reflect
stronger aggregated statistical evidence.
c. [**adaptively weighted
Fisher(AW)**](https://doi.org/10.1214/10-AOAS393)**, [original
publication](https://ui.adsabs.harvard.edu/abs/2011arXiv1108.3180L/abstract)**
: assigns different weights to each individual study and it
searches through all possible weights to find the best adaptive
weight with the smallest derived *p*-value. One significant
advantage of this method is its ability to indicate which studies
contribute to the evidence aggregation and elucidates heterogeneity
in the meta-analysis.
d. [**minimum p-value
(minP)**](https://psycnet.apa.org/record/1951-06623-001)**:** The
minP method takes the minimum *p*-value among the *K* studies as the
test statistic
e. [**maximum p-value
(maxP)**](https://psycnet.apa.org/record/1951-06623-001)**:** The
maxP method takes maximum *p*-value as the test statistic
f. [**rth ordered p-value
(rOP)**](https://pubmed.ncbi.nlm.nih.gov/25383132/)**:** The rOP
method takes the *r-*th order statistic among sorted *p*-values
of *K*combined studies
> Note: The assumption of uniformly distributed *P*-values under the
> null hypothesis or can be done non-parametrically by permutation-based
> analysis
<!-- -->
2. **Effect Size based**
<!-- -->
a. [**fixed effects model
(FEM)**](https://pubmed.ncbi.nlm.nih.gov/12855442/):FEM combines the
effect size across *K* studies by assuming a simple linear model
with an underlying true effect size plus a random error in each
study
b. [**random effects model
(REM)**](https://pubmed.ncbi.nlm.nih.gov/12855442/): REM extends FEM
by allowing random effects for the inter-study heterogeneity in the
model.
<!-- -->
3. **Rank based**
<!-- -->
a. [**rank product
(rankProd)**](https://pubmed.ncbi.nlm.nih.gov/16982708/)RankProd and
RankSum are based on the common biological belief that if a gene is
repeatedly at the top of the lists ordered by up- or down-regulation
fold change in replicate experiments, the gene is more likely a DE
gene.
b. [**naive sum of ranks and naive product of
ranks**](https://pubmed.ncbi.nlm.nih.gov/16982708/)**:** These two
methods apply a naïve product or sum of the DE evidence ranks across
studies.
## Statistical considerations
In addition to statistical methods mentioned above, a number of factors
need to be considered for the choice of method. Different test
statistics maybe used depending on the type of outcome variable (e.g.
t-statistic or moderated t-statistic for binary outcome, F-statistic for
multi-class outcome, regression or correlation coefficient for
continuous outcome and log-rank statistic for survival outcome).
Here we will be using
[MetaDE](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3463115/) package
that has implemented above mentioned methods.
A summary of the methods and their implementations for respective
outcome variables are listed in the table below as presented in the
original publication.

For a detailed review of the applied methods for meta analyses, related
benchmarks and
## Required packages
```{r packages, warning=FALSE, message=FALSE, eval=TRUE}
library(magrittr)
library(plyr)
library(preproc)
library(MetaQC)
library(MetaDE)
library(MetaPath)
library(MetaDCN)
```
# Summary of the session
I this session we will be working with the "prostate8.rda" data set that
is located in the data directory of the "session_meta" directory of the
github repository for the course.
Some of the R packages packages we are using here are not available
through conda repositories and therefore need to be install from the
github repositories for the respective packages
The session comprises of following main steps
1. Explore the data and the packages
2. Filter the data based on QC
3. Perform differential expression analyses and meta analyses
4. Perfrom pathway analyses for meta analyses results
5. Perfrom pathway meta analyses
6. Network meta analyses (optional)
7. Exercise: compare different meta analyses methods including
AW-Fisher, REM, FEM \#\# Perform QC
# Data for the session
We will using prostate dataset
Load the protaste data set
```{r data, warning=FALSE}
load(file = "./data/prostate8.rda")
```
The prostate data is comprised of 8 microarray studies from different
microarray platforms. Here it is provided in a list format where data
matrices and corresponding labels are provided.
Let us take a quick look at the data.
```{r data-names, warning=FALSE, message=FALSE}
names(prostate8)
```
We can have a look at the structure of each data sets and evaluate what
sort of filtering steps we need to take.
```{r data-dims, warning=FALSE, message=FALSE}
library(magrittr)
lapply(prostate8$data, dim) %>% as.data.frame(row.names = c("genes", "samples"))
```
As we can see, the studies have different number of genes measured and
contain different number of samples. In a meta analyses, studies can
have different samples sizes but genes should match between different
studies. To be able to merge data sets, all data matrices should have
the same annotations
Let us look at the row names which are already set to gene names.
```{r data-view, warning=FALSE, message=FALSE}
lapply(prostate8$data, row.names) %>% lapply(.,head, 5) %>% as.data.frame()
```
It is clear that the gene names are set to gene symbols for all data
sets and we can merge theses data set.
The second element of the lists corresponds to data labels where each
value corresponds to disease status of the individual where the sample
came from.
```{r data-label, warning=FALSE}
prostate8$dataLabel %>% lapply(. ,table) %>% as.data.frame() %>% t
```
Data labels indicate "0" as control and "1" as disease groups. We can
set these values to character format for convenience.
If everything looks good, we are good to go to the next step.
> Take a moment to explore the data in your own ways and fimilarise
> yourself wit it.
------------------------------------------------------------------------
# Data preprocessing
## Gene matching
Usually different microarray platforms use their own probe IDs or
experiments from different omics platforms can have annotations for
transcripts of proteins. To perform meta-analysis, one needs to match
probe/transcript IDs from different platforms to the unique official
gene ID, such as ENTREZ ID or gene symbol.
Options for situations for microarray data include
- take the average value of expression values across multiple probe
IDs to represent the corresponded gene symbol
- select the probe ID with the largest interquartile range (IQR) of
expression
- Or some version of summary at the gene level depending on the omics
platform.
We do not cover the pre-processing steps for different omics
technologies at the moment but the workshop participants are encouraged
to apply the domain specific knowledge when setting up a meta analyses
study.
```{r data-overlap, warning=FALSE, message=TRUE}
MetaQC::metaOverlap(prostate8$data) %>%
lapply(dim) %>% as.data.frame(row.names = c("genes", "samples"))
```
As you can see here, only 4241 genes are present in all studies and if
we are to filter out this way, we lose a lot of data.
> Biologically, it is likely that most genes are either un-expressed or
> un-informative. In gene expression analysis to find DE genes, these
> genes contribute to the false discoveries, so it is desirable to
> filter out these genes prior to analysis. After genes are matched
> across studies, the unique gene symbols are available across all
> studies. Two sequential steps of gene filtering can be performed. In
> the first step, we filter out genes with very low gene expression that
> are identified with small average expression values across majority of
> studies. And then we can remove genes that are not variable in your data sets
> using variance estimates as they are not useful in comparisons.
You can take a look at [preproc]() package for some of the functions
available for filtering the data and the intuition behind the methods.
Here `Annotate(), Impute(), Filter() and Merge()` maybe useful for
pre-processing steps of the data analyses.
```{r preprocess, warning=FALSE, message=FALSE}
data2 <- prostate8$data
data2 <- preproc::Merge(data2)
data.type = rep("microarray", length(data2)) # a character vector for each study type
data2_filt <- preproc::Filter(data2, del.perc = c(0.1, 0.1),data.type = data.type ) # Here we specify the percentage of genes to be filtere, and specify the type of data for each study
summary_preproc <- rbind(as.data.frame(lapply(prostate8$data, dim))[1,],
as.data.frame(lapply(data2, dim))[1,],
as.data.frame(lapply(data2_filt, dim))
)
rownames(summary_preproc) <- c("orginal", "merged", "mergeFiltered", "samples")
summary_preproc
```
Note that the multiple gene expression data sets may not be very well
aligned by genes, and the number of genes in each study maybe different.
When we combine a large number of studies, the number of common genes
may be very small, so we need to allow for genes appearing in most
studies and missing in few studies etc.
## Load pathway database
Let us perform quality control of the data for this meta analyses. We
are using the
["MetaQC"](https://academic.oup.com/nar/article/40/2/e15/2408973) that
identifies ways to objectively perform quality control for the
microarray studies.
```{r qc-1a, eval=FALSE, message=FALSE, warning=FALSE}
load(file = "./data/pathways.rda")
DList=prostate8$data
colLabel=prostate8$dataLabel
#GList=pathway[[1]]
#GList=pathwayDatabase
filterGenes=TRUE
cutRatioByMean=0.3 #
cutRatioByVar=0.3
#tic() not run
QCresult=MetaQC(DList, colLabel, GList=c(Hallmark.genesets, KEGG.genesets, Immunologic.genesets),filterGenes,cutRatioByMean,cutRatioByVar, ) ## This will take some time depending on the number of studies and the type of data
#toc()
```
> If you are unable to run the above steps, please load the "QC.rda" from
> the results directory of the session.
```{r, echo=FALSE}
load("results/QC_step1.rda")
```
```{r qc-1b}
QCresult$scoreTable
```
`r colnames(QCresult$scoreTable)`
> Internal quality control index:small IQC indicated that the study had
> heterogeneous coexpression structure with other studies and was
> considered a candidate problematic study that should be excluded from
> meta-analysis
>
> the external quality control (EQC): small *EQC* indicated that the
> study had low association with pathway in terms of gene pairwise
> correlation structure and maybe considered a candidate problematic
> study.
>
> accuracy quality control (*AQC*) and a consistency quality control
> (*CQC*).
>
> Large *AQCg* measure for a given study indicate that DE genes produced
> by study were reproducible compared to DE genes detected by
> meta-analysis excluding study
>
> Having a large CQCg measure for a given study indicated that DE
> evidence produced by study was consistent with DE evidence generated
> by meta-analysis excluding study. Please read the original [MetQC
> publication](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3898528/#B14)
> for a better understanding of the measures.
We can now look at the two dimensional biplot based on PCA analyses and
get an idea of any studies that may not be of great quality.
```{r qc-plot}
MetaQC::plotMetaQC(QCresult$scoreTable)
```
Although the `plotMetaQC()` function is a useful tool for visualization,
it does not necessarily allow automatic inclusion/exclusion criteria.
Note that our visualization and summarization tools were not meant for
an automated recommendation for inclusion/exclusion decision. In the
examples we explored, there were roughly three categories in the QC
results: definite exclusion cases with poor quality, definite inclusion
cases with good quality and borderline cases.
### Update based on the QC
Here Nanni and Tomlins are the two studies that maybe of low quality and
should perhaps be excluded from the analyses.
```{r remove-studies, message=FALSE, warning=FALSE}
filterGenes=TRUE
cutRatioByMean=0.3 #
cutRatioByVar=0.3
to_remove <- c("Nanni", "Tomlins")
prostate6 <- list(data = within(prostate8$data, rm(Nanni, Tomlins)), dataLabel=within(prostate8$dataLabel, rm(Nanni, Tomlins)))
prostate_fil <- list(data = MetaQC::metaOverlap(prostate6$data), dataLabel=prostate6$data)
prostate6$data <-MetaQC::metaFilterData(prostate_fil$data,cutRatioByVar = cutRatioByVar, cutRatioByMean = cutRatioByMean)
lapply(prostate6, names)
as.data.frame(lapply(prostate6$data, dim))
```
```{r qc-2, eval=FALSE, warning=FALSE, message=FALSE}
DList=prostate6$data
colLabel=prostate6$dataLabel
#GList=pathway[[1]]
#GList=pathwayDatabase
filterGenes=TRUE
cutRatioByMean=0.3 #
cutRatioByVar=0.3
QCresult2=MetaQC(DList, colLabel, GList=c(Hallmark.genesets, KEGG.genesets, Immunologic.genesets), filterGenes, cutRatioByMean, cutRatioByVar)
```
```{r, include=FALSE, eval=FALSE}
save(QCresult, file = "results/QC_step1.rda")
save(QCresult2, file = "results/QC_step2.rda")
```
> AGAIN, if you are unable to run the code, please load
```{r}
load("results/QC_step2.rda")
```
```{r qc-2b}
QCresult2$scoreTable
```
```{r qc-plot-2}
plotMetaQC(scoreTable = QCresult2$scoreTable)
```
# Meta Differential Expression analyses
We have listed the methods for differential expression meta analyses in
the introduction section that are implemented in the MetaDE package.
Here we will try the best performing methods including AW Fischer, REM
and rOP. A detailed comparison of relative performances of each of the
statistical methods is descibed [Lun-Ching
etal](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3898528/#B14).
```{r de-meta-analyses, eval=TRUE, message=FALSE}
data <- prostate6$data # Extract the expression matrices
clin.data <- prostate6$dataLabel # extract lables for each sample
K <- length(data)
clin.data <- lapply(clin.data, function(x) {data.frame(x)} )
for (k in 1:length(clin.data)){
colnames(clin.data[[k]]) <- "label"
clin.data[[k]] <- (ifelse(clin.data[[k]]==0, "Control", "Cancer"))
}
#clin.data <- lapply(clin.data, function(x) {data.frame(x)} )
select.group <- c("Control", "Cancer")
ref.level <- "Control"
data.type <- "continuous"
ind.method <- rep('limma',length(data))
resp.type <- "twoclass"
paired <- rep(FALSE,length(data))
meta.method <- "Fisher"
meta.res <- MetaDE(data=data,clin.data = clin.data,
data.type=data.type,resp.type = resp.type,
response='label',
ind.method=ind.method, meta.method=meta.method,
select.group = select.group, ref.level=ref.level,
paired=paired,tail='abs',parametric=TRUE)
meta.res.summary <- MetaDE::summary.meta(meta.res, resp.type = resp.type, meta.method = meta.method)
head(meta.res.summary)
# save(meta.res, meta.res.summary, file = "./results/resMetaDE.rda")
```
> If you were unable to run the MetaDE analyses, you can load the results of the above
> commands to review the output by usingh `load("./results/resMetaDE.rda")`
## Plot meta analyses results in a heatmap.
```{r plot-de-heatmap, eval=FALSE}
fdr.cut <- 1e-7
pdf('./results/heatmap.DE.pdf')
heatmap.sig.genes(meta.res, meta.method=meta.method,
fdr.cut=fdr.cut,color="GR")
dev.off()
```
{width=70%, height=150%}
## Pathway analyses based on meta analyses
At this stage, one can do pathway analyses directly on the genes that
appear to be differentially regulated based on the meta analyses
performed above.
```{r pathways-de, eval=FALSE}
meta.p <- meta.res$meta.analysis$pval
ks.result <- PathAnalysis(meta.p = meta.p, enrichment = "KS")
fisher.result <- PathAnalysis(meta.p = meta.p, enrichment = "Fisher's exact")
## One can customize pathway datbase to update the results like following
load(file = "./data/pathwayDatabase.rda")
## Let us look into the database
pathway_names <- pathwayDatabase %>% names() %>% stringr::str_split(pattern = "_", simplify = TRUE) %>% {.[,1]} %>% unique()
print(pathway_names)
path.res <- MetaDE::PathAnalysis(meta.p ,pathway = pathwayDatabase, p.cut = 0.05,
enrichment = "Fisher's exact", DEgene.number = 400, size.min = 10, size.max = 500)
```
```{r, include=FALSE}
#save(fisher.result, path.res, file = "./results/MetaDEPathways.rda")
```
> The pathway analyses performed here is based on the p-values obtained
> after the joint meta analyses of all the studies in your data. Some
> time individual study level pathway analyses is reported and we are
> interested in combining pathway level summary statistics. We will
> explore that in our next session where we will perform the analyses
> using **"MetaPath"** package.
## Exercise
- Perform DE meta analyses using any two methods among "AW-Fisher",
"FEM, "REM and "minMCC" and compare the number of differentially
expressed genes detected at FDR \<0.05. Take a look at the
`r ?MetaDE::MetaDE` function to change the parameters.
- Perform pathway analyses based one of the methods you tried and see
whether any differences at pathway level were detected.
```{r change-de-method, exercise=TRUE}
meta.method <- "AW"
meta.res <- MetaDE(data=data,clin.data = clin.data,
data.type=data.type,resp.type = resp.type,
response='label',covariate = NULL,
ind.method=ind.method, meta.method=meta.method,
select.group = select.group, ref.level=ref.level,
paired=paired, rth=NULL,
REM.type=NULL,tail='abs',parametric=TRUE)
```
```{r change-de-method-hint-1}
meta.method <- "FEM"
meta.res <- MetaDE(data=data,clin.data = clin.data,
data.type=data.type,resp.type = resp.type,
response='label',
ind.method=ind.method, meta.method=meta.method,
select.group = select.group, ref.level=ref.level,
paired=paired, tail='abs')
```
```{r change-de-method-hint-2}
meta.method <- "REM"
REM.type <- "HO"
meta.res <- MetaDE(data=data,clin.data = clin.data,
data.type=data.type,resp.type = resp.type,
response='label',
ind.method=ind.method, meta.method=meta.method,
select.group = select.group, ref.level=ref.level,
paired=paired,
REM.type=REM.type,tail='abs')
```
```{r change-de-method-hint-3}
meta.method <- 'minMCC'
meta.res <- MetaDE(data=data,clin.data = clin.data,
data.type=data.type,resp.type = resp.type,
response='label',
ind.method=ind.method, meta.method=meta.method,
select.group = select.group, ref.level=ref.level,
paired=paired,tail='abs',parametric=FALSE,nperm=100)
```
# Pathway analyses
## Introduction
Pathway analysis (a.k.a. gene set analysis) is a statistical tool to
infer correlation of differential expression evidence in the data with
pathway knowledge from established databases. The idea behind pathway
analysis is to determine if there is enrichment in the detected DE genes
based on an *a priori* defined biological category. Such a category
might come from one or multiple databases such as Gene Ontology
(GO; [www.geneontology.org](http://www.geneontology.org/)), the Kyoto
Encyclopedia of Genes and Genomes (KEGG; <http://www.genome.jp/kegg/>),
Biocarta Pathways (<http://www.biocarta.com/>) and the comprehensive
Molecular Signatures Database
(MSigDB; <http://www.broadinstitute.org/gsea/msigdb/>). For the majority
of recent microarray meta-analysis applications, pathway analysis has
been a standard follow-up to identify pathways associated with detected
DE genes [e.g.](https://pubmed.ncbi.nlm.nih.gov/17974971/)
and many others]. The result provides more insightful biological
interpretation and it has been reported that pathway analysis results
are usually more consistent and reproducible across studies than DE gene
detection. [Shen and Tseng](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2865865/)
developed a systematic framework of Meta-Analysis for Pathway Enrichment
(MAPE) by combining information at gene level, at pathway level and a
hybrid of the two. We will use this package for systematic analyses of microarray data from different studies.
## Step1: MAPE_P analysis
This is the first major function in the MetaPath2.0 package which combines the Meta-analysis for Pathway Enrichment (MAPE) methods introduced by Shen and Tseng (2010) and the Comparative Pathway Integrator (CPI) method introduced by [Fang and Tseng (2016)](https://www.biorxiv.org/content/10.1101/444604v1.full.pdf). The default function is CPI which performs MAPE_P (integrating multiple studies at pathway level) with Adaptively Weighted Fisher's method as Meta-analysis statistics.

```{r pathway-meta-step-1, eval=FALSE}
#load("./data/pathways.rda")
CPI_result = MAPE2.0(arraydata = data , clinical.data = clin.data,
label = "label",pmtx = NULL,pathway = c(Biocarta.genesets, GOBP.genesets,
GOCC.genesets, GOMF.genesets, KEGG.genesets, Reactome.genesets), data.type ="discrete",
resp.type = "twoclass",method = "CPI", ind.method = rep("limma",length(data)),
paired =rep(FALSE,length(data)),select.group=select.group, ref.level=ref.level ,
tail="abs", enrichment = "Fisher's exact", DEgene.number = 400,stat = "AW Fisher")
save(CPI_result, file = "./results/MetaPEResults/MAPE_p_pathways.rds")
```
## Step 2

```{r pathway-meta-step-2, eval=FALSE}
set.seed(15213)
CPI.kappa_result = MAPE.Kappa(summary = CPI_result$summary,pathway = CPI_result$pathway,
max_k = 10, q_cutoff = 0.05,software = CPI_result$method, output_dir = "./results/MetaPEResults/")
```

## Step 3

```{r pathway-meta-step-3, eval=FALSE}
data(hashtb)
#data("./data/hashtb.rda")
#xx <- data(hashtb)
set.seed(1)
MAPE.Clustering.results <- MAPE.Clustering(summary=CPI_result$summary,Num_Clusters = 6,
kappa.result = CPI.kappa_result$kappa,sil_cut=0.01,
Num_of_gene_lists=CPI_result$Num_of_gene_lists,genelist =CPI_result$genelist,
pathway=CPI_result$pathway, enrichment=CPI_result$enrichment,
method=CPI.kappa_result$method,software=CPI_result$method, n.text.permute = 1000, output_dir = "./results/MetaPEResults/" )
```
You can take a look at the "Clustering_Summary.csv" to gen an idea of reproducibility of pathways and the amount of evidence provided by each study.
Of the heat
{width=70%, height=120%}
------------------------------------------------------------------------
# Network Meta analyses using MetaDCN
Co-expression analysis and network analysis of microarray data are used
to investigate potential transcriptional co-regulation and gene
interactions. Network analyses typically work with the gene--gene
co-expression matrix, which represents the correlation between each pair
of genes in the study. A crucial assumption is that the magnitude of the
co-expression between any pair of genes is associated with a greater
likelihood that the two genes interact. Thus, networks of interactions
between genes are inferred from the co-expression matrix.
Here we will use [MetaDCN](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6041767/) to extract network modules in disease vs control conditions.

## Correction and adjacency matrices
```{r network-meta-1, eval=FALSE}
#data("./data/pathwayDatabase.rda")
#data(example)
# Generate correction and adjacency matrices for data and permutation
# GeneNet returns a list of information which will be used for SearchBM and MetaDCN function, and several RData files stored in folder path.
# AdjacencyMatrices.RData is a list of adjacency matrices for case and control in each study in the order of case studies and control studies.
#
# CorrelationMatrices.RData is a list of correlation matrices for case and control in each study.
#
# AdjacencyMatricesPermutationP.RData is a list of correlation matrices for case and control in each study in permutation P.
GeneNetRes.2 <- GeneNet(data , clin.data, caseName="Cancer", controlName="Control", meanFilter=0.8, SDFilter=0.8, edgeCutoff=0.1, permutationTimes=4, CPUNumbers=4, pathwayDatabase=c(Biocarta.genesets,GOBP.genesets,GOCC.genesets,GOMF.genesets,
KEGG.genesets,Reactome.genesets), silent=FALSE, folder = "./results/MetaDCNResults/")
```
## Correlations/Co-expressions
```{r network-meta-2, eval=FALSE}
# This function will search for basic modules differentially co-expressed between case and control
# SearchBM will return a list and several Rdata, csv and png files saved in the folder path specified in GeneNet inputs.List of basic module information:
# w1
# w1 weight with the most basic modules detected
#
# BMInCase
# data matrix listing the information of basic modules higher correlated in case
#
# BMInControl
# data matrix listing the information of basic modules higher correlated in control
#
# permutation_energy_direction_p.Rdata is a list of energies of basic modules from permutation p.
#
# basic_modules_summary_direction_weight_w.csv is a summary of basic modules detected using weight w in forward/backward search.
#
# threshold_direction.csv is a table listing number of basic modules detected under different FDRs in forward/backward search.
#
# figure_basic_module_c_repeat_r_direction_weight_w.png is a plot of basic module from component c repeat r using weight w in forward/backward search.
SearchBMRes <- SearchBM(GeneNetRes.2, MCSteps=500, jaccardCutoff=0.7, repeatTimes=5, outputFigure=TRUE, silent=FALSE )
```

## Find and assemble basic modules
```{r network-meta-3, eval=FALSE}
#This function will assemble basic modules detected from SearchBM into supermodules.
# w1:w1 used
#
# BMInCaseSig: Summary of basic modules higher correlated in Case controling FDR
#
# BMInControlSig
# Summary of basic modules higher correlated in Control controling FDR
#
# Supermodule: Summary of supermodules
#A number chosen from (100, 200, ..., 700) to specify the weight1 used in objective function (optional). If not specified, w1 from SearchBM function will be used (recommended).
MetaDNCRes <- MetaDCN(GeneNetRes.2, SearchBMRes, FDRCutoff=0.05, w1=NULL, silent=FALSE)
```
You get the modules, their pathway annotations and cytoscape files in the results directory.
For the final assembled modules have a look at the "module_assembly_summary_weight_500.csv" and find out ho many "supe modules you are able to identity on q value cut off of 0.05"
# Session info
```{r,echo=FALSE}
sessionInfo()
```
__End of document__