|
| 1 | +--- |
| 2 | +title: "basics analysis using MicrobiotaProcess" |
| 3 | +author: | |
| 4 | + | Shuangbin Xu |
| 5 | + | School of Basic Medical Sciences, Southern Medical University |
| 6 | +date: "`r Sys.Date()`" |
| 7 | +bibliography: MicrobiotaProcess.bib |
| 8 | +biblio-style: apalike |
| 9 | +output: |
| 10 | + prettydoc::html_pretty: |
| 11 | + toc: true |
| 12 | + theme: cayman |
| 13 | + highlight: vignette |
| 14 | + pdf_document: |
| 15 | + toc: true |
| 16 | +vignette: > |
| 17 | + %\VignetteIndexEntry{ MicrobiotaProcess: basics analysis using MicrobiotaProcess.} |
| 18 | + %\VignetteEngine{knitr::rmarkdown} |
| 19 | + %\usepackage[utf8]{inputenc} |
| 20 | + %\VignetteEncoding{UTF-8} |
| 21 | +--- |
| 22 | + |
| 23 | +```{r, echo=FALSE, results="asis", message=FALSE, KnitrSetUp} |
| 24 | +knitr::opts_chunk$set(tidy=FALSE,warning=FALSE,message=FALSE) |
| 25 | +Biocpkg <- function (pkg){ |
| 26 | + sprintf("[%s](http://bioconductor.org/packages/%s)", pkg, pkg) |
| 27 | +} |
| 28 | +
|
| 29 | +CRANpkg <- function(pkg){ |
| 30 | + cran <- "https://CRAN.R-project.org/package" |
| 31 | + fmt <- "[%s](%s=%s)" |
| 32 | + sprintf(fmt, pkg, cran, pkg) |
| 33 | +} |
| 34 | +``` |
| 35 | + |
| 36 | +```{r, echo=FALSE, results="hide", message=FALSE, Loadpackages} |
| 37 | +library(ggplot2) |
| 38 | +library(DT) |
| 39 | +library(tidyverse) |
| 40 | +library(phyloseq) |
| 41 | +library(ggtree) |
| 42 | +library(treeio) |
| 43 | +library(tidytree) |
| 44 | +library(MicrobiotaProcess) |
| 45 | +``` |
| 46 | + |
| 47 | +# 1. Introduction |
| 48 | + |
| 49 | +`MicrobiotaProcess` is an R package for analysis, visualization and biomarker discovery of microbial datasets. It supports the import of microbiome census data, calculating alpha index and provides functions to visualize rarefaction curves. Moreover, it also supports visualizing the abundance of taxonomy of samples. And It also provides functions to perform the `PCA`, `PCoA` and hierarchical cluster analysis. In addition, `MicrobiotaProcess` also provides a method for the biomarker discovery of metagenome or other datasets. |
| 50 | + |
| 51 | +# 2. `MicrobiotaProcess` profiling |
| 52 | + |
| 53 | +## 2.1 import function |
| 54 | + |
| 55 | +`MicrobiotaProcess` has an feature to import phylogenetic sequencing data from `r Biocpkg("dada2")`[@callahan2016dada2] and [qiime2](https://qiime2.org/)[@bolyen2019qiime2] taxonomic clustering pipelines. The resulting object after import is `r Biocpkg("phyloseq")` object[@Paul2013phyloseq] |
| 56 | + |
| 57 | +```{r, error=TRUE, importFunction} |
| 58 | +# import data from dada2 pipeline. |
| 59 | +seqtabfile <- system.file("extdata", "seqtab.nochim.rds", package="MicrobiotaProcess") |
| 60 | +seqtab <- readRDS(seqtabfile) |
| 61 | +taxafile <- system.file("extdata", "taxa_tab.rds",package="MicrobiotaProcess") |
| 62 | +seqtab <- readRDS(seqtabfile) |
| 63 | +taxa <- readRDS(taxafile) |
| 64 | +# the seqtab and taxa are output of dada2 |
| 65 | +sampleda <- system.file("extdata", "mouse.time.dada2.txt", package="MicrobiotaProcess") |
| 66 | +ps_dada2 <- import_dada2(seqtab=seqtab, taxatab=taxa, sampleda=sampleda) |
| 67 | +ps_dada2 |
| 68 | +
|
| 69 | +# import data from qiime2 pipeline |
| 70 | +otuqzafile <- system.file("extdata", "table.qza", package="MicrobiotaProcess") |
| 71 | +taxaqzafile <- system.file("extdata", "taxa.qza", package="MicrobiotaProcess") |
| 72 | +mapfile <- system.file("extdata", "metadata_qza.txt", package="MicrobiotaProcess") |
| 73 | +ps_qiime2 <- import_qiime2(otuqza=otuqzafile, taxaqza=taxaqzafile, mapfilename=mapfile) |
| 74 | +ps_qiime2 |
| 75 | +``` |
| 76 | + |
| 77 | +## 2.2 Rarefaction visualization |
| 78 | + |
| 79 | +Rarefaction, based on sampling technique, was used to compensate for the effect of sample size on the number of units observed in a sample[@siegel2004rarefaction]. `MicrobiotaProcess` provided `ggrarecurve` to plot the curves, based on `rrarefy` of `r CRANpkg("vegan")`[@Jari2019vegan]. |
| 80 | + |
| 81 | +```{r, error=TRUE, fig.align="center", fig.height=3.2, fig.width=7.5, rarefaction} |
| 82 | +# for reproducibly random number |
| 83 | +set.seed(1024) |
| 84 | +p_rare <- ggrarecurve(obj=ps_dada2, |
| 85 | + indexNames=c("Observe","Chao1","ACE"), |
| 86 | + chunks=300) + |
| 87 | + theme(legend.spacing.y=unit(0.02,"cm"), |
| 88 | + legend.text=element_text(size=6)) |
| 89 | +p_rare |
| 90 | +
|
| 91 | +``` |
| 92 | + |
| 93 | +## 2.3 calculate alpha index and visualization |
| 94 | + |
| 95 | +`MicrobiotaProcess` provides `get_alphaindex` to calculate alpha index and the `ggbox` to visualize the result |
| 96 | + |
| 97 | +```{r, error=TRUE, fig.width=7.5, fig.height=3.2, fig.align="center", alphaindex} |
| 98 | +alphaobj <- get_alphaindex(ps_dada2) |
| 99 | +head(as.data.frame(alphaobj)) |
| 100 | +p_alpha <- ggbox(alphaobj, geom="violin", factorNames="time") + |
| 101 | + scale_fill_manual(values=c("#2874C5", "#EABF00"))+ |
| 102 | + theme(strip.background = element_rect(colour=NA, fill="grey")) |
| 103 | +p_alpha |
| 104 | +``` |
| 105 | + |
| 106 | +## 2.4 The visualization of taxonomy abundance |
| 107 | + |
| 108 | +`MicrobiotaProcess` presents the `ggbartax` for the visualization of composition of microbial communities. |
| 109 | + |
| 110 | +```{r, error=TRUE, fig.align="center", fig.height=4.5, fig.width=7, taxabar} |
| 111 | +# relative abundance |
| 112 | +otubar <- ggbartax(obj=ps_dada2) + |
| 113 | + xlab(NULL) + |
| 114 | + ylab("relative abundance (%)") |
| 115 | +otubar |
| 116 | +``` |
| 117 | + |
| 118 | +If you want to get the abundance of specific levels of class, You can use `get_taxadf` then use `ggbartax` to visualize. |
| 119 | + |
| 120 | +```{r, error=TRUE, fig.align="center", fig.height=4.5, fig.width=7, phylumAbundance} |
| 121 | +phytax <- get_taxadf(obj=ps_dada2, taxlevel=2) |
| 122 | +phybar <- ggbartax(obj=phytax) + |
| 123 | + xlab(NULL) + ylab("relative abundance (%)") |
| 124 | +phybar |
| 125 | +``` |
| 126 | + |
| 127 | + Moreover, the abundance (count) of taxonomy also can be visualized by setting count to TRUE, and the facet of plot can be showed by setting the facetNames. |
| 128 | + |
| 129 | +```{r, error=TRUE, fig.align="center", fig.height=4.5, fig.width=7, classAbundance} |
| 130 | +phybar2 <- ggbartax(obj=phytax, facetNames="time", count=TRUE) + xlab(NULL) + ylab("abundance") |
| 131 | +phybar2 |
| 132 | +classtax <- get_taxadf(obj=ps_dada2, taxlevel=3) |
| 133 | +classbar <- ggbartax(obj=classtax, facetNames="time", count=FALSE) + |
| 134 | + xlab(NULL) + ylab("relative abundance (%)") |
| 135 | +classbar |
| 136 | +``` |
| 137 | + |
| 138 | +## 2.5 PCA and PCoA analysis |
| 139 | + |
| 140 | +`PCA` (Principal component analysis) and `PCoA` (Principal Coordinate Analysis) are general statistical procedures to compare groups of samples. And `PCoA` can based on the phylogenetic or count-based distance metrics, such as `Bray-Curtis`, `Jaccard`, `Unweighted-UniFrac` and `weighted-UniFrac`. `MicrobiotaProcess` presents the `get_pca`, `get_pcoa` and `ggordpoint` for the analysis. |
| 141 | + |
| 142 | +```{r, error=TRUE, fig.align="center", fig.height=4.5, fig.width=6, ordanalysis} |
| 143 | +pcares <- get_pca(obj=ps_dada2, method="hellinger") |
| 144 | +# Visulizing the result |
| 145 | +pcaplot <- ggordpoint(obj=pcares, biplot=TRUE, speciesannot=TRUE, |
| 146 | + factorNames=c("time"), ellipse=TRUE) + |
| 147 | + scale_color_manual(values=c("#2874C5", "#EABF00")) |
| 148 | +pcaplot |
| 149 | +
|
| 150 | +pcoares <- get_pcoa(obj=ps_dada2, distmethod="euclidean", method="hellinger") |
| 151 | +# Visualizing the result |
| 152 | +pcoaplot <- ggordpoint(obj=pcoares, biplot=TRUE, speciesannot=TRUE, |
| 153 | + factorNames=c("time"), ellipse=TRUE) + |
| 154 | + scale_color_manual(values=c("#2874C5", "#EABF00")) |
| 155 | +pcoaplot |
| 156 | +``` |
| 157 | + |
| 158 | +## 2.6 Hierarchical cluster analysis |
| 159 | + |
| 160 | +`beta diversity` metrics can assess the differences between microbial communities. It can be visualized with `PCA` or `PCoA`, this can also be visualized with hierarchical clustering. `MicrobiotaProcess` also implements the analysis based on ggtree[@yu2017ggtree]. |
| 161 | + |
| 162 | +```{r, fig.align="center", fig.height=5, fig.width=6, error=TRUE, hclustAnalysis} |
| 163 | +hcsample <- get_clust(obj=ps_dada2, distmethod="euclidean", |
| 164 | + method="hellinger", hclustmethod="average") |
| 165 | +# rectangular layout |
| 166 | +clustplot1 <- ggclust(obj=hcsample, |
| 167 | + layout = "rectangular", |
| 168 | + pointsize=1, |
| 169 | + fontsize=0, |
| 170 | + factorNames=c("time")) + |
| 171 | + scale_color_manual(values=c("#2874C5", "#EABF00")) + |
| 172 | + theme_tree2(legend.position="right", |
| 173 | + plot.title = element_text(face="bold", lineheight=25,hjust=0.5)) |
| 174 | +clustplot1 |
| 175 | +# circular layout |
| 176 | +clustplot2 <- ggclust(obj=hcsample, |
| 177 | + layout = "circular", |
| 178 | + pointsize=1, |
| 179 | + fontsize=2, |
| 180 | + factorNames=c("time")) + |
| 181 | + scale_color_manual(values=c("#2874C5", "#EABF00")) + |
| 182 | + theme(legend.position="right") |
| 183 | +clustplot2 |
| 184 | +``` |
| 185 | + |
| 186 | +# 3. Need helps? |
| 187 | + |
| 188 | +If you have questions/issues, please visit [github issue tracker](https://github.com/YuLab-SMU/MicrobiotaProcess/issues). |
| 189 | + |
| 190 | +# 4. Session information |
| 191 | + |
| 192 | +Here is the output of sessionInfo() on the system on which this document was compiled: |
| 193 | + |
| 194 | +```{r, echo=FALSE} |
| 195 | +sessionInfo() |
| 196 | +``` |
| 197 | + |
| 198 | +# 5. References |
0 commit comments