Skip to content

Commit

Permalink
Merge pull request #19 from sartorlab/v0.1.2-beta
Browse files Browse the repository at this point in the history
* Update VERSIONS
* Create project specific /tmp folder
* Improved makefile and config.mk documentation
* Fix --mfold parameter formatting for macs2
* Autofill for --gsize parameter when genome is hg19, hg38, mm9, or mm10
* Remove PBS script creation
* Add fold change plots to PePr and macs2
* Add multiqc output (split for bisulfite and pulldown)
* makefile reorganization and clarification
* Add group labels for comparisons to improve interpretability
  • Loading branch information
Raymond Cavalcante committed Jun 4, 2016
2 parents 54e44e9 + a41ec17 commit 9d0ee0b
Show file tree
Hide file tree
Showing 16 changed files with 953 additions and 438 deletions.
83 changes: 68 additions & 15 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -61,15 +61,36 @@ The `mint` pipeline is dependent on several software packages to carry out its a
* [`bedops` v2.4.14](https://github.com/bedops/bedops/releases/tag/v2.4.14)
* [`bismark` v0.16.1](https://github.com/FelixKrueger/Bismark/releases/tag/0.16.1)
* [`bowtie2` v2.2.4](https://github.com/BenLangmead/bowtie2/releases/tag/v2.2.4)
* [`cutadapt` v1.9.1](https://pypi.python.org/pypi/cutadapt/1.9.1)
* [`FastQC` v0.11.5](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.5_source.zip)
* [`macs2` v2.1.0.20140616](https://pypi.python.org/pypi/MACS2/2.1.0.20140616)
* [`PePr` v1.1.5](https://github.com/shawnzhangyx/PePr/releases/tag/1.1.5)
* [`multiqc` v0.6.0](https://github.com/ewels/MultiQC/releases/tag/v0.6)
* [`PePr` v1.1.10](https://github.com/shawnzhangyx/PePr/releases/tag/1.1.10)
* [`R` >= v3.2.5](https://cran.r-project.org)
* [`annotatr` v0.7.3](https://github.com/rcavalcante/annotatr/releases/tag/v0.7.3)
* `devtools`
* `dplyr`
* `ggplot2`
* [`methylSig` v0.4.3](https://github.com/sartorlab/methylSig/releases/tag/v0.4.3)
* `optparse`
* `readr`
* [`samtools` v0.1.19](https://github.com/samtools/samtools/releases/tag/0.1.19)
* [`trim_galore` v0.4.1](http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/trim_galore_v0.4.1.zip)
* [`cutadapt` v1.9.1](https://pypi.python.org/pypi/cutadapt/1.9.1)
* [`FastQC` v0.11.5](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.5_source.zip)

The following `R` code will install the necessary packages:

```{r}
# Install CRAN packages
install.packages(c('devtools','optparse','readr','dplyr','ggplot2'), repos='http://cran.rstudio.com')
# Install Bioconductor packages
source("https://bioconductor.org/biocLite.R")
biocLite(c("BiocStyle","GenomeInfoDb","IRanges","GenomicRanges"))
# Install GitHub packages
install_github('rcavalcante/[email protected]')
install_github('sartorlab/[email protected]')
```

### Getting `mint`

Expand Down Expand Up @@ -193,6 +214,8 @@ At minimum, paths to reference genome information must be provided in the `Genom
```{make}
# Configuration for mint pipeline analyses
# This makefile was generated using mint v0.1.2
################################################################################
# Project and experimental information
Expand Down Expand Up @@ -230,40 +253,65 @@ PATH_TO_BDG2BW := $(shell which bedGraphToBigWig)
PATH_TO_BDG2BB := $(shell which bedToBigBed)
################################################################################
# Command line options for tools
################################################################################
# bisulfite_align configuration options
# FastQC
OPTS_FASTQC = --format fastq --noextract
# trim_galore bisulfite
# For trim_galore parameters see http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/trim_galore_User_Guide_v0.4.1.pdf
OPTS_TRIMGALORE_BISULFITE = --quality 20 --illumina --stringency 6 -e 0.2 --gzip --length 25 --rrbs
# trim_galore pulldown
OPTS_TRIMGALORE_PULLDOWN = --quality 20 --illumina --stringency 6 -e 0.2 --gzip --length 25
# bismark
# For bismark parameters see http://www.bioinformatics.babraham.ac.uk/projects/bismark/Bismark_User_Guide_v0.15.0.pdf
OPTS_BISMARK = --bowtie2 $(GENOME_PATH)
# Command line option for minimum coverage required for bismark_methylation_extractor
# and scripts/classify_prepare_bisulfite_sample.awk in the sample classification module
# NOTE: This does not affect methylSig runs
OPT_MIN_COV = 5
# bismark_methylation_extractor
OPTS_EXTRACTOR = --single-end --gzip --bedGraph --cutoff 5 --cytosine_report --genome_folder $(GENOME_PATH) --multicore 1
# For methylation extractor parameters see http://www.bioinformatics.babraham.ac.uk/projects/bismark/Bismark_User_Guide_v0.15.0.pdf
OPTS_EXTRACTOR = --single-end --gzip --bedGraph --cutoff $(OPT_MIN_COV) --cytosine_report --genome_folder $(GENOME_PATH) --multicore 5
################################################################################
# pulldown_align configuration options
# trim_galore pulldown
# For trim_galore parameters see http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/trim_galore_User_Guide_v0.4.1.pdf
OPTS_TRIMGALORE_PULLDOWN = --quality 20 --illumina --stringency 6 -e 0.2 --gzip --length 25
# bowtie2
# For bowtie2 parameters see http://bowtie-bio.sourceforge.net/bowtie2/manual.shtml
OPTS_BOWTIE2 = -q -x $(BOWTIE2_GENOME_PATH) -U
################################################################################
# pulldown_sample configuration options
# macs2
OPTS_MACS = -t $bowtie2Bam -c $bowtie2InputBam -f BAM -g hs --outdir ./analysis/macs_peaks -n $macsPrefix
# For documentation about parameters see https://github.com/taoliu/MACS
OPTS_MACS = --gsize hs --qvalue 0.01 --mfold 5,50
################################################################################
# Command line options for methylSig and compare classifications
# bisulfite_compare configuration options
# DMC for CpG resolution, and DMR for region resolution (window size parameter
# used in the methylSig options below).
OPT_DM_TYPE = DMR
# Thresholds to use for DMCs or DMRs (above) in methylSig
# Thresholds to use for DMCs or DMRs (above) methylSig output
# FDR significance level
OPT_MSIG_DM_FDR_THRESHOLD = 0.05
# Desired absolute value of methylation difference
OPT_MSIG_DM_DIFF_THRESHOLD = 10
################################################################################
# Comparison specific options
# See ?methylSig::methylSigReadData and ?methylSig::methylSigCalc after installing methylSig in R for parameter information
OPTS_METHYLSIG_IDH2mut_v_NBM_mc_hmc_bisulfite = --context CpG --resolution base --destranded TRUE --maxcount 500 --mincount 5 --filterSNPs TRUE --dmtype $(OPT_DM_TYPE) --winsize.tile 50 --dispersion both --local.disp FALSE --winsize.disp 200 --local.meth FALSE --winsize.meth 200 --minpergroup 2,2 --T.approx TRUE --ncores 4 --quiet FALSE
OPTS_PEPR_IDH2mut_v_NBM_hmc_pulldown = --file-format=bam --peaktype=sharp --diff --threshold=1e-05 --num-processors=1
################################################################################
# pulldown_compare configuration options
# For PePr parameters see https://ones.ccmb.med.umich.edu/wiki/PePr/
OPTS_PEPR_IDH2mut_v_NBM_hmc_pulldown = --file-format=bam --peaktype=sharp --diff --threshold=1e-05 --num-processors=8
```

#### Running a project
Expand Down Expand Up @@ -294,6 +342,8 @@ make sample_classification
make compare_classification
```

To see what will be run by the pipeline without *actually* running anything, you can `make -n pulldown_align`, etc. for each of the commands.

Depending on the computing hardware used, projects can be run with the `make -j n` command where `n` is a positive integer. The `-j` flag specifies how many commands `make` is allowed to run simultaneously. When it is not present, the default is to run commands in serial.

**NOTE:** Some software in the `mint` pipeline have options for the number of processors to use, so some care should be taken not to exceed the computing limitations of the hardware. For example, `bismark` tends to use 5 cores per instantiation, so using `make -j 5` would really use 25 cores.
Expand Down Expand Up @@ -333,6 +383,7 @@ test_hybrid/pulldown/raw_fastqs
test_hybrid/pulldown/pepr_peaks
test_hybrid/pulldown/macs2_peaks
test_hybrid/pulldown/pulldown_coverages
test_hybrid/tmp
test_hybrid/data
test_hybrid/data/test_hybrid_annotation.txt
test_hybrid/data/raw_fastqs
Expand Down Expand Up @@ -542,6 +593,8 @@ Additionally, a variety of plots are output to help interpret the output of `bis

![Volcano plots from methylSig results](https://github.com/sartorlab/mint/blob/master/docs/IDH2mut_v_NBM_mc_hmc_bisulfite_DMR_methylSig_volcano.png)

#### Plot: Distribution of methylSig calls in annotations

![Distribution of methylSig calls in annots](https://github.com/sartorlab/mint/blob/master/docs/IDH2mut_v_NBM_mc_hmc_bisulfite_DMR_methylSig_DMstatus_prop_genes.png)

#### Plot: Distribution of chip1/chip2 peaks in annotations
Expand Down
27 changes: 19 additions & 8 deletions VERSIONS.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,22 +6,33 @@ This file gives the current version of software used in the pipeline as of May 1
* [`bedops` v2.4.14](https://github.com/bedops/bedops/releases/tag/v2.4.14)
* [`bismark` v0.16.1](https://github.com/FelixKrueger/Bismark/releases/tag/0.16.1)
* [`bowtie2` v2.2.4](https://github.com/BenLangmead/bowtie2/releases/tag/v2.2.4)
* [`cutadapt` v1.9.1](https://pypi.python.org/pypi/cutadapt/1.9.1)
* [`FastQC` v0.11.5](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.5_source.zip)
* [`macs2` v2.1.0.20140616](https://pypi.python.org/pypi/MACS2/2.1.0.20140616)
* [`PePr` v1.1.5](https://github.com/shawnzhangyx/PePr/releases/tag/1.1.5)
* R >= v3.2.5
* [`multiqc` v0.6.0](https://github.com/ewels/MultiQC/releases/tag/v0.6)
* [`PePr` v1.1.10](https://github.com/shawnzhangyx/PePr/releases/tag/1.1.10)
* [`R` >= v3.2.5](https://cran.r-project.org)
* [`annotatr` v0.7.3](https://github.com/rcavalcante/annotatr/releases/tag/v0.7.3)
* `devtools`
* `dplyr`
* `ggplot2`
* [`methylSig` v0.4.3](https://github.com/sartorlab/methylSig/releases/tag/v0.4.3)
* `optparse`
* `readr`
* [`samtools` v0.1.19](https://github.com/samtools/samtools/releases/tag/0.1.19)
* [`trim_galore` v0.4.1](http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/trim_galore_v0.4.1.zip)
* [`cutadapt` v1.9.1](https://pypi.python.org/pypi/cutadapt/1.9.1)
* [`FastQC` v0.11.5](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.5_source.zip)

The following R command will install the required R packages (assuming fresh R install):

```{r}
install.packages(c('readr','optparse','ggplot2','dplyr','devtools'), repos='http://cran.rstudio.com')
source("http://bioconductor.org/biocLite.R")
# Install CRAN packages
install.packages(c('devtools','optparse','readr','dplyr','ggplot2'), repos='http://cran.rstudio.com')
# Install Bioconductor packages
source("https://bioconductor.org/biocLite.R")
biocLite(c("BiocStyle","GenomeInfoDb","IRanges","GenomicRanges"))
devtools::install_github('sartorlab/methylSig')
devtools::install_github('rcavalcante/annotatr')
# Install GitHub packages
install_github('rcavalcante/[email protected]')
install_github('sartorlab/[email protected]')
```
28 changes: 22 additions & 6 deletions scripts/annotatr_bisulfite.R
Original file line number Diff line number Diff line change
Expand Up @@ -5,14 +5,25 @@ library(optparse)

option_list = list(
make_option('--file', type='character'),
make_option('--genome', type='character')
make_option('--genome', type='character'),
make_option('--group1', type='character'),
make_option('--group0', type='character')
)
opt = parse_args(OptionParser(option_list=option_list))

file = opt$file
genome = opt$genome
if(grepl('_trimmed_bismark_bt2.CpG_report_for_annotatr.txt', file)) {
sample = gsub('_trimmed_bismark_bt2.CpG_report_for_annotatr.txt','', basename(file))

# If the group names are not NULL, use them
if(!is.null(opt$group1)) {
group1 = opt$group1
}
if(!is.null(opt$group0)) {
group0 = opt$group0
}

if(grepl('_bismark_bt2.CpG_report_for_annotatr.txt', file)) {
sample = gsub('_bismark_bt2.CpG_report_for_annotatr.txt','', basename(file))
suffix = 'bismark'
} else if (grepl('_methylSig_for_annotatr.txt', file)) {
sample = gsub('_methylSig_for_annotatr.txt','', basename(file))
Expand Down Expand Up @@ -106,7 +117,7 @@ if(genome %in% c('hg19','hg38','mm9','mm10')) {
}
}

cats_order = c('hyper','hypo','noDM')
cats_order = c(group1,group0,'noDM')

###############################################################
# Annotate regions
Expand Down Expand Up @@ -157,6 +168,11 @@ ggplot2::ggsave(filename = counts_png, plot = plot_counts, width = 8, height = 8
# ggplot2::ggsave(filename = cocounts_png, plot = plot_cocounts, width = 8, height = 8)

if(suffix == 'bismark') {
# NOTE: Will receive a warning like:
# Warning messages:
# 1: Removed 578 rows containing non-finite values (stat_bin).
# 2: Removed 6936 rows containing non-finite values (stat_bin).
# When there are coverages exceeding the x limits
coverage_png = sprintf('summary/figures/%s_%s_coverage.png', sample, suffix)
plot_coverage = plot_numerical(
tbl = ar,
Expand Down Expand Up @@ -190,7 +206,7 @@ if(suffix == 'methylSig') {
facet_order = a_all_order,
bin_width = 5,
plot_title = sprintf('%s methylation difference over annotations', sample),
x_label = 'Methylation Difference (Group1 - Group0)')
x_label = sprintf('Methylation Difference (%s - %s)', group1, group0))
ggplot2::ggsave(filename = methdiff_png, plot = plot_methdiff, width = 8, height = 8)

volcano_png = sprintf('summary/figures/%s_%s_volcano.png', sample, suffix)
Expand All @@ -201,7 +217,7 @@ if(suffix == 'methylSig') {
facet = 'annot_type',
facet_order = a_all_order,
plot_title = sprintf('%s meth. diff. vs -log10(pval)', sample),
x_label = 'Methylation Difference (Group1 - Group0)',
x_label = sprintf('Methylation Difference (%s - %s)', group1, group0),
y_label = '-log10(pvalue)')
ggplot2::ggsave(filename = volcano_png, plot = plot_volcano, width = 8, height = 8)

Expand Down
Loading

0 comments on commit 9d0ee0b

Please sign in to comment.