Using Correlation Profiles of mutations to infer the recombination rate from large-scale sequencing data in bacteria.
- Install git from https://git-scm.com;
- Install go from https://golang.org/doc/install;
- Install python3 from https://www.python.org/ (we encountered problems when running with the default Python on macOS);
- Install pip3 from https://pip.pypa.io/en/stable/installing/.
- Install mcorr-xmfa, mcorr-bam, and mcorr-fit from your terminal:
go get -u github.com/kussell-lab/mcorr/cmd/mcorr-xmfa
go get -u github.com/kussell-lab/mcorr/cmd/mcorr-bam
cd $HOME/go/src/github.com/kussell-lab/mcorr/cmd/mcorr-fit
python3 setup.py install
or, to install mcorr-fit in a local directory (~/.local/bin on Linux or ~/Library/Python/3.6/bin on macOS):
python3 setup.py install --user
- Add $HOME/go/bin and $HOME/.local/bin to your $PATH environment variable. On Linux, you can do this in your terminal:
export PATH=$PATH:$HOME/go/bin:$HOME/.local/bin
On macOS, you can do it as follows:
export PATH=$PATH:$HOME/go/bin:$HOME/Library/Python/3.6/bin
We have tested installation on Windows 10, Ubuntu 17.10, and macOS Big Sur (on both Intel and M1 chips), using Python 3 and Go 1.15 and 1.16.
Typical installation time on an iMac is 10 minutes.
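After updating $PATH, it can help to confirm that the executables are actually discoverable before running an analysis. The following is a minimal sketch using only the Python standard library; the helper name missing_tools is hypothetical and not part of mcorr.

```python
import shutil

# Executables named in this README; extend the list if you install others.
MCORR_TOOLS = ["mcorr-xmfa", "mcorr-bam", "mcorr-fit"]


def missing_tools(names):
    """Return the subset of `names` that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_tools(MCORR_TOOLS)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All mcorr executables were found on PATH.")
```

If a tool is reported as missing, re-check the export PATH line for your platform and open a new terminal session.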
The inference of recombination parameters requires two steps:
- Calculate Correlation Profile
  - For whole-genome alignments (multiple gene alignments), use mcorr-xmfa:
    mcorr-xmfa <input XMFA file> <output prefix>
    The XMFA files should contain only coding sequences. A description of the XMFA format can be found at http://darlinglab.org/mauve/user-guide/files.html. We provide two useful pipelines to generate whole-genome alignments:
    - from multiple assemblies: https://github.com/kussell-lab/AssemblyAlignmentGenerator;
    - from raw reads: https://github.com/kussell-lab/ReferenceAlignmentGenerator
  - For read alignments, use mcorr-bam:
    mcorr-bam <GFF3 file> <sorted BAM file> <output prefix>
    The GFF3 file is used to extract the coding regions from the sorted BAM file.
  - For calculating correlation profiles between two clades or sequence clusters from whole-genome alignments, use mcorr-xmfa-2clades:
    mcorr-xmfa-2clades <input XMFA file 1> <input XMFA file 2> <output prefix>
    where file 1 and file 2 are the multiple gene alignments for the two clades.
  All programs will produce two files:
  - a .csv file, which stores the calculated Correlation Profile and is used for fitting in the next step;
  - a .json file, which stores the (intermediate) Correlation Profile for each gene.
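The three invocations above differ only in the program name and in the input files that precede the output prefix. As a sketch, the helper below assembles the argument list for each case and can run it with subprocess. The function names and example file names are hypothetical, and the output names <output prefix>.csv and <output prefix>.json are an assumption, since this README states only the extensions.

```python
import subprocess

# Number of input files each program takes before the output prefix.
EXPECTED_INPUTS = {
    "mcorr-xmfa": 1,          # <input XMFA file>
    "mcorr-bam": 2,           # <GFF3 file> <sorted BAM file>
    "mcorr-xmfa-2clades": 2,  # <input XMFA file 1> <input XMFA file 2>
}


def correlation_profile_command(program, inputs, output_prefix):
    """Build the argument list: program, input files in order, then the prefix."""
    if program not in EXPECTED_INPUTS:
        raise ValueError(f"unknown program: {program}")
    if len(inputs) != EXPECTED_INPUTS[program]:
        raise ValueError(
            f"{program} expects {EXPECTED_INPUTS[program]} input file(s), "
            f"got {len(inputs)}"
        )
    return [program, *inputs, output_prefix]


def run_correlation_profile(program, inputs, output_prefix):
    """Run the program and return the assumed paths of its two output files."""
    command = correlation_profile_command(program, inputs, output_prefix)
    subprocess.run(command, check=True)  # raises if the program exits non-zero
    # Assumption: both outputs are named after the prefix.
    return f"{output_prefix}.csv", f"{output_prefix}.json"


# Building a command does not require mcorr to be installed.
# The file names here are placeholders.
print(correlation_profile_command("mcorr-bam", ["genes.gff3", "reads.sorted.bam"], "sample1"))
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.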
- Fit the Correlation Profile using mcorr-fit:
  - For fitting correlation profiles as described in the 2019 Nature Methods paper, use mcorr-fit:
    mcorr-fit <.csv file> <output_prefix>
    It will produce four files:
    - <output_prefix>_best_fit.svg shows the plots of the Correlation Profile, fitting, and residuals;
    - <output_prefix>_fit_reports.txt shows the summary of the fitted parameters;
    - <output_prefix>_fit_results.csv shows the table of fitted parameters;
    - <output_prefix>_lmfit_report.csv shows goodness-of-fit statistics from LMFIT.
  - To fit correlation profiles using the method from the Nature Methods paper and do model selection with AIC by comparing to the zero-recombination case, use mcorrFitCompare:
    mcorrFitCompare <.csv file> <output_prefix>
    It will produce five files:
    - <output_prefix>_recombo_best_fit.svg and <output_prefix>_zero-recombo_best_fit.svg show the plots of the Correlation Profile, fitting, and residuals for the model with recombination and for the zero-recombination case;
    - <output_prefix>_comparemodels.csv shows the table of fitted parameters and AIC values;
    - <output_prefix>_recombo_residuals.csv and <output_prefix>_zero-recombo_residuals.csv contain the residuals for the model with recombination and for the zero-recombination case.
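Because both fitting programs name their outputs from the prefix, a script can check that a run completed by looking for the expected files. This is a minimal sketch with hypothetical helper names; the suffixes are copied from the file lists in this README, and the prefix in the example is a placeholder.

```python
from pathlib import Path

# Suffixes appended to <output_prefix>, as listed in this README.
MCORR_FIT_SUFFIXES = [
    "_best_fit.svg",
    "_fit_reports.txt",
    "_fit_results.csv",
    "_lmfit_report.csv",
]
MCORR_FIT_COMPARE_SUFFIXES = [
    "_recombo_best_fit.svg",
    "_zero-recombo_best_fit.svg",
    "_comparemodels.csv",
    "_recombo_residuals.csv",
    "_zero-recombo_residuals.csv",
]


def expected_outputs(output_prefix, suffixes):
    """Return the output paths a run with this prefix should have written."""
    return [Path(f"{output_prefix}{suffix}") for suffix in suffixes]


def missing_outputs(output_prefix, suffixes):
    """Return the expected output paths that do not exist on disk."""
    return [p for p in expected_outputs(output_prefix, suffixes) if not p.exists()]


# "sample1_fit" is a placeholder prefix for illustration.
for path in missing_outputs("sample1_fit", MCORR_FIT_SUFFIXES):
    print("missing:", path)
```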
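Model selection with AIC prefers the model with the lower AIC value, which balances goodness of fit against the number of free parameters. The sketch below uses the least-squares form that LMFIT reports, N ln(chi2 / N) + 2k, where N is the number of data points, chi2 is the sum of squared residuals, and k is the number of varied parameters. Whether mcorrFitCompare computes AIC in exactly this way is an assumption, and the residuals and parameter counts are toy values for illustration, not real mcorr output.

```python
import math


def aic_least_squares(residuals, n_params):
    """AIC for a least-squares fit: N * ln(chi2 / N) + 2 * k."""
    n = len(residuals)
    chi2 = sum(r * r for r in residuals)
    if n == 0 or chi2 == 0:
        raise ValueError("need at least one non-zero residual")
    return n * math.log(chi2 / n) + 2 * n_params


def preferred_model(aic_by_model):
    """Return the name of the model with the lowest AIC."""
    return min(aic_by_model, key=aic_by_model.get)


# Toy residuals for illustration only (not real mcorr output).
toy_residuals = {
    "recombo": [0.01, -0.02, 0.015, -0.005],
    "zero-recombo": [0.10, -0.12, 0.09, -0.11],
}
# Placeholder parameter counts; use the counts from your own fitted models.
toy_n_params = {"recombo": 3, "zero-recombo": 2}

aic = {
    name: aic_least_squares(res, toy_n_params[name])
    for name, res in toy_residuals.items()
}
print("preferred model:", preferred_model(aic))
```

In practice the residuals could be read from the two residuals .csv files, once their column layout has been checked.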