(c) 2009 - 2012 The Authors, see LICENSE.txt for details.
Dent Earl, Benedict Paten, Mark Diekhans
The evolver team is responsible for items in external/ : George Asimenos and Robert C. Edgar, Serafim Batzoglou and Arend Sidow.
A jobTree based simulation manager for the Evolver genome evolution simulation tool suite.
evolverSimControl (eSC) can be used to simulate multi-chromosome genome evolution on an arbitrary phylogeny (Newick format). In addition to running evolver, eSC automatically creates statistical summaries of the simulation as it runs, including text and image files. Also included are convenience scripts to check on a running simulation and see detailed status and logging information; to extract fasta sequence files from the leaf nodes of a completed simulation; and to extract pairwise alignment files in multiple alignment format (.maf) from the leaf and branch nodes of a completed simulation and, with the help of mafJoin, join them into a single maf covering the entire simulation.
The use of jobTree means that you can run eSC on a cluster running a jobTree-supported batch system, on a multi-core server or on your laptop.
- sonLib: https://github.com/benedictpaten/sonLib/
- jobTree: https://github.com/benedictpaten/jobTree/
- evolver: http://www.drive5.com/evolver/ Specifically, make sure that the Evolver tools are on your PATH environment variable and that their names are preceded with evolver_. All of the following files need to be on your PATH: evolver_cvt, evolver_evo, evolver_transalign.
- trf: http://tandem.bu.edu/trf/trf.html Tandem Repeats Finder.
- mafJoin: https://github.com/dentearl/mafTools Not necessary for simple simulations, mafJoin (part of mafTools) is only needed if you wish to create a maf alignment of all sequences following a simulation.
- R: http://cran.r-project.org/ Only necessary if you wish to use the simCtrl_postSimAnnotDistExtractor.py script to view annotation size distributions following a simulation.
- Linux on i86 Intel. This is due to core Evolver executables being distributed as pre-compiled binaries.
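The PATH requirement above can be checked before starting a simulation. The following is a minimal sketch, assuming Python 3 and assuming that the three Evolver names listed above are the ones to look for; the helper name `missing_tools` is hypothetical and is not part of eSC.

```python
import shutil

# Evolver executables named in the dependency list above.
EVOLVER_TOOLS = ["evolver_cvt", "evolver_evo", "evolver_transalign"]


def missing_tools(names, which=shutil.which):
    """Return the names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(EVOLVER_TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed Evolver tools were found on PATH.")
```

The `which` parameter exists only so the lookup can be swapped out when testing; by default the real PATH is searched.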
- Download the package. Consider making it a sibling directory to jobTree/ and sonLib/.
- cd into the directory.
- Type make.
- Edit your PYTHONPATH environment variable to contain the parent directory of the evolverSimControl/ directory.
- Type make test.
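The PYTHONPATH step can be sketched as follows. This is an illustration only, assuming Python 3 on a POSIX system; the function name `pythonpath_with_parent` and the checkout location are hypothetical.

```python
import os


def pythonpath_with_parent(package_dir, current=""):
    """Return a PYTHONPATH value that includes the parent of package_dir."""
    parent = os.path.dirname(os.path.abspath(package_dir))
    entries = [entry for entry in current.split(os.pathsep) if entry]
    if parent not in entries:
        entries.insert(0, parent)
    return os.pathsep.join(entries)


if __name__ == "__main__":
    # Assumes the evolverSimControl/ checkout sits in the current directory.
    value = pythonpath_with_parent("evolverSimControl",
                                   os.environ.get("PYTHONPATH", ""))
    print("export PYTHONPATH=" + value)
```

The printed line can be pasted into a shell or a shell startup file; the sketch itself does not modify the environment.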
This example will walk you through a small simulation using the toy test example available at http://soe.ucsc.edu/~dearl/software/evolverSimControl/. If you want to create your own infile set, you can use evolverInfileGeneration.
- Download and expand the toy archive. For simplicity I'll assume that both root/ and params/ are in the working directory, i.e. ./ .
- Next we run the runSim program:
$ simCtrl_runSim.py --inputNewick '(Knife:0.004, (Fork:0.003, (Ladle:0.002, (Spoon:0.001, Teaspoon:0.001)S-TS:.001)S-TS-L:.001)S-TS-L-F:0.001);' --outDir toyExampleSim --rootDir root/ --rootName hg18 --paramsDir params/ --jobTree jobTreeToyExampleSim --maxThreads 32 --seed 3571
- You can check on a running simulation by using simCtrl_checkSimStatus.py; use --help for options.
- Post simulation you can run simCtrl_postSimFastaExtractor.py to extract fasta sequence files from the genomes.
- You may also wish to run simCtrl_postSimAnnotDistExtractor.py, which will use the ggplot2 package for R to display the length distributions of some of the annotations.
- You may also wish to construct a single maf for the simulation using simCtrl_postSimMafExtractor.py, which will use mafJoin to join the pairwise maf output from Evolver into a single simulation-wide maf. This process is extremely memory intensive, with the 120Mb Mammal simulation eventually requiring approximately 250Gb of memory.
In order to run eSC you will need an infile set, a parameter set, a phylogenetic tree and optionally a mobile element library and mobile element parameter set. Infile sets can be created using evolverInfileGenerator or from scratch. Parameter sets can be generated by reading primary literature and coming up with reasonable values. Phylogenetic trees need to be in Newick format.
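Because the tree is passed as a Newick string, a quick sanity check before launching a long run can catch simple typos. The sketch below is an assumption-laden illustration in Python 3, not how eSC parses trees: it only checks the trailing semicolon and parenthesis balance, and it lists leaf labels with a regular expression that does not handle quoted labels. The function names are hypothetical.

```python
import re


def newick_problems(newick):
    """Return a list of human-readable problems found in a Newick string."""
    problems = []
    text = newick.strip()
    if not text.endswith(";"):
        problems.append("tree does not end with ';'")
    depth = 0
    for char in text:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            if depth < 0:
                problems.append("')' appears before its matching '('")
                break
    if depth > 0:
        problems.append("%d unclosed '('" % depth)
    return problems


def leaf_names(newick):
    """Return leaf labels: names that directly follow '(' or ','."""
    return re.findall(r"[(,]\s*([^():,;\s]+)\s*(?=[:,)])", newick)


if __name__ == "__main__":
    # The toy tree from the example earlier in this document.
    toy = ("(Knife:0.004, (Fork:0.003, (Ladle:0.002, "
           "(Spoon:0.001, Teaspoon:0.001)S-TS:.001)S-TS-L:.001)S-TS-L-F:0.001);")
    print(newick_problems(toy))
    print(leaf_names(toy))
```

Internal node labels such as S-TS follow a closing parenthesis, so they are not reported as leaves.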
Available options for running a simulation are listed below.
$ bin/simCtrl_runSim.py --help
Usage: simCtrl_runSim.py --rootName=name --rootDir=/path/to/dir --paramsDir=/path/to/dir --tree=newickTree --stepLength=stepLength --outDir=/path/to/dir --jobTree=/path/to/dir [options]
simCtrl_runSim.py is used to initiate an evolver simulation using jobTree/scriptTree.
Options:
    -h, --help  show this help message and exit
    --rootDir=ROOTINPUTDIR  Input root directory.
    --rootName=ROOTNAME  name of the root genome, to differentiate it from the input Newick. default=root
    --inputNewick=INPUTNEWICK  Newick tree. http://evolution.genetics.washington.edu/phylip/newicktree.html
    --stepLength=STEPLENGTH  stepLength for each cycle. default=0.001
    --paramsDir=PARAMSDIR  Parameter directory.
    --outDir=OUTDIR  Out directory.
    --seed=SEED  Random seed, either an int or "stochastic". default=stochastic
    --noMEs  Turns off all mobile element and RPG modules in the sim. default=False
    --noBurninMerge  Turns off checks for an aln.rev file in the root dir. default=False
    --noGeneDeactivation  Turns off the gene deactivation step. default=False
    --maxThreads=MAXTHREADS  The maximum number of threads to use when running in single machine mode. default=4
- ... and all other jobTree standard options.
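When simulations are launched from a script, the argument list can be assembled in one place. The sketch below, assuming Python 3, uses only flags that appear in the help text and the toy example in this document; the function name `build_run_sim_command` is hypothetical, and the defaults for `max_threads` and `seed` mirror the documented defaults.

```python
import shlex

# The toy tree from the example earlier in this document.
TOY_NEWICK = (
    "(Knife:0.004, (Fork:0.003, (Ladle:0.002, "
    "(Spoon:0.001, Teaspoon:0.001)S-TS:.001)S-TS-L:.001)S-TS-L-F:0.001);"
)


def build_run_sim_command(newick, out_dir, root_dir, root_name, params_dir,
                          job_tree, max_threads=4, seed="stochastic"):
    """Assemble a simCtrl_runSim.py argument list from documented flags."""
    return [
        "simCtrl_runSim.py",
        "--inputNewick", newick,
        "--outDir", out_dir,
        "--rootDir", root_dir,
        "--rootName", root_name,
        "--paramsDir", params_dir,
        "--jobTree", job_tree,
        "--maxThreads", str(max_threads),
        "--seed", str(seed),
    ]


if __name__ == "__main__":
    cmd = build_run_sim_command(TOY_NEWICK, "toyExampleSim", "root/", "hg18",
                                "params/", "jobTreeToyExampleSim",
                                max_threads=32, seed=3571)
    # Print a shell-safe form of the command instead of running it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Passing the list to `subprocess.check_call` would start the run; keeping the arguments as a list avoids shell quoting problems with the parentheses and semicolon in the Newick string.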
To check on a running simulation you can use the simCtrl_checkSimStatus.py script.
$ bin/simCtrl_checkSimStatus.py --help
Usage: simCtrl_checkSimStatus.py --simDir path/to/dir [options]
simCtrl_checkSimStatus.py can be used to check on the status of a running or completed evolverSimControl simulation.
Options:
    -h, --help  show this help message and exit
    --simDir=SIMDIR  Parent directory.
    --drawText, --drawTree  prints an ASCII representation of the current tree status. default=False
    --curCycles  prints out the list of currently running cycles. default=False
    --stats  prints out the statistics for cycle steps. default=False
    --cycleStem  prints out a stem and leaf plot for completed cycle runtimes, in seconds. default=False
    --cycleStemHours  prints out a stem and leaf plot for completed cycle runtimes, in hours. default=False
    --printChrTimes  prints a table of chromosome lengths (bp) and times (sec) for intra chromosome evolution step (CycleStep2).
    --cycleList  prints out a list of all completed cycle runtimes. default=False
    --html  prints output in HTML format for use as a cgi. default=False
    --htmlDir=HTMLDIR  prefix for html links.
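For readers unfamiliar with the stem and leaf plots mentioned for --cycleStem, the sketch below shows the general idea in Python 3. It is not the script's implementation and does not reproduce its output format; the function names are hypothetical and the runtimes in the demo are made-up values.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group non-negative integers into stems (tens) and leaves (ones)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stems[int(value) // 10].append(int(value) % 10)
    return dict(stems)


def format_stem_and_leaf(values):
    """Render one text line per stem, including empty stems in the range."""
    stems = stem_and_leaf(values)
    if not stems:
        return ""
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append("%3d | %s" % (stem, leaves))
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up runtimes in seconds, for illustration only.
    print(format_stem_and_leaf([12, 15, 21, 34, 37, 38]))
```

Each row lists the final digit of every value that shares the same leading digits, which gives a quick view of how runtimes are distributed.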
To extract fasta sequences from a completed simulation you can use the simCtrl_postSimFastaExtractor.py script.
$ bin/simCtrl_postSimFastaExtractor.py --help
Usage: simCtrl_postSimFastaExtractor.py --simDir path/to/dir [options]
simCtrl_postSimFastaExtractor.py takes in a simulation directory and then extracts the sequences of leaf nodes in fasta format and stores them in the respective step's directory.
Options:
    -h, --help  show this help message and exit
    --simDir=SIMDIR  the simulation directory.
    --allCycles  extract fastas from all cycles, not just leafs. default=False
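Once fasta files have been extracted, a quick check of sequence lengths can confirm that the output looks sensible. The following Python 3 sketch is a generic fasta reader, not part of eSC; the names of the extracted files inside each step directory are not described here, so the demo uses inline example lines with hypothetical sequence names.

```python
def fasta_lengths(lines):
    """Return {sequence name: length} from an iterable of FASTA lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header as the sequence name.
            words = line[1:].split()
            name = words[0] if words else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    example = [">chr1 example", "ACGT", "AC", ">chr2", "GG"]
    print(fasta_lengths(example))
```

An open file handle can be passed in place of the list, since the function only iterates over lines.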
To create a single maf reflecting the evolutionary history of the entire simulation you can use the simCtrl_postSimMafExtractor.py script.
$ bin/simCtrl_postSimMafExtractor.py --help
Usage: simCtrl_postSimMafExtractor.py --simDir path/to/dir [options]
simCtrl_postSimMafExtractor.py requires mafJoin which is part of mafTools and is available at https://github.com/dentearl/mafTools/ .
Options:
    -h, --help  show this help message and exit
    --simDir=SIMDIR  Simulation directory.
    --maxBlkWidth=MAXBLKWIDTH  Maximum mafJoin maf block output size. May be reduced towards 250 for complicated phylogenies. default=10000
    --maxInputBlkWidth=MAXINPUTBLKWIDTH  Maximum mafJoin maf block input size. mafJoin will cut inputs to size, may result in long runs for very simple joins. May be reduced towards 250 for complicated phylogenies. default=1000
    --noBurninMerge  Will not perform a final merge of simulation to the burnin. default=False
    --maxThreads=MAXTHREADS  The maximum number of threads to use when running in single machine mode. default=4
- ... and all other jobTree standard options.
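To get a feel for the block sizes that the width options refer to, the blocks in a maf file can be measured directly. The Python 3 sketch below assumes the standard maf layout, in which a block starts with an "a" line and each "s" line carries the aligned text as its seventh field. Whether mafJoin measures block width in exactly this way is an assumption, and the function name and sample sequence names are hypothetical.

```python
def maf_block_widths(lines):
    """Return the alignment width (in columns) of each block in a maf stream."""
    widths = []
    in_block = False
    for line in lines:
        fields = line.split()
        if not fields:
            # A blank line ends the current block.
            in_block = False
            continue
        if fields[0] == "a":
            in_block = True
            widths.append(0)
        elif fields[0] == "s" and in_block and len(fields) >= 7:
            widths[-1] = max(widths[-1], len(fields[6]))
    return widths


if __name__ == "__main__":
    # Hypothetical two-block maf, for illustration only.
    sample = [
        "##maf version=1",
        "a score=0",
        "s Spoon.chr1 0 6 + 100 ACGT--AC",
        "s Teaspoon.chr1 0 8 + 100 ACGTTTAC",
        "",
        "a score=0",
        "s Spoon.chr1 6 3 + 100 GGA",
        "s Teaspoon.chr1 8 3 + 100 GGA",
    ]
    print(maf_block_widths(sample))
```

Comparing `max(maf_block_widths(handle))` with the chosen --maxBlkWidth value gives a rough idea of how much cutting a join might involve.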