An automated pipeline for whole bacterial genome analysis directly from raw Illumina paired-end sequencing data.
(repository under development)
TORMES is an open-source, user-friendly pipeline for whole bacterial genome sequencing (WGS) analysis directly from raw Illumina paired-end sequencing data. TORMES works with any bacterial WGS dataset, regardless of the number, origin or species of the samples. By following very simple instructions, TORMES automates all the steps of a typical WGS analysis, including:
- Sequence quality filtering
- De novo genome assembly
- Draft genome ordering against a reference (optional)
- Genome annotation
- Multi-Locus Sequence Typing (MLST, optional)
- Antibiotic resistance genes screening
- Virulence genes screening
- Pangenome comparison (optional)
When working with Escherichia or Salmonella sequence data, extensive analysis can be enabled (by using the -g/--genera option), including:
- Antibiotic resistance screening based on point mutations
- Plasmid replicons screening
- Serotyping
- fimH-Typing (only for Escherichia)
Once the WGS analysis has finished, TORMES summarizes the results in an interactive web-like file that can be opened in any web browser, making the results easy to analyze, compare and share.
TORMES is a pipeline that requires many dependencies to work. It has been designed to be used as a conda environment. To install TORMES and all its dependencies, run:
wget https://anaconda.org/nmquijada/tormes-1.0/2019.04.25.180147/download/tormes-1.0.yml
conda env create -n tormes-1.0 --file tormes-1.0.yml
To activate the TORMES environment, run:
conda activate tormes-1.0
Additionally, the first time you use TORMES, run (after activating the TORMES environment):
tormes-setup
This step installs additional dependencies that are not available in conda, including the MiniKrakenDB_8GB database required by Kraken (the database is ~8 GB and might take some time to download). It also automatically creates the config_file.txt required for TORMES to work (see below).
TORMES requires the following dependencies to work:
- ABRicate
- FastTree
- GNUParallel
- ImageMagick
- Kraken
- Megahit
- mlst
- Prinseq
- progressiveMauve
- Prokka
- Quast
- R
- Roary
- roary2svg.pl
- Sickle
- SPAdes
- Trimmomatic
Additional software when working with -g/--genera Escherichia.
Additional software when working with -g/--genera Salmonella.
TORMES looks for the software listed in config_file.txt, a simple tab-separated text file indicating each software/database and its location. A config_file.txt is created automatically by the tormes-setup command. However, you can change the path to any software if you would like to use a different version (if you do so, keep the software names and the tab separation).
You can find an example of the config_file.txt here.
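If you edit config_file.txt by hand, it can help to confirm that every location still resolves before launching an analysis. The following is a minimal sketch, not part of TORMES: the function name check_config is hypothetical, and it assumes the file holds exactly two tab-separated columns (name, then location) with no header line.

```shell
#!/usr/bin/env bash
# Sketch: report which entries of a tab-separated config file point to
# something that exists. Assumes two columns (name<TAB>location), no header.

check_config() {
    local config="$1"
    local name location missing=0

    # "|| [ -n "$name" ]" keeps a last line that lacks a trailing newline.
    while IFS=$'\t' read -r name location || [ -n "$name" ]; do
        [ -n "$name" ] || continue            # skip blank lines
        # Accept an existing file/directory or a command found on PATH.
        if [ -e "$location" ] || command -v "$location" >/dev/null 2>&1; then
            printf 'ok\t%s\t%s\n' "$name" "$location"
        else
            printf 'MISSING\t%s\t%s\n' "$name" "$location"
            missing=1
        fi
    done < "$config"

    return "$missing"                         # non-zero if anything is missing
}
```

For example, check_config /path/to/config_file.txt prints one line per entry and returns a non-zero status if any location could not be found.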
Usage: tormes [options]
OBLIGATORY OPTIONS:
-m/--metadata Path to the file with the metadata regarding the samples
The file must have an specific organization for the program to work
If you don't have any or you would like to have an example or extra information,
please type:
tormes example-metadata
-o/--output Path and name of the output directory
OTHER OPTIONS:
-a/--adapter Path to the adapters file
(default="PATH/TO/TORMES/files/adapters.fa")
--assembler Select the assembler to use. Options available: 'spades', 'megahit'
(default='spades')
-c/--config Path to the configuration file with the location of all dependencies
(default="PATH/TO/TORMES/files/config_file.txt")
--citation Show citation
--fast Faster analysis (default='0')
('megahit' is used as assembler and contig ordering and pangenome analysis are disabled)
--filtering Select the software for filtering the reads.
Options available: 'prinseq', 'sickle', 'trimmomatic'
(default="prinseq")
-g/--genera Type genera name to allow special analysis (default='none')
Options available: 'Escherichia', 'Salmonella'
-h/--help Show this help
--min_len Minimum length to the reads to survive after filtering (default=125) <integer>
--no_mlst Disable MLST analysis (default='0')
--no_pangenome Disable pangenome analysis (default='0')
-q/--quality Minimum mean phred score of the reads to survive after filtering (default=25) <integer>
-r/--reference Type path to reference genome (fasta, gbk) (default='none')
Reference will be used for contig ordering of the draft genome
-t/--threads Number of threads to use (default=1) <integer>
--title Path to a file containing the title in the project that will be used as title in the report
Avoid using special characters. TORMES will perform a default title if this option is not used
-v/--version Show version
Example:
tormes --metadata salmonella_metadata.txt --output Salmonella_TORMES_2018 --reference S_enterica-CT02021853.fasta --threads 32 --genera Salmonella
A metadata text file, supplied with the -m/--metadata option, is needed for TORMES to work. This file includes all the information regarding the samples and requires a specific organization:
- Columns should be tab separated.
- The first column must be called Samples and contain the sample names (avoid special characters).
- The second column must be called Read1 and contain the path to the R1 (forward) reads (either fastq or fastq.gz).
- The third column must be called Read2 and contain the path to the R2 (reverse) reads (either fastq or fastq.gz).
- The fourth and subsequent columns are descriptive. The information included here is not needed for TORMES to work but will be included in the interactive report. You can add as many description columns as needed (including information such as isolation date or source, different codification of each sample, etc.).
This is an example of what the metadata file should look like:
| Samples | Read1 | Read2 | Description1 | Description2 |
|---|---|---|---|---|
| Sample1 | Forward read location | Reverse read location | Description 1 of Sample 1 | Description 2 of Sample 1 |
| Sample2 | Forward read location | Reverse read location | Description 1 of Sample 2 | Description 2 of Sample 2 |
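The column rules described above can be checked before starting a run. The following is a minimal sketch, not part of TORMES: check_metadata is a hypothetical helper that verifies that the header starts with Samples, Read1 and Read2 and that every listed read file exists.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check a metadata file before passing it to -m/--metadata.

check_metadata() {
    local meta="$1"
    local header sample r1 r2 f status=0

    # The first three tab-separated header fields must be Samples, Read1, Read2.
    header="$(head -n 1 "$meta" | cut -f1-3)"
    if [ "$header" != "$(printf 'Samples\tRead1\tRead2')" ]; then
        echo "error: header must start with Samples<TAB>Read1<TAB>Read2" >&2
        return 1
    fi

    # Every data row must point to two existing read files.
    # Descriptive columns (4th onward) are captured in "_" and ignored.
    while IFS=$'\t' read -r sample r1 r2 _ || [ -n "$sample" ]; do
        [ -n "$sample" ] || continue          # skip blank lines
        for f in "$r1" "$r2"; do
            if [ ! -f "$f" ]; then
                echo "error: $sample: file not found: $f" >&2
                status=1
            fi
        done
    done < <(tail -n +2 "$meta")

    return "$status"
}
```

Running check_metadata salmonella_metadata.txt returns a non-zero status and prints one message per problem if the header or any read path is wrong.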
If you have problems creating the metadata file, you can generate a template metadata file by typing: tormes example-metadata.
This command will generate a file called samples_metadata.txt in your working directory that can be used as a template for your own dataset.
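For larger datasets, the metadata file can also be written by a short script instead of by hand. The following is a minimal sketch, not part of TORMES: make_metadata is a hypothetical helper, and it assumes that the reads are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz, a naming convention that your own data may not follow.

```shell
#!/usr/bin/env bash
# Sketch: build a metadata file from a directory of paired-end reads.
# Assumption: files are named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz.

make_metadata() {
    local reads_dir="$1"    # directory holding the reads
    local out_file="$2"     # metadata file to write
    local r1 r2 sample

    # Header: the three mandatory columns plus one descriptive column.
    printf 'Samples\tRead1\tRead2\tDescription1\n' > "$out_file"

    for r1 in "$reads_dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue              # the glob matched nothing
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "warning: no R2 file for $r1, skipping" >&2
            continue
        fi
        sample="$(basename "$r1" _R1.fastq.gz)"
        # "NA" is a placeholder for the descriptive column; edit as needed.
        printf '%s\t%s\t%s\t%s\n' "$sample" "$r1" "$r2" "NA" >> "$out_file"
    done
}
```

For example, make_metadata /path/to/reads my_metadata.txt writes one row per complete read pair; the resulting file can then be edited to fill in the descriptive columns.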
TORMES stores every file generated during the analysis in different directories according to the analysis step (assembly, annotation, etc.), all of them located within the main output directory specified with the -o/--output option:
- annotation: one directory per sample containing all the annotation files generated by Prokka.
- antibiotic_resistance_genes: results of the screening for antibiotic resistance genes by using ABRicate against three databases: ARG-ANNOT, CARD and ResFinder.
- assembly: files resulting from genome assembly with SPAdes or Megahit (in gzipped directories; to unzip them type tar xzf file-name.tgz) and the assembly stats generated with Quast.
- cleaned_reads: reads that survived after quality filtering using Prinseq, Trimmomatic or Sickle.
- draft_genomes: stores the draft genomes. If the -r/--reference option is used, draft genomes will be ordered against a reference by using Mauve and stored here. Contigs < 200 bp are removed.
- mlst: results of Multi-Locus Sequence Typing (MLST) by using mlst.
- pangenome: results of pangenome comparison based on the presence/absence of genes between the samples by using Roary.
- report_files.tgz: files necessary for the generation of the interactive web-like report. See further instructions here.
- sequencing_assembly_report.txt: tabulated file including information of the sequencing (number of reads, average read length, sequencing depth), the assembly (number of contigs, genome length, average contig length, N50, GC content) and consensus taxonomic assignment.
- species_identification: consensus taxonomic assignment of each sample by using Kraken.
- tormes.log: log file of TORMES analysis progress.
- tormes_report.html: web-interactive report generated automatically after WGS analysis that summarizes the results. It can be opened in any browser, shared and analyzed in a simple way.
- virulence_genes: results of the screening for virulence genes by using ABRicate against the Virulence Factors Database.
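The gzipped assembly directories mentioned above can be unpacked in one go rather than one at a time. The following is a minimal sketch, not part of TORMES: unpack_archives is a hypothetical helper, and it assumes that the .tgz archives sit directly inside the directory you pass to it.

```shell
#!/usr/bin/env bash
# Sketch: extract every .tgz archive found in a directory, in place.
# Assumption: the archives are located directly inside that directory.

unpack_archives() {
    local dir="$1"
    local archive

    for archive in "$dir"/*.tgz; do
        [ -e "$archive" ] || continue         # the glob matched nothing
        # Same command as in the list above, extracting next to the archive.
        tar xzf "$archive" -C "$dir"
    done
}
```

For the example run shown earlier, this could be called as unpack_archives Salmonella_TORMES_2018/assembly.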
Once the WGS analysis has finished, TORMES summarizes the results in an interactive web-like report file. An example of a report file can be visualized here.
To generate the report file, tormes calls tormes-report (included in the TORMES pipeline), which produces an R Markdown file called tormes_report.Rmd. Users can modify this file to generate customized reports without re-running the entire analysis.
Please cite the following publication if you are using TORMES:
Narciso M. Quijada, David Rodríguez-Lázaro, Jose María Eiros and Marta Hernández (2019); TORMES: an automated pipeline for whole bacterial genome analysis; Bioinformatics, https://doi.org/10.1093/bioinformatics/btz220
The dependencies described in this section are the backbone of TORMES, and users are encouraged to cite them when using TORMES.
TORMES is free software, licensed under GPLv3.