TORMES

An automated pipeline for whole bacterial genome analysis directly from raw Illumina paired-end sequencing data.
(repository under development)

What is TORMES?

TORMES is an open-source, user-friendly pipeline for whole bacterial genome sequencing (WGS) analysis directly from raw Illumina paired-end sequencing data. TORMES works with any bacterial WGS dataset, regardless of the number, origin or species of the samples. By following very simple instructions, TORMES automates all steps of a typical WGS analysis, including:

  1. Sequence quality filtering
  2. De novo genome assembly
  3. Draft genome ordering against a reference (optional)
  4. Genome annotation
  5. Multi-Locus Sequence Typing (MLST, optional)
  6. Antibiotic resistance genes screening
  7. Virulence genes screening
  8. Pangenome comparison (optional)

When working with Escherichia or Salmonella sequence data, additional analyses can be enabled with the -g/--genera option, including:

  1. Antibiotic resistance screening based on point mutations
  2. Plasmid replicons screening
  3. Serotyping
  4. fimH-Typing (only for Escherichia)

Once the WGS analysis has finished, TORMES summarizes the results in an interactive web-like file that can be opened in any web browser, making the results easy to analyze, compare and share.


Installation

TORMES is a pipeline that requires many dependencies to work, so it is designed to be used as a conda environment. To install TORMES and all its dependencies, run:

wget https://anaconda.org/nmquijada/tormes-1.0/2019.04.25.180147/download/tormes-1.0.yml
conda env create -n tormes-1.0 --file tormes-1.0.yml

To activate the TORMES environment, run:

conda activate tormes-1.0

Additionally, the first time you use TORMES, run (after activating the TORMES environment):

tormes-setup

This step installs additional dependencies that are not available in conda, including the MiniKrakenDB_8GB database required by Kraken (the database is ~8 GB and might take some time to download). It also automatically creates the config_file.txt required for TORMES to work (see below).


Required dependencies

TORMES is a pipeline and it requires several dependencies to work:

Additional software when working with -g/--genera Escherichia.

Additional software when working with -g/--genera Salmonella.

TORMES looks for the software listed in config_file.txt, a simple tab-separated text file that gives each software or database and its location. A config_file.txt is created automatically by the tormes-setup command. You can, however, change the path to any software if you would like to use a different version (if you do so, keep the software names and the tab separation).
You can find an example of the config_file.txt here.
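
Because config_file.txt maps each software or database name to a location, a quick check can catch stale paths before a long run. The sketch below is only an illustration: check_config is a hypothetical helper that is not part of TORMES, and it assumes every line holds exactly a name and a filesystem path separated by a tab.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TORMES): report config entries whose
# location does not exist on disk.
# Assumption: each line is "<name><TAB><location>"; blank lines and lines
# starting with '#' are skipped.
check_config() {
    local config=$1 name location missing=0
    while IFS=$'\t' read -r name location || [ -n "$name" ]; do
        # Skip empty lines and comments
        [ -z "$name" ] && continue
        case $name in '#'*) continue ;; esac
        if [ ! -e "$location" ]; then
            echo "MISSING: $name -> $location"
            missing=$((missing + 1))
        fi
    done < "$config"
    # Succeed only when every listed location exists
    [ "$missing" -eq 0 ]
}
```

Calling check_config with the path to your config_file.txt prints one line per entry that cannot be found and returns a non-zero status if any entry is missing.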


Usage

Usage: tormes [options]

OBLIGATORY OPTIONS:
        -m/--metadata   Path to the file with the metadata regarding the samples
                        The file must have an specific organization for the program to work
                        If you don't have any or you would like to have an example or extra information,
                        please type:
                        tormes example-metadata
        -o/--output     Path and name of the output directory

OTHER OPTIONS:
        -a/--adapter    Path to the adapters file
                        (default="PATH/TO/TORMES/files/adapters.fa")
        --assembler     Select the assembler to use. Options available: 'spades', 'megahit'
                        (default='spades')
        -c/--config     Path to the configuration file with the location of all dependencies
                        (default="PATH/TO/TORMES/files/config_file.txt")
        --citation      Show citation
        --fast          Faster analysis (default='0')
                        ('megahit' is used as assembler and contig ordering and pangenome analysis are disabled)
        --filtering     Select the software for filtering the reads.
                        Options available: 'prinseq', 'sickle', 'trimmomatic'
                        (default="prinseq")
        -g/--genera     Type genera name to allow special analysis (default='none')
                        Options available: 'Escherichia', 'Salmonella'
        -h/--help       Show this help
        --min_len       Minimum length to the reads to survive after filtering (default=125) <integer>
        --no_mlst       Disable MLST analysis (default='0')
        --no_pangenome  Disable pangenome analysis (default='0')
        -q/--quality    Minimum mean phred score of the reads to survive after filtering (default=25) <integer>
        -r/--reference  Type path to reference genome (fasta, gbk) (default='none')
                        Reference will be used for contig ordering of the draft genome
        -t/--threads    Number of threads to use (default=1) <integer>
        --title         Path to a file containing the title in the project that will be used as title in the report
                        Avoid using special characters. TORMES will perform a default title if this option is not used
        -v/--version    Show version


Example:

tormes --metadata salmonella_metadata.txt --output Salmonella_TORMES_2018 --reference S_enterica-CT02021853.fasta --threads 32 --genera Salmonella

Obligatory options

The -m/--metadata option requires a metadata text file. This file holds all the information about the samples and requires a specific organization:

  • Columns should be tab separated.
  • The first column must be called Samples and contain the sample names (avoid special characters).
  • The second column must be called Read1 and contain the path to the R1 (forward) reads (either fastq or fastq.gz).
  • The third column must be called Read2 and contain the path to the R2 (reverse) reads (either fastq or fastq.gz).
  • The fourth and any following columns are descriptive. The information included here is not needed for TORMES to work but will be included in the interactive report. You can add as many description columns as needed (for example isolation date or source, alternative sample codes, etc.).
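
For larger datasets, a table following the rules above can be generated rather than typed by hand. The following is a minimal sketch, not part of TORMES: make_metadata is a hypothetical helper, and the naming scheme <sample>_R1.fastq.gz / <sample>_R2.fastq.gz is an assumption that you will likely need to adapt to your own files.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TORMES): print a metadata table for every
# read pair found in a directory.
# Assumption: files are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
make_metadata() {
    local dir=$1 r1 r2 sample
    # Header: the three required column names plus one description column
    printf 'Samples\tRead1\tRead2\tDescription1\n'
    for r1 in "$dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue            # no match: the glob stays literal
        r2=${r1%_R1.fastq.gz}_R2.fastq.gz
        if [ ! -e "$r2" ]; then
            echo "warning: no R2 file for $r1" >&2
            continue
        fi
        sample=$(basename "$r1" _R1.fastq.gz)
        # "NA" is a placeholder description to be edited by hand
        printf '%s\t%s\t%s\t%s\n' "$sample" "$r1" "$r2" "NA"
    done
}
```

For example, make_metadata /path/to/reads > my_metadata.txt writes the table to a file. The paths are written exactly as given in the directory argument, so passing an absolute directory yields absolute read paths.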

This is an example of how the metadata file should look:

| Samples | Read1 | Read2 | Description1 | Description2 |
| --- | --- | --- | --- | --- |
| Sample1 | Forward read location | Reverse read location | Description 1 of Sample 1 | Description 2 of Sample 1 |
| Sample2 | Forward read location | Reverse read location | Description 1 of Sample 2 | Description 2 of Sample 2 |

If you have trouble creating the metadata file, you can generate a template by typing: tormes example-metadata.
This command generates a file called samples_metadata.txt in your working directory that can be used as a template for your own dataset.
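
Before starting a long analysis, it can be worth checking a metadata file against the layout described in this section. The sketch below is one possible check and not a TORMES command: validate_metadata is a hypothetical helper that only verifies the three fixed column names and that every listed read file exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TORMES): check a metadata file against the
# layout described in this section before starting a run.
validate_metadata() {
    local file=$1 errors=0 line=0 sample r1 r2 rest reads
    while IFS=$'\t' read -r sample r1 r2 rest || [ -n "$sample" ]; do
        line=$((line + 1))
        if [ "$line" -eq 1 ]; then
            # The first three column names are fixed
            if [ "$sample" != "Samples" ] || [ "$r1" != "Read1" ] || [ "$r2" != "Read2" ]; then
                echo "line 1: columns must start with Samples, Read1, Read2"
                errors=$((errors + 1))
            fi
            continue
        fi
        [ -z "$sample" ] && continue        # ignore blank lines
        # Both read files of each sample must exist
        for reads in "$r1" "$r2"; do
            if [ ! -f "$reads" ]; then
                echo "line $line: read file not found: '$reads'"
                errors=$((errors + 1))
            fi
        done
    done < "$file"
    # Succeed only when no problem was reported
    [ "$errors" -eq 0 ]
}
```

Running validate_metadata samples_metadata.txt prints one message per problem and returns a non-zero status if anything is wrong, so it can be chained as validate_metadata samples_metadata.txt && tormes ... in a script.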


Output

TORMES stores every file generated during the analysis in a separate directory for each step (assembly, annotation, etc.), all of them inside the main output directory specified with the -o/--output option:

  • annotation: one directory per sample containing all the annotation files generated by Prokka.
  • antibiotic_resistance_genes: results of the screening for antibiotic resistance genes using Abricate against three databases: ARG-ANNOT, CARD and ResFinder.
  • assembly: files resulting from genome assembly with SPAdes or Megahit (in gzipped archives; to extract them, type tar xzf file-name.tgz) and the assembly statistics generated with Quast.
  • cleaned_reads: reads that survived after quality filtering using Prinseq, Trimmomatic or Sickle.
  • draft_genomes: stores the draft genomes. If the -r/--reference option is used, draft genomes will be ordered against a reference by using Mauve and stored here. Contigs < 200 bp are removed.
  • mlst: results of Multi-Locus Sequence Typing (MLST) by using mlst.
  • pangenome: results of pangenome comparison based on the presence/absence of genes between the samples by using Roary.
  • report_files.tgz: files necessary for the generation of the interactive web-like report. See further instructions here.
  • sequencing_assembly_report.txt: tabulated file including information of the sequencing (number of reads, average read length, sequencing depth), the assembly (number of contigs, genome length, average contig length, N50, GC content) and consensus taxonomic assignment.
  • species_identification: consensus taxonomic assignment of each sample by using Kraken.
  • tormes.log: log file of TORMES analysis progress.
  • tormes_report.html: interactive web report generated automatically after the WGS analysis that summarizes the results. It can be opened in any browser and is easy to share and analyze.
  • virulence_genes: results of the screening for virulence genes using Abricate against the Virulence Factors Database.
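
When a run contains many samples, extracting the assembly archives one by one with tar xzf becomes tedious. The loop below is a small sketch under the assumption that the archives sit directly inside the assembly directory and end in .tgz; unpack_assemblies is a hypothetical helper, not a TORMES command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TORMES): extract every .tgz archive found
# directly inside <output>/assembly, next to the archive itself.
unpack_assemblies() {
    local outdir=$1 archive
    for archive in "$outdir"/assembly/*.tgz; do
        [ -e "$archive" ] || continue       # nothing to extract
        # -C extracts into the assembly directory rather than the current one
        tar xzf "$archive" -C "$outdir/assembly"
        echo "extracted: $archive"
    done
}
```

With the example command shown earlier, this would be called as unpack_assemblies Salmonella_TORMES_2018.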

Once the WGS analysis has finished, TORMES summarizes the results in an interactive web-like report file. An example of a report file can be visualized here.
To generate the report file, tormes calls tormes-report (included in the TORMES pipeline), which generates an R Markdown file called tormes_report.Rmd. You can modify this file to produce customized reports without re-running the entire analysis.
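
After editing tormes_report.Rmd, the report has to be rendered again. One way to do this, sketched below, is to call the render() function of the R package rmarkdown through Rscript. This is an assumption about a workable workflow rather than the documented TORMES procedure: render_report is a hypothetical helper, and the .Rmd file may expect the contents of report_files.tgz to be extracted alongside it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of TORMES): re-render an edited report.
# Assumption: R and the rmarkdown package are available, for instance inside
# the activated TORMES environment.
render_report() {
    local rmd=$1
    if [ ! -f "$rmd" ]; then
        echo "error: $rmd not found" >&2
        return 1
    fi
    if ! command -v Rscript > /dev/null 2>&1; then
        echo "error: Rscript is not on PATH (is the TORMES environment active?)" >&2
        return 1
    fi
    # rmarkdown::render() writes its output next to the .Rmd file by default
    Rscript -e 'rmarkdown::render(commandArgs(trailingOnly = TRUE)[1])' "$rmd"
}
```

Note that rendering is likely to overwrite an existing tormes_report.html in the same directory, so keep a copy of the original report if you want to compare the two.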


Citation

Please cite the following publication if you use TORMES:


Narciso M. Quijada, David Rodríguez-Lázaro, Jose María Eiros and Marta Hernández (2019); TORMES: an automated pipeline for whole bacterial genome analysis; Bioinformatics, https://doi.org/10.1093/bioinformatics/btz220


The dependencies described in this section are the backbone of TORMES, and users are encouraged to cite them when using TORMES.


License

TORMES is free software, licensed under GPLv3.
