This pipeline has two workflows. The first workflow makes a taxonomic tree with a phylogenetic profile next to it. It uses gene names (which can be changed in the config file) to search the PATRIC database. With this information it takes all entries that contain the gene name and builds a tree from them. Then it checks which species have the input gene and which do not, and uses that to make the phylogenetic profile. This workflow is named USE_PATRIC. The second workflow makes a taxonomic tree and a phylogenetic tree of the input gene(s). It also has the option to make a taxonomic tree with a phylogenetic profile if more than one gene is in the input file. This workflow is named USE_BLAST.
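To make the idea of a phylogenetic profile concrete, here is a minimal sketch of a presence/absence table. It is not the pipeline's own code: the function name `build_profile`, the gene names and the species lists are all hypothetical and only serve as illustration.

```python
def build_profile(species, gene_hits):
    """Map each species to a 0/1 row with one column per gene (sorted by gene name)."""
    genes = sorted(gene_hits)
    return {
        sp: [1 if sp in gene_hits[gene] else 0 for gene in genes]
        for sp in species
    }


# Hypothetical example data: which species each gene was found in.
species = ["Escherichia coli", "Bacillus subtilis", "Vibrio cholerae"]
gene_hits = {
    "geneA": {"Escherichia coli", "Vibrio cholerae"},
    "geneB": {"Bacillus subtilis"},
}

profile = build_profile(species, gene_hits)
for sp in species:
    # One row per species: 1 means the gene is present, 0 means it is absent.
    print(sp, profile[sp])
```

Each row can then be drawn next to the matching leaf of the taxonomic tree.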
Author: Aldo Vree
Date last updated: 16 June 2021
- Python 3.6
- Snakemake 5.10
- R 4.0.3
- ETE3
The pipeline runs in Snakemake, so to run it, just type 'snakemake' on the command line:

snakemake # Make sure you are in the directory with the Snakefile.

Note: The first time you run the USE_PATRIC workflow it will take a very long time (up to 10 hours) because the whole PATRIC database has to be downloaded. If you are on the alive server of the UU, the PATRIC database is already on the server and the run takes about 10 minutes.
The config file contains all the variables that can be changed. For the USE_PATRIC workflow, the names of the search genes can be changed in this file, as can the columns in which the program searches.
The directory with all the scripts also contains a file named config.ini, which lists all the variables that can be changed. To change a variable, change only the part after the = sign. For example:
searchword = transduction # Only change 'transduction' to the word you want to search for.
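As an illustration of how such a value could be read, here is a minimal sketch using Python's standard `configparser` module. It assumes the file has a section header (the name `[settings]` is hypothetical) and that inline `#` comments should be ignored. The pipeline's own scripts may read config.ini differently.

```python
import configparser

# Hypothetical file contents; the real config.ini may use another section name.
SAMPLE = """
[settings]
searchword = transduction  # Only change the part after the = sign.
"""


def read_searchword(text):
    """Return the value of 'searchword' with the inline comment stripped."""
    # inline_comment_prefixes tells configparser to drop '# ...' after a value.
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    return parser["settings"]["searchword"]


print(read_searchword(SAMPLE))
```

Without `inline_comment_prefixes`, configparser keeps the comment text as part of the value, so it is safest to leave the comment after the value untouched or on its own line.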
For the USE_BLAST workflow, one adjustment needs to be made in the Snakefile before running it. The first time you run the workflow, "expand("tree_total_tax.png")" on line 11 of the Snakefile needs to be commented out. With that line commented out, the workflow does not make the total taxonomic tree with the phylogenetic profile. When you want the taxonomic tree with the phylogenetic profile to be made, uncomment "expand("tree_total_tax.png")" on line 11 again. You also need to make a file named 'total_tax_genes.txt'. This file must contain the contents of the output file '{genename}_gene_patric.txt' for every gene that you want in the total taxonomic tree. So, if you want to put 3 genes in the tree, you have to make the file manually by appending '{genename_1}_gene_patric.txt', '{genename_2}_gene_patric.txt' and '{genename_3}_gene_patric.txt' to 'total_tax_genes.txt'. The following commands can be used:
cat {genename_1}_gene_patric.txt >> 'total_tax_genes.txt'
cat {genename_2}_gene_patric.txt >> 'total_tax_genes.txt'
cat {genename_3}_gene_patric.txt >> 'total_tax_genes.txt'

For the USE_BLAST workflow it is necessary to use a conda environment if BLAST (version 2.10) is not installed. To make the environment, use the following command:
conda create -n blast10 -c bioconda blast=2.10.1

To activate the conda environment, use the following command:
conda activate blast10

To deactivate the conda environment, use the following command:
conda deactivate

IQ-TREE must be installed in the conda environment as well. Use the following command to do that:
conda install iqtree

This section explains how to install the required dependencies.
To install Snakemake, run the following commands on the command line (Miniconda must be installed for this):
conda install -c conda-forge mamba
mamba create -c conda-forge -c bioconda -n snakemake snakemake
conda activate snakemake

To install ETE3, run the following commands on the command line (Miniconda must be installed for this):
conda install -c etetoolkit ete3 ete_toolchain
# To check if it is installed:
ete3 build check
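
After installing, it can help to confirm that the command-line tools are reachable from the currently active conda environment. The sketch below is not part of the pipeline. The executable names in `TOOLS` are assumptions (for example, IQ-TREE may be installed as `iqtree` or `iqtree2`), so adjust them to your setup.

```python
import shutil

# Assumed executable names; adjust them to match your installation.
TOOLS = ["snakemake", "ete3", "blastn", "iqtree"]


def find_missing(tools):
    """Return the tools from the list that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing(TOOLS)
print("Missing tools:", ", ".join(missing) if missing else "none")
```

Because the tools may live in different conda environments (such as 'snakemake' and 'blast10'), a tool reported as missing may simply be installed in an environment that is not active.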