katarinaelez/protein-ss-pred

protein-ss-pred

Implementations of the GOR method and an SVM-based method for protein secondary structure prediction.
A detailed description of the methods and datasets can be found in the project report.

Getting started

Requirements

The implementations require the numpy and scikit-learn packages.

pip install numpy
pip install scikit-learn
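
As a quick sanity check after installation, the sketch below reports which of the required packages cannot be imported. The helper name `missing_packages` is hypothetical and not part of this repository; the only point it relies on is that scikit-learn is imported under the module name `sklearn`.

```python
import importlib.util

# pip name -> import name (scikit-learn is imported as "sklearn")
REQUIRED = {"numpy": "numpy", "scikit-learn": "sklearn"}


def missing_packages(required=REQUIRED):
    """Return the pip names of the required packages that cannot be imported."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_packages()
print("missing packages:", ", ".join(missing) if missing else "none")
```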

To run the notebooks, install Jupyter Notebook as well.

pip install notebook

Installation

git clone https://github.com/katarinaelez/protein-ss-pred

Usage

GOR

A pretrained model (model.npz) is available.
Predictions from the GOR method can be obtained using:

python gor-predict.py [-h] (--pssm PSSM | --fasta FASTA) filename_model

For example:

python src/gor-predict.py --pssm data/blindTest/pssm/4S1H\:A.pssm models/model.npz
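
The usage line reads as follows: exactly one of `--pssm` or `--fasta` is required, followed by the path to the model. The sketch below reconstructs that interface with `argparse` purely to illustrate the syntax. It is an assumption about how such a command line could be declared, not the script's actual source, and the help strings are invented.

```python
import argparse


def build_parser():
    """Illustrative parser mirroring the documented gor-predict.py usage line."""
    parser = argparse.ArgumentParser(prog="gor-predict.py")
    # Exactly one input source must be given: a PSSM profile or a FASTA file.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--pssm", help="path to a sequence profile (help text assumed)")
    source.add_argument("--fasta", help="path to a FASTA file (help text assumed)")
    parser.add_argument("filename_model", help="path to the trained model")
    return parser


# Inside Python no shell escaping is needed, so the colon is written as-is.
args = build_parser().parse_args(
    ["--pssm", "data/blindTest/pssm/4S1H:A.pssm", "models/model.npz"]
)
print(args.pssm, args.filename_model)
```

Passing both `--pssm` and `--fasta`, or neither, makes this parser exit with a usage error, which is the behaviour the `( ... | ... )` notation describes.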

SVM

A pretrained model (model.sav.tar.gz) is available.
It must be extracted before use:

tar -xzvf models/model.sav.tar.gz -C models/
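
On systems without a `tar` command, the same extraction can be done from Python with the standard `tarfile` module. This is a minimal sketch; the function name `extract_model` is hypothetical, and the archive is assumed to be an ordinary gzip-compressed tarball, as the `-xzvf` flags above suggest.

```python
import tarfile
from pathlib import Path


def extract_model(archive, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        # Newer Python versions offer extraction filters that reject unsafe paths;
        # use the "data" filter when it is available.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, **kwargs)
    return names


# From the repository root, the equivalent of the tar command above would be:
# extract_model("models/model.sav.tar.gz", "models")
```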

Predictions from the SVM-based method can be obtained using:

python svm-predict.py [-h] (--pssm PSSM | --fasta FASTA) [--probs] filename_model

For example:

python src/svm-predict.py --pssm data/blindTest/pssm/4S1H\:A.pssm models/model.sav
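
To run predictions for many inputs from a Python script, the command can be assembled as an argument list and passed to `subprocess`. The sketch below only builds that list from the flags documented above; `predict_command` is a hypothetical helper, and what the scripts write to standard output is not specified here, so the call that actually runs the command is left as a comment.

```python
import subprocess
import sys


def predict_command(script, model, pssm=None, fasta=None, probs=False):
    """Build the argument list for gor-predict.py or svm-predict.py."""
    if (pssm is None) == (fasta is None):
        raise ValueError("give exactly one of pssm or fasta")
    cmd = [sys.executable, script]
    cmd += ["--pssm", pssm] if pssm is not None else ["--fasta", fasta]
    if probs:
        cmd.append("--probs")  # only svm-predict.py documents this flag
    cmd.append(model)
    return cmd


cmd = predict_command(
    "src/svm-predict.py",
    "models/model.sav",
    pssm="data/blindTest/pssm/4S1H:A.pssm",
    probs=True,
)
print(" ".join(cmd[1:]))
# To run it from the repository root (output handling is an assumption):
# result = subprocess.run(cmd, capture_output=True, text=True, check=True)
```

Because the arguments are passed as a list rather than through a shell, the colon in `4S1H:A.pssm` does not need the backslash used in the shell example.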

Training

GOR

A GOR model can be trained using:

python gor-train.py [-h] [--filename_model FILENAME_MODEL]
                    [--window_size WINDOW_SIZE]
                    filename_id_list dir_pssm dir_dssp

For example:

python src/gor-train.py data/training/list.txt data/training/pssm data/training/dssp --filename_model models/model
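
The three positional arguments are a file listing sequence IDs, a directory of profiles, and a directory of DSSP files. The sketch below shows one plausible way such inputs could be paired up. It assumes one ID per line and files named `<ID>.pssm` and `<ID>.dssp`; the `.pssm` naming matches the prediction example above, while the list format and the `.dssp` extension are guesses. `training_pairs` is a hypothetical helper, not a function from this repository.

```python
from pathlib import Path


def training_pairs(id_lines, dir_pssm, dir_dssp):
    """Pair each ID with its assumed profile and DSSP file paths."""
    pairs = []
    for line in id_lines:
        seq_id = line.strip()
        if not seq_id:
            continue  # skip blank lines
        pairs.append(
            (
                seq_id,
                Path(dir_pssm) / f"{seq_id}.pssm",
                Path(dir_dssp) / f"{seq_id}.dssp",  # extension is an assumption
            )
        )
    return pairs


# "1ABC:A" is an invented ID used only for illustration.
for seq_id, pssm_path, dssp_path in training_pairs(
    ["1ABC:A\n", "\n"], "data/training/pssm", "data/training/dssp"
):
    print(seq_id, pssm_path, dssp_path)
```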

SVM

An SVM model can be trained using:

python svm-train.py [-h] [--filename_model FILENAME_MODEL]
                    [--window_size WINDOW_SIZE]
                    filename_id_list dir_pssm dir_dssp

For example:

python src/svm-train.py data/training/list.txt data/training/pssm data/training/dssp --filename_model models/model
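
Both training scripts accept `--window_size`. Window-based predictors commonly describe each residue by the profile rows of its neighbours, padding with zeros at the sequence ends. The sketch below illustrates that general idea in plain Python. It is an assumption about the technique, not the repository's code: the default of 17 is a placeholder (the scripts' actual default is not stated here), and the toy profile has 2 columns instead of the 20 of a real PSSM.

```python
def window_features(profile, window_size=17):
    """Turn an L x C profile into L vectors of length window_size * C."""
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd")
    half = window_size // 2
    n_cols = len(profile[0]) if profile else 0
    padding = [0.0] * n_cols
    features = []
    for center in range(len(profile)):
        vector = []
        for pos in range(center - half, center + half + 1):
            # Positions that fall off either end of the sequence are zero-padded.
            row = profile[pos] if 0 <= pos < len(profile) else padding
            vector.extend(row)
        features.append(vector)
    return features


toy = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(window_features(toy, window_size=3)[0])
# prints [0.0, 0.0, 1.0, 0.0, 0.0, 1.0]
```

A larger window gives each residue more context at the cost of longer feature vectors, which is the trade-off the `--window_size` option exposes.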

Performance

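| Metric | GOR (CV) | GOR (blind test) | SVM (CV) | SVM (blind test) |
|--------|----------|------------------|----------|------------------|
| SEN_H | 0.86±0.01 | 0.83 | 0.80±0.01 | 0.72 |
| SEN_E | 0.62±0.01 | 0.60 | 0.58±0.01 | 0.62 |
| SEN_C | 0.42±0.01 | 0.42 | 0.82±0.00 | 0.85 |
| PPV_H | 0.58±0.01 | 0.60 | 0.82±0.01 | 0.85 |
| PPV_E | 0.54±0.01 | 0.58 | 0.75±0.01 | 0.80 |
| PPV_C | 0.80±0.01 | 0.73 | 0.72±0.00 | 0.65 |
| MCC_H | 0.50±0.01 | 0.46 | 0.71±0.01 | 0.67 |
| MCC_E | 0.45±0.01 | 0.46 | 0.58±0.01 | 0.63 |
| MCC_C | 0.40±0.01 | 0.39 | 0.58±0.01 | 0.56 |
| SOV_H | 65.48±0.99 | 62.70 | 76.39±0.97 | 68.64 |
| SOV_E | 58.64±1.35 | 63.18 | 59.20±2.18 | 67.19 |
| SOV_C | 43.09±0.69 | 45.57 | 70.18±0.93 | 70.93 |
| ACC | 0.62±0.00 | 0.62 | 0.76±0.00 | 0.75 |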
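
For readers unfamiliar with the abbreviations: SEN is sensitivity, PPV is positive predictive value, MCC is the Matthews correlation coefficient, SOV is the segment overlap score and ACC is accuracy, with H, E and C denoting helix, strand and coil, and CV denoting cross-validation. The sketch below computes SEN, PPV, MCC and overall accuracy from two label strings using the standard one-vs-rest definitions. It is an illustration only and may differ in detail from the evaluation code behind the table; the project report is the authority on how the numbers were obtained. SOV is not covered, and the two example strings are made up.

```python
from math import sqrt


def class_metrics(observed, predicted, label):
    """Sensitivity, PPV and MCC for one class, treated as one-vs-rest."""
    pairs = list(zip(observed, predicted))
    tp = sum(o == label and p == label for o, p in pairs)
    fn = sum(o == label and p != label for o, p in pairs)
    fp = sum(o != label and p == label for o, p in pairs)
    tn = len(pairs) - tp - fn - fp
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "SEN": tp / (tp + fn) if tp + fn else 0.0,
        "PPV": tp / (tp + fp) if tp + fp else 0.0,
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def accuracy(observed, predicted):
    """Fraction of residues whose predicted state matches the observed one."""
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


observed = "HHHHEEECCC"   # invented example labels
predicted = "HHHCEECCCC"
print(class_metrics(observed, predicted, "H"))
print(accuracy(observed, predicted))
```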

License

MIT © Katarina Elez
