MS²PIP is a tool to predict MS² signal peak intensities from peptide sequences. It employs the XGBoost machine learning algorithm and is written in Python.
You can install MS²PIP on your machine by following the instructions below or the extended install instructions. For a more user-friendly experience, we created a web server, where you can upload a list of peptide sequences and download the corresponding predicted MS² spectra in multiple file formats. The web server can also be accessed through its RESTful API.
To generate a predicted spectral library starting from a FASTA file, we developed a pipeline called fasta2speclib. Usage of this pipeline is described in fasta2speclib_config.md. Fasta2speclib was developed in collaboration with the ProGenTomics group for the MS²PIP for DIA project.
If you use MS²PIP for your research, please cite the following articles:
- Gabriels, R., Martens, L., & Degroeve, S. (2019). Updated MS²PIP web server delivers fast and accurate MS² peak intensity prediction for multiple fragmentation methods, instruments and labeling techniques. Nucleic Acids Research. https://doi.org/10.1093/nar/gkz299
- Degroeve, S., Maddelein, D., & Martens, L. (2015). MS²PIP prediction server: compute and visualize MS² peak intensity predictions for CID and HCD fragmentation. Nucleic Acids Research, 43(W1), W326–W330. https://doi.org/10.1093/nar/gkv542
- Degroeve, S., & Martens, L. (2013). MS²PIP: a tool for MS/MS peak intensity prediction. Bioinformatics (Oxford, England), 29(24), 3199–203. https://doi.org/10.1093/bioinformatics/btt544
Please also note and report the MS²PIP version and model version you used.
Download the latest release and unzip it. MS2PIPc runs on Python 3.5 or greater, and the required Python packages are listed in requirements.txt. MS2PIPc requires machine-specific compilation of the C code:
sh compile.sh
Check out the extended install instructions for a more detailed explanation.
MS2PIPc comes with pre-trained models for a variety of fragmentation methods and modifications. These models can easily be applied by configuring MS2PIPc in the config.txt file and providing a list of peptides in the form of a PEPREC file.

    usage: ms2pipC.py [-h] [-c FILE] [-s FILE] [-w FILE] [-m INT] <peptide file>

    positional arguments:
      <peptide file>  list of peptides

    optional arguments:
      -h, --help      show this help message and exit
      -c FILE         config file (by default config.txt)
      -s FILE         .mgf MS2 spectrum file (optional)
      -w FILE         write feature vectors to FILE.{pkl,h5} (optional)
      -m INT          number of cpu's to use

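As a small illustration of the options above, the sketch below assembles an ms2pipC.py command line from Python, which could then be handed to subprocess.run. It only uses flags from the usage message; the helper name build_ms2pip_command and the file names peptides.peprec and config.txt are placeholders, and it assumes ms2pipC.py sits in the current working directory.

```python
import shlex
import sys


def build_ms2pip_command(peptide_file, config="config.txt", spectrum_file=None, num_cpus=None):
    """Hypothetical helper: build an ms2pipC.py argument list.

    Only flags shown in the usage message are used.
    """
    cmd = [sys.executable, "ms2pipC.py", "-c", config]
    if spectrum_file is not None:
        cmd += ["-s", spectrum_file]  # optional .mgf MS2 spectrum file
    if num_cpus is not None:
        cmd += ["-m", str(num_cpus)]  # number of CPUs to use
    cmd.append(peptide_file)  # positional <peptide file> goes last
    return cmd


cmd = build_ms2pip_command("peptides.peprec", num_cpus=4)
print(" ".join(shlex.quote(part) for part in cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building an argument list (rather than one shell string) avoids quoting problems when file names contain spaces.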
Several MS2PIPc options need to be set in the config file (config.txt by default, or the file passed with -c):

- model=X sets the model to use, where X is one of the currently supported MS2PIP models (see MS2PIP Models).
- frag_error=X sets the fragment ion error tolerance, where X is the tolerance in Da.
- ptm=X,Y,opt,Z defines a PTM (see further), with one such line per PTM. X is a string that represents the PTM, Y is the mass difference in Da associated with the PTM, opt is required for compatibility with other CompOmics projects, and Z is the amino acid that is modified by the PTM. For N- and C-terminal modifications, Z should be N-term or C-term, respectively.
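To make the three kinds of config lines concrete, here is a minimal sketch that reads them into Python objects. It is not MS2PIPc's own config parser: parse_config and the PTM tuple are hypothetical names, any other keys are simply ignored, and frag_error=0.02 is an arbitrary example value.

```python
from typing import NamedTuple


class PTM(NamedTuple):
    name: str          # X: label used in the PEPREC modifications column
    mass_shift: float  # Y: mass difference in Da
    target: str        # Z: amino acid, or "N-term" / "C-term"


def parse_config(text):
    """Hypothetical reader for the model=, frag_error= and ptm= lines."""
    config = {"model": None, "frag_error": None, "ptms": {}}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "model":
            config["model"] = value
        elif key == "frag_error":
            config["frag_error"] = float(value)  # tolerance in Da
        elif key == "ptm":
            name, mass, _opt, target = value.split(",")  # "opt" is not used here
            config["ptms"][name] = PTM(name, float(mass), target)
        # other keys are ignored in this sketch
    return config


example = """\
model=HCD
frag_error=0.02
ptm=Cam,57.02146,opt,C
ptm=Ace,42.010565,opt,N-term
"""
cfg = parse_config(example)
print(cfg["model"], cfg["frag_error"], cfg["ptms"]["Cam"])
```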
To apply the pre-trained models, you only need to pass a <peptide file>
to ms2pipC.py. This file contains the peptide sequences for which you
want to predict the b- and y-ion peak intensities. The file is
space-separated and contains four columns with the following header names:

- spec_id: an ID for the peptide/spectrum
- modifications: a string indicating the modified amino acids
- peptide: the unmodified amino acid sequence
- charge: the charge state to predict
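A PEPREC file with these four columns can be written with Python's standard csv module. This is a sketch: write_peprec, the spec_ids and the peptides are made up, and the modifications strings use the A|B format and the Cam/Ace PTM names from the config example further down. How unmodified peptides should be encoded is not covered in this section, so both example peptides carry a modification.

```python
import csv

PEPREC_COLUMNS = ["spec_id", "modifications", "peptide", "charge"]


def write_peprec(path, rows):
    """Hypothetical helper: write rows (dicts) as a space-separated PEPREC file."""
    with open(path, "w", newline="") as handle:
        # Space delimiter and Unix line endings for the output file.
        writer = csv.writer(handle, delimiter=" ", lineterminator="\n")
        writer.writerow(PEPREC_COLUMNS)
        for row in rows:
            writer.writerow([row[column] for column in PEPREC_COLUMNS])


rows = [
    {"spec_id": "pep1", "modifications": "2|Cam", "peptide": "ACDEK", "charge": 2},
    {"spec_id": "pep2", "modifications": "0|Ace", "peptide": "PEPTIDEK", "charge": 3},
]
write_peprec("peptides.peprec", rows)
```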
The spec_id column is a unique identifier for each peptide that will
be used in the TITLE field of the predicted MS2 .mgf file. The
modifications column is a string that lists the PTMs in the peptide.
Each PTM is written as A|B, where A is the location of the PTM in the
peptide (the first amino acid has location 1, location 0 is used for
N-terminal modifications, and -1 is used for C-terminal modifications) and B
is a string that represents the PTM as defined in the config file (-c
command line argument). Multiple PTMs in the modifications column are
concatenated with '|'.
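The encoding rule can be expressed as a small helper that turns (location, PTM name) pairs into the string expected in the modifications column. format_modifications is a hypothetical name, and the PTM labels Ace and Ox below are placeholders that would have to match ptm= entries in your config file.

```python
def format_modifications(mods):
    """Encode (location, ptm_name) pairs as 'A|B|A|B...'.

    location: 0 = N-terminus, -1 = C-terminus, 1..n = residue position (1-based).
    """
    parts = []
    for location, name in mods:
        if location < -1:
            raise ValueError(f"invalid PTM location: {location!r}")
        parts.extend([str(location), name])
    return "|".join(parts)


print(format_modifications([(0, "Ace"), (3, "Ox")]))  # 0|Ace|3|Ox
```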
As an example, suppose the config file contains the following lines:

    ptm=Cam,57.02146,opt,C
    ptm=Ace,42.010565,opt,N-term
    ptm=Glyloss,-58.005479,opt,C-term

Then a modifications string could look like 0|Ace|2|Cam|5|Cam|-1|Glyloss,
which means that the second and fifth amino acids are modified with Cam,
that there is an N-terminal modification Ace, and that there is a
C-terminal modification Glyloss.
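Going the other way, the sketch below splits the example string back into (location, PTM) pairs, checks each pair against the three ptm= definitions above, and sums their mass differences. The peptide ACDECK is made up so that positions 2 and 5 are cysteines; the helper names and the residue check are our own illustration, not necessarily what MS2PIPc itself validates.

```python
# PTM definitions from the example config: name -> (mass shift in Da, target)
PTMS = {
    "Cam": (57.02146, "C"),
    "Ace": (42.010565, "N-term"),
    "Glyloss": (-58.005479, "C-term"),
}


def parse_modifications(mods, peptide, ptms):
    """Hypothetical helper: decode 'A|B|A|B...' and sanity-check each PTM."""
    tokens = mods.split("|")
    if len(tokens) % 2:
        raise ValueError(f"odd number of fields in {mods!r}")
    pairs = [(int(tokens[i]), tokens[i + 1]) for i in range(0, len(tokens), 2)]
    for location, name in pairs:
        if name not in ptms:
            raise ValueError(f"PTM {name!r} is not defined in the config")
        target = ptms[name][1]
        if location == 0:
            valid = target == "N-term"
        elif location == -1:
            valid = target == "C-term"
        else:
            # 1-based residue position; the residue must match the PTM target
            valid = 1 <= location <= len(peptide) and peptide[location - 1] == target
        if not valid:
            raise ValueError(f"{name} cannot be placed at location {location}")
    return pairs


def total_mass_shift(pairs, ptms):
    """Sum of the mass differences (Da) of all PTMs in the peptide."""
    return sum(ptms[name][0] for _, name in pairs)


pairs = parse_modifications("0|Ace|2|Cam|5|Cam|-1|Glyloss", "ACDECK", PTMS)
print(pairs)
print(round(total_mass_shift(pairs, PTMS), 6))  # 98.048006
```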
In the conversion_tools folder, we provide a host of Python scripts
to convert common search engine output files to a PEPREC file.
The predictions are saved in a .csv file with the name
<peptide_file>_predictions.csv.
If you want the output to be in the form of an .mgf file, change the value of the
variable mgf in line 716 of ms2pipC.py.
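For reference, an MGF spectrum is a plain-text block between BEGIN IONS and END IONS, with header lines followed by one m/z and intensity pair per line. The sketch below formats a single predicted spectrum that way, using spec_id as the TITLE as described above. It is not MS2PIPc's own MGF writer: to_mgf_entry, the choice of header fields and the peak values are illustrative, and the real output may contain additional fields.

```python
def to_mgf_entry(spec_id, charge, peaks):
    """Hypothetical helper: format one spectrum as an MGF block.

    peaks: iterable of (mz, intensity) tuples; written in ascending m/z order.
    """
    lines = ["BEGIN IONS", f"TITLE={spec_id}", f"CHARGE={charge}+"]
    lines += [f"{mz:.4f} {intensity:.6g}" for mz, intensity in sorted(peaks)]
    lines.append("END IONS")
    return "\n".join(lines) + "\n"


# Made-up peak values, for illustration only
print(to_mgf_entry("pep1", 2, [(300.0, 0.2), (147.1128, 0.5)]))
```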
Currently, the following models are supported in MS²PIP: HCD, CID, TTOF5600, TMT, iTRAQ,
iTRAQphospho, HCDch2 and CIDch2. The last two ("ch2") models also include predictions for doubly charged fragment ions (b++ and y++), in addition to the predictions for singly charged b- and y-ions.
If you use MS²PIP for your research, always mention the MS²PIP version (see releases page) and model version (see table below) you used.
| Model | Current version | Train-test dataset (unique peptides) | Evaluation dataset (unique peptides) | Median Pearson correlation on evaluation dataset |
|---|---|---|---|---|
| HCD | v20190107 | MassIVE-KB (1 623 712) | PXD008034 (35 269) | 0.903786 |
| CID | v20190107 | NIST CID Human (340 356) | NIST CID Yeast (92 609) | 0.904947 |
| iTRAQ | v20190107 | NIST iTRAQ (704 041) | PXD001189 (41 502) | 0.905870 |
| iTRAQphospho | v20190107 | NIST iTRAQ phospho (183 383) | PXD001189 (9 088) | 0.843898 |
| TMT | v20190107 | Peng Lab TMT Spectral Library (1 185 547) | PXD009495 (36 137) | 0.950460 |
| TTOF5600 | v20190107 | PXD000954 (215 713) | PXD001587 (15 111) | 0.746823 |
| HCDch2 | v20190107 | MassIVE-KB (1 623 712) | PXD008034 (35 269) | 0.903786 (+) and 0.644162 (++) |
| CIDch2 | v20190107 | NIST CID Human (340 356) | NIST CID Yeast (92 609) | 0.904947 (+) and 0.813342 (++) |
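The last column summarises accuracy as a single number per model. Assuming the usual reading of such a column (one Pearson correlation between predicted and observed intensities per spectrum, then the median over all spectra in the evaluation set), it can be sketched as below. The sketch correlates the vectors exactly as given and does not reproduce any intensity transformation MS²PIP may apply beforehand; the toy vectors are made up.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Undefined (division by zero) if either sequence is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def median_pearson(spectra):
    """spectra: iterable of (predicted, observed) intensity vectors, one per spectrum."""
    return median(pearson(pred, obs) for pred, obs in spectra)


spectra = [
    ([1, 2, 3], [1, 2, 3]),  # perfect agreement
    ([1, 2, 3], [3, 2, 1]),  # perfectly reversed
    ([1, 2, 3], [1, 3, 2]),  # partial agreement
]
print(median_pearson(spectra))
```

Taking the median rather than the mean keeps a handful of very poorly predicted spectra from dominating the summary.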
For optimal results, your experimental data should match the properties of the MS²PIP model.
| Model | Fragmentation method | MS² mass analyzer | Peptide properties |
|---|---|---|---|
| HCD | HCD | Orbitrap | Tryptic digest |
| CID | CID | Linear ion trap | Tryptic digest |
| iTRAQ | HCD | Orbitrap | Tryptic digest, iTRAQ-labeled |
| iTRAQphospho | HCD | Orbitrap | Tryptic digest, iTRAQ-labeled, enriched for phosphorylation |
| TMT | HCD | Orbitrap | Tryptic digest, TMT-labeled |
| TTOF5600 | CID | Quadrupole Time-of-Flight | Tryptic digest |
| HCDch2 | HCD | Orbitrap | Tryptic digest |
| CIDch2 | CID | Linear ion trap | Tryptic digest |
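The table above can be encoded as a lookup that shortlists models for a given experiment. This is a hypothetical convenience helper, not part of MS²PIP. Because every model in the table assumes a tryptic digest, that property is left out, and the labels "iTRAQ", "TMT" and "phospho" are our own shorthand for the remaining peptide properties.

```python
# Model -> (fragmentation method, MS2 mass analyzer, extra peptide properties)
MODEL_PROPERTIES = {
    "HCD": ("HCD", "Orbitrap", frozenset()),
    "CID": ("CID", "Linear ion trap", frozenset()),
    "iTRAQ": ("HCD", "Orbitrap", frozenset({"iTRAQ"})),
    "iTRAQphospho": ("HCD", "Orbitrap", frozenset({"iTRAQ", "phospho"})),
    "TMT": ("HCD", "Orbitrap", frozenset({"TMT"})),
    "TTOF5600": ("CID", "Quadrupole Time-of-Flight", frozenset()),
    "HCDch2": ("HCD", "Orbitrap", frozenset()),
    "CIDch2": ("CID", "Linear ion trap", frozenset()),
}


def candidate_models(fragmentation, analyzer, extras=()):
    """Hypothetical helper: models whose table row matches the experiment exactly."""
    wanted = frozenset(extras)
    return [
        name
        for name, (frag, ana, props) in MODEL_PROPERTIES.items()
        if frag == fragmentation and ana == analyzer and props == wanted
    ]


print(candidate_models("HCD", "Orbitrap", {"TMT"}))  # ['TMT']
print(candidate_models("HCD", "Orbitrap"))  # ['HCD', 'HCDch2']
```

When both a base model and its "ch2" variant match, the choice depends on whether predictions for doubly charged fragment ions are needed.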
To train custom MS2PIPc models, please refer to Training new MS2PIP models on our Wiki pages.