Improving Tandem Mass Spectrum Identification Using Peptide Retention Time Prediction across Diverse Chromatography Conditions

Aaron Klammer, Xianhua Yi, Michael J. MacCoss and William Stafford Noble

Proceedings of the International Conference on Research in Computational Molecular Biology (RECOMB), 2007
Analytical Chemistry. 79(16):6111-6118, 2007.


Most tandem mass spectrum identification algorithms use information only from the final spectrum, ignoring precursor information such as peptide retention time (RT). Efforts to exploit peptide RT for peptide identification can be frustrated by its variability across liquid chromatography analyses. We show that peptide RT can be reliably predicted by training a support vector regressor on a single chromatography run. This dynamically trained model outperforms a published statically trained model of peptide RT across diverse chromatography conditions. In addition, the model can be used to filter out peptide identifications that produce large discrepancies between observed and predicted RT. After filtering, the estimated number of true positive peptide identifications increases by as much as 50% at a false discovery rate of 3%, with the largest increase for non-specific cleavage with elastase.
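
The filtering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' software: the `Identification` record, the `filter_by_rt` function, the `max_delta` threshold, and all peptide strings and RT values are hypothetical, and the predicted RTs are placeholder numbers standing in for the output of a trained regressor.

```python
from typing import List, NamedTuple


class Identification(NamedTuple):
    """One peptide-spectrum match with its retention times (hypothetical record)."""
    peptide: str
    observed_rt: float   # RT measured in the chromatography run (minutes assumed)
    predicted_rt: float  # RT predicted by a model trained on that same run


def filter_by_rt(ids: List[Identification], max_delta: float) -> List[Identification]:
    """Keep identifications whose |observed - predicted| RT is at most max_delta."""
    return [i for i in ids if abs(i.observed_rt - i.predicted_rt) <= max_delta]


# Made-up example values; the second match has a large RT discrepancy.
ids = [
    Identification("PEPTIDEK", 21.5, 22.0),
    Identification("SAMPLER", 40.0, 18.5),
    Identification("ELVISK", 33.2, 31.0),
]

kept = filter_by_rt(ids, max_delta=5.0)
print([i.peptide for i in kept])
```

In this sketch the threshold is a fixed constant chosen for illustration; how a threshold would be selected in practice is not something this example attempts to show.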

  • Supplementary figures.

  • Supplementary data. SEQUEST results were produced by searching spectra using this parameter file against one of these sequence databases: target, decoy1, decoy2 or decoy3. The ms2 and sqt file formats are described in McDonald et al. (2004). The raw weights for the trained SVM are here.

  • The software used to generate the results in the paper.