Statistical calibration of the SEQUEST XCorr function
Aaron A. Klammer, Christopher Y. Park and William Stafford Noble
Journal of Proteome Research. In press.
Obtaining accurate peptide identifications from shotgun proteomics liquid chromatography tandem mass spectrometry (LC-MS/MS) experiments requires a score function that consistently ranks correct peptide-spectrum matches (PSMs) above incorrect matches. We have observed that, for the SEQUEST score function Xcorr, the inability to discriminate between correct and incorrect PSMs is due in part to spectrum-specific properties of the score distribution. In other words, some spectra score well regardless of which peptides they are scored against, and other spectra score well because they are scored against a large number of peptides. We describe a protocol for calibrating PSM score functions, and we demonstrate its application to Xcorr and the preliminary SEQUEST score function Sp. The protocol accounts for spectrum- and peptide-specific effects by calculating p values for each spectrum individually, using only that spectrum's score distribution. We demonstrate that these calculated p values are uniform under a null distribution and therefore accurately measure significance. These p values can be used to estimate the false discovery rate, thereby eliminating the need for an extra search against a decoy database. In addition, we show that the p values are better calibrated than their underlying scores; consequently, when ranking top-scoring PSMs from multiple spectra, p values are better at discriminating between correct and incorrect PSMs. The calibration protocol is generally applicable to any PSM score function for which an appropriate parametric family can be identified.
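The per-spectrum calibration described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the procedure used in the paper: the abstract does not name the parametric family here, so a shifted exponential is swapped in as a stand-in, and the correction for taking the maximum over all candidates and the Benjamini-Hochberg step for the false discovery rate are likewise assumptions. The function names `spectrum_p_value` and `bh_q_values` are hypothetical.

```python
import math


def spectrum_p_value(candidate_scores):
    """P value for the top score of ONE spectrum, using only that
    spectrum's own candidate scores (illustrative sketch)."""
    ranked = sorted(candidate_scores, reverse=True)
    top, rest = ranked[0], ranked[1:]
    if len(rest) < 2:
        raise ValueError("need at least three candidate scores")

    # ASSUMPTION: a shifted exponential stands in for the parametric
    # family. It is fit to every score except the top one, so a true
    # hit does not distort its own null distribution.
    loc = min(rest)
    scale = sum(s - loc for s in rest) / len(rest)  # exponential MLE
    if scale <= 0:
        raise ValueError("candidate scores have no spread")

    # Probability that a single null candidate scores at least `top`.
    p_single = math.exp(-(top - loc) / scale)

    # ASSUMPTION: correct for reporting the maximum of n candidates,
    # p = 1 - (1 - p_single)^n, computed stably for tiny p_single.
    n = len(ranked)
    return -math.expm1(n * math.log1p(-p_single))


def bh_q_values(p_values):
    """Benjamini-Hochberg q values, returned in the input order.
    ASSUMPTION: one standard way to estimate FDR from p values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from largest p to smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Invented scores for two spectra: one with a clear outlier, one without.
p_hit = spectrum_p_value([1.0, 1.1, 1.2, 1.3, 1.4, 5.0])
p_miss = spectrum_p_value([1.0, 1.1, 1.2, 1.3, 1.4, 1.5])
```

Because each p value is computed from a single spectrum's score distribution, a spectrum that scores well against every peptide does not receive a small p value merely for having a high raw score, which is the spectrum-specific effect the abstract describes.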
- Supplementary figures.
- Supplementary data:
  Spectra: 60cm.ms2
  - Target SEQUEST: 12.8MB
  - Decoy SEQUEST: 11.5MB
  - Target Crux Xcorr: 2.1MB
  - Decoy Crux Xcorr: 2.2MB
  - Target Crux Xcorr p-value: 1.1MB
  - Decoy Crux Xcorr p-value: 1.1MB
  - Target Crux Sp p-value: 1.0MB
  - Decoy Crux Sp p-value: 1.0MB
SEQUEST results were produced by searching these spectra using this parameter file against one of these sequence databases: target, decoy. The sqt file formats are described in McDonald et al. (2004).
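For readers who want to pull scores out of the sqt result files, a minimal reader might look like the sketch below. The field positions are an assumption based on a common reading of the sqt layout (whitespace-separated `S` lines carrying the scan number in field 1 and the charge in field 3, and `M` lines carrying Xcorr in field 5) and should be checked against McDonald et al. (2004) before use. The function name `top_xcorr_per_spectrum` is hypothetical, and the sample lines are invented for illustration, not taken from the supplementary data.

```python
def top_xcorr_per_spectrum(sqt_lines):
    """Map (scan, charge) to the best Xcorr among that spectrum's M lines.

    ASSUMPTION: S lines are 'S first_scan last_scan charge ...' and
    M lines are 'M rank_xcorr rank_sp calc_mass delta_cn xcorr ...'.
    """
    best = {}
    key = None
    for line in sqt_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "S" and len(fields) >= 4:
            key = (int(fields[1]), int(fields[3]))
        elif fields[0] == "M" and key is not None and len(fields) >= 6:
            xcorr = float(fields[5])
            if key not in best or xcorr > best[key]:
                best[key] = xcorr
        # H (header) and L (locus) lines are ignored in this sketch.
    return best


# Invented sample records, for illustration only.
sample = [
    "H\tSQTGenerator\texample",
    "S\t10\t10\t2\t0\thost\t1200.5\t3000.0\t0.0\t150",
    "M\t1\t1\t1200.4\t0.0000\t3.2100\t450.0\t12\t20\tK.PEPTIDEK.A\tU",
    "L\tPROTEIN_A",
    "M\t2\t3\t1200.6\t0.2500\t2.4000\t300.0\t9\t20\tR.OTHERPEPK.G\tU",
    "L\tPROTEIN_B",
]
result = top_xcorr_per_spectrum(sample)
```

Keeping every `M` line's score for a spectrum, rather than only the best one, would give the per-spectrum score distribution that a calibration step needs.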