Statistical calibration of the SEQUEST XCorr function

Aaron A. Klammer, Christopher Y. Park and William Stafford Noble

Journal of Proteome Research. In press.

Abstract

Obtaining accurate peptide identifications from shotgun proteomics liquid chromatography tandem mass spectrometry (LC-MS/MS) experiments requires a score function that consistently ranks correct peptide-spectrum matches (PSMs) above incorrect matches. We have observed that, for the SEQUEST score function Xcorr, the inability to discriminate between correct and incorrect PSMs is due in part to spectrum-specific properties of the score distribution. In other words, some spectra score well regardless of which peptides they are scored against, and other spectra score well because they are scored against a large number of peptides. We describe a protocol for calibrating PSM score functions, and we demonstrate its application to Xcorr and the preliminary SEQUEST score function Sp. The protocol accounts for spectrum- and peptide-specific effects by calculating p values for each spectrum individually, using only that spectrum's score distribution. We demonstrate that these calculated p values are uniform under a null distribution and therefore accurately measure significance. These p values can be used to estimate the false discovery rate, thereby eliminating the need for an extra search against a decoy database. In addition, we show that the p values are better calibrated than their underlying scores; consequently, when ranking top-scoring PSMs from multiple spectra, p values are better at discriminating between correct and incorrect PSMs. The calibration protocol is generally applicable to any PSM score function for which an appropriate parametric family can be identified.
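
The per-spectrum calibration described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The abstract does not name the parametric family, so a shifted exponential tail is assumed here purely as a stand-in. The function names (tail_p_value, benjamini_hochberg), the tail_fraction parameter, and the example scores are all hypothetical. The second function uses the standard Benjamini-Hochberg procedure as one way to turn p values into false discovery rate estimates without a decoy search.

```python
import math


def tail_p_value(score, null_scores, tail_fraction=0.5):
    """Estimate P(S >= score) from one spectrum's own candidate scores.

    Assumption: the upper tail of the spectrum's score distribution is
    modelled as a shifted exponential. This family is a stand-in chosen
    for simplicity, not necessarily the one used in the paper.
    """
    ordered = sorted(null_scores)
    start = int(len(ordered) * (1.0 - tail_fraction))
    tail = ordered[start:]
    threshold = tail[0]
    # Maximum-likelihood estimate of the exponential scale is the mean excess.
    mean_excess = sum(s - threshold for s in tail) / len(tail)
    if mean_excess <= 0:
        raise ValueError("tail scores are all identical; cannot fit a rate")
    if score < threshold:
        # Below the fitted tail: fall back to the empirical fraction.
        return sum(s >= score for s in ordered) / len(ordered)
    tail_mass = len(tail) / len(ordered)
    return tail_mass * math.exp(-(score - threshold) / mean_excess)


def benjamini_hochberg(p_values):
    """Convert p values to Benjamini-Hochberg adjusted values (q values)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q_values[i] = running_min
    return q_values


# Hypothetical example: two spectra whose top PSMs share the same raw score
# (3.5) but whose candidate-score distributions differ.
low_background = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2]
high_background = [1.5, 1.8, 2.0, 2.2, 2.5, 2.8, 3.0, 3.2]

p_low = tail_p_value(3.5, low_background)
p_high = tail_p_value(3.5, high_background)
print(p_low < p_high)  # the spectrum that scores well against everything is less significant
```

In this sketch the same raw score yields a much smaller p value for the spectrum with the lower background, which is the spectrum-specific effect the calibration is meant to remove when PSMs from different spectra are ranked together.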

Supplementary figures.

Supplementary data:

Spectra	Target SEQUEST	Decoy SEQUEST	Target Crux Xcorr	Decoy Crux Xcorr	Target Crux Xcorr p-value	Decoy Crux Xcorr p-value	Target Crux Sp p-value	Decoy Crux Sp p-value
60cm.ms2	12.8 MB	11.5 MB	2.1 MB	2.2 MB	1.1 MB	1.1 MB	1.0 MB	1.0 MB

SEQUEST results were produced by searching the 60cm.ms2 spectra with the supplied parameter file against one of two sequence databases: target or decoy. The ms2 and sqt file formats are described in McDonald et al. (2004).
