Semi-supervised learning for peptide identification from shotgun proteomics datasets

Lukas Käll, Jesse D. Canterbury, Jason Weston, William Stafford Noble and Michael J. MacCoss

Nature Methods 4:923-925, November 2007


Shotgun proteomics uses liquid chromatography-tandem mass spectrometry to identify proteins in complex biological samples. We describe an algorithm, called Percolator, for improving the rate of peptide identifications from a collection of tandem mass spectra. Percolator uses semi-supervised machine learning to discriminate between correct and decoy spectrum identifications, correctly assigning peptides to 17% more spectra from a tryptic dataset and up to 77% more spectra from non-tryptic digests, relative to a fully supervised approach.
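
As a rough illustration of the semi-supervised idea, the sketch below re-scores peptide-spectrum matches by repeatedly treating high-confidence target matches as positive examples and decoy matches as negative examples. It is not Percolator's implementation. The iterative loop, the difference-of-means linear scorer, the 1% false discovery rate cutoff, the choice of the first feature as the initial score, and the names confident_targets, mean_vector and rescore are all assumptions made for this example, and the feature values are invented toy data.

```python
def confident_targets(scores, is_decoy, fdr):
    """Return indices of target matches accepted at the estimated FDR.

    The FDR at a score cutoff is estimated as
    (number of decoys above the cutoff) / (number of targets above it).
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    decoys = targets = 0
    best_prefix = 0
    for rank, i in enumerate(order, start=1):
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        # Remember the longest ranked prefix that still meets the cutoff.
        if targets > 0 and decoys / targets <= fdr:
            best_prefix = rank
    return [i for i in order[:best_prefix] if not is_decoy[i]]


def mean_vector(rows):
    """Column-wise mean of a non-empty list of feature rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def rescore(features, is_decoy, fdr=0.01, n_iter=3):
    """Iteratively learn linear weights and return new match scores."""
    scores = [row[0] for row in features]  # assumed initial score: feature 0
    decoy_rows = [row for row, d in zip(features, is_decoy) if d]
    for _ in range(n_iter):
        positives = confident_targets(scores, is_decoy, fdr)
        if not positives:
            break
        pos_mean = mean_vector([features[i] for i in positives])
        neg_mean = mean_vector(decoy_rows)
        # Stand-in classifier: weights point from decoys toward positives.
        weights = [p - n for p, n in zip(pos_mean, neg_mean)]
        scores = [sum(w * x for w, x in zip(weights, row)) for row in features]
    return scores


# Toy data: two hypothetical features per match (values are invented).
features = [
    (5.0, 5.0), (4.5, 4.0), (4.0, 5.5), (3.5, 4.5),  # correct targets
    (1.2, 5.2), (0.8, 4.2),                          # correct, weak feature 0
    (2.0, 0.3), (1.0, 0.7), (0.5, 0.2),              # incorrect targets
    (2.2, 0.4), (1.1, 0.9), (0.6, 0.1), (0.3, 0.6),  # decoys
]
is_decoy = [False] * 9 + [True] * 4

before = confident_targets([row[0] for row in features], is_decoy, 0.01)
after = confident_targets(rescore(features, is_decoy), is_decoy, 0.01)
print(len(before), len(after))  # prints: 4 6
```

On this toy input, the single initial feature accepts four target matches at the cutoff, while the learned combination of both features accepts six, because the second feature separates the two weak-scoring correct matches from the decoys.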

The ms2 and sqt file formats are described in McDonald et al. (2004).