Semi-supervised learning for peptide identification from shotgun proteomics datasets

Lukas Käll, Jesse D. Canterbury, Jason Weston, William Stafford Noble, Michael J. MacCoss

79(16):6111-6118, 2007.


Shotgun proteomics uses liquid chromatography-tandem mass spectrometry to identify proteins in complex biological samples. We describe an algorithm, called Percolator, for improving the rate of confident peptide identifications from a collection of tandem mass spectra. Percolator uses semi-supervised machine learning to discriminate between correct and decoy spectrum identifications, correctly assigning peptides to 17% more spectra from a tryptic Saccharomyces cerevisiae dataset, and up to 77% more spectra from non-tryptic digests, relative to a fully supervised approach.
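The abstract does not spell out how "confident" identifications are counted. One common way to count them in a target-decoy setting is to estimate a q-value for each peptide-spectrum match from the ranks of target and decoy scores. The sketch below illustrates that counting step only, not the Percolator algorithm itself. The function names, the decoys/targets estimator, and the 0.01 threshold are assumptions made for illustration, not details taken from the paper.

```python
def estimate_q_values(scores, is_decoy):
    """Estimate a q-value for each match by simple target-decoy counting.

    scores   : list of floats, higher means a better match (assumed convention)
    is_decoy : list of bools, True if the match is to a decoy peptide
    """
    # Rank matches from best to worst score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Estimated FDR at each rank: decoys seen so far / targets seen so far.
    fdrs = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the smallest FDR at this rank or any lower-ranked threshold.
    q_values = [0.0] * len(scores)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        q_values[order[rank]] = running_min
    return q_values


def count_confident_targets(scores, is_decoy, threshold=0.01):
    """Count target matches whose estimated q-value is at or below threshold."""
    q_values = estimate_q_values(scores, is_decoy)
    return sum(
        1 for q, decoy in zip(q_values, is_decoy) if not decoy and q <= threshold
    )
```

Under these assumptions, two scoring schemes can be compared by calling `count_confident_targets` on each set of scores at the same threshold. A comparison of that kind is what a statement such as "17% more spectra" would summarize.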

Nature Methods
Supplementary data