Combining pairwise sequence similarity and support vector machines for detecting remote protein evolutionary and structural relationships.

Li Liao and William Stafford Noble

Journal of Computational Biology. 10(6):857-868, 2003.


One key element in understanding the molecular machinery of the cell is to understand the structure and function of each protein encoded in the genome. A very successful means of inferring the structure or function of a previously unannotated protein is via sequence similarity with one or more proteins whose structure or function is already known. Toward this end, we propose a means of representing proteins using pairwise sequence similarity scores. This representation, combined with a discriminative classification algorithm known as the support vector machine, provides a powerful means of detecting subtle structural and evolutionary relationships among proteins. The algorithm, called SVM-pairwise, when tested on its ability to recognize previously unseen families from the SCOP database, yields significantly better performance than SVM-Fisher, profile HMMs, and PSI-BLAST.
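
The representation described above can be illustrated with a minimal sketch: each protein becomes a fixed-length vector whose entries are its similarity scores against every sequence in a reference (training) set. The abstract does not specify the scoring method, so this sketch assumes a toy Smith-Waterman local alignment score with simple match/mismatch/gap values; the function names, parameters, and sequences are hypothetical and chosen for illustration only. The resulting vectors are what a support vector machine would then be trained on; that training step is omitted here.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Toy Smith-Waterman local alignment score with a linear gap penalty.

    Assumption: the match/mismatch/gap values are illustrative, not the
    scoring scheme used by the authors.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def pairwise_features(seq, reference_seqs):
    """Represent seq as its vector of scores against each reference sequence."""
    return [local_alignment_score(seq, ref) for ref in reference_seqs]


# Hypothetical toy data: two made-up "families" of short sequences.
train_seqs = ["MKVLAAGIV", "MKVLSAGIV", "GGHHTTPLW", "GGHKTTPLW"]
train_labels = [1, 1, -1, -1]  # +1 = family of interest, -1 = other

# Feature matrix: one row per training protein, one column per reference.
X_train = [pairwise_features(s, train_seqs) for s in train_seqs]

# An unseen protein is vectorized against the same reference set.
query_vec = pairwise_features("MKVLAAGLV", train_seqs)

for seq, row in zip(train_seqs, X_train):
    print(seq, row)
print("query", query_vec)
```

Because every protein is scored against the same reference set, all vectors have the same length regardless of sequence length, which is what allows a standard discriminative classifier to be applied to them.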
Supplementary data