Kernel-based data fusion and its application to protein function prediction in yeast

Gert R. G. Lanckriet, Minghua Deng, Nello Cristianini, Michael I. Jordan and William Stafford Noble

Proceedings of the Pacific Symposium on Biocomputing, January 3-8, 2004. pp. 300-311.


Kernel methods provide a principled framework in which to represent many types of data, including vectors, strings, trees and graphs. As such, these methods are useful for drawing inferences about biological phenomena. We describe a method for combining multiple kernel representations in an optimal fashion, by formulating the combination as a convex optimization problem that can be solved using semidefinite programming techniques. The method is applied to the problem of predicting yeast protein functional classifications using a support vector machine (SVM) trained on five types of data. For this problem, the new method performs better than a previously described Markov random field method, and better than the SVM trained on any single type of data.
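To make the idea of combining kernel representations concrete, here is a minimal sketch in plain Python. It is not the semidefinite programming method described in the abstract: as a simple stand-in, it weights each kernel by its kernel-target alignment with the labels and then forms a non-negative weighted sum. The toy data, the two "views", and all function names (`linear_kernel`, `combine_kernels`, `alignment`, `heuristic_weights`) are assumptions made for illustration only.

```python
import math

def linear_kernel(rows):
    """Gram matrix of inner products between feature vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in rows] for u in rows]

def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices; non-negative weights keep it a valid kernel."""
    if len(kernels) != len(weights):
        raise ValueError("need one weight per kernel")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    n = len(kernels[0])
    return [[sum(w * K[r][c] for w, K in zip(weights, kernels))
             for c in range(n)] for r in range(n)]

def alignment(K, y):
    """Kernel-target alignment between K and the label matrix y y^T (labels are +1/-1)."""
    n = len(y)
    num = sum(K[r][c] * y[r] * y[c] for r in range(n) for c in range(n))
    frob = math.sqrt(sum(K[r][c] ** 2 for r in range(n) for c in range(n)))
    return num / (frob * n) if frob > 0 else 0.0

def heuristic_weights(kernels, y):
    """Hypothetical stand-in for the SDP: weights proportional to clipped alignment."""
    scores = [max(alignment(K, y), 0.0) for K in kernels]
    total = sum(scores)
    if total == 0:
        return [1.0 / len(kernels)] * len(kernels)  # fall back to uniform weights
    return [s / total for s in scores]

# Toy example (made-up numbers): four items, two data types, binary labels.
labels = [1, 1, -1, -1]
view_a = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # tracks the labels
view_b = [[1.0], [-1.0], [1.0], [-1.0]]                    # unrelated to the labels

kernels = [linear_kernel(view_a), linear_kernel(view_b)]
weights = heuristic_weights(kernels, labels)
combined = combine_kernels(kernels, weights)
```

The combined matrix could then be passed to any SVM implementation that accepts a precomputed kernel. In the method described above, the weights would instead be obtained jointly with the classifier by solving a convex optimization problem, rather than by this alignment heuristic.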

Supplementary data