Kernel-based data fusion and its application to protein function prediction in yeast

Gert R. G. Lanckriet, Minghua Deng, Nello Cristianini, Michael I. Jordan and William Stafford Noble

Proceedings of the Pacific Symposium on Biocomputing, January 3-8, 2004. pp. 300-311.


Kernel methods provide a principled framework in which to represent many types of data, including vectors, strings, trees and graphs. As such, these methods are useful for drawing inferences about biological phenomena. We describe a method for combining multiple kernel representations in an optimal fashion, by formulating the problem as a convex optimization problem that can be solved using semidefinite programming techniques. The method is applied to the problem of predicting yeast protein functional classifications using a support vector machine (SVM) trained on five types of data. For this problem, the new method performs better than a previously-described Markov random field method, and better than the SVM trained on any single type of data.
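To make the idea of combining kernel representations concrete, here is a minimal sketch of a non-negative weighted sum of kernel matrices. It does not implement the semidefinite programming step from the paper: the weights are fixed by hand rather than learned, and the function name `combine_kernels`, the toy matrices `K_expr` and `K_seq`, and the weights are all illustrative assumptions.

```python
def combine_kernels(kernels, weights):
    """Return sum_i weights[i] * kernels[i] for square matrices given as lists of rows.

    Assumption: weights are supplied by hand. In the paper they are obtained by
    convex optimization, which this sketch does not attempt.
    """
    if not kernels or len(kernels) != len(weights):
        raise ValueError("need at least one kernel and one weight per kernel")
    if any(w < 0 for w in weights):
        # A non-negative combination of valid kernels is itself a valid kernel.
        raise ValueError("weights must be non-negative")
    n = len(kernels[0])
    if any(len(K) != n or any(len(row) != n for row in K) for K in kernels):
        raise ValueError("all kernel matrices must be n x n over the same items")
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical 2 x 2 kernel matrices for two data types over the same two genes.
K_expr = [[1.0, 0.2], [0.2, 1.0]]
K_seq = [[1.0, 0.6], [0.6, 1.0]]
K = combine_kernels([K_expr, K_seq], [0.25, 0.75])
```

The combined matrix `K` could then be passed to any SVM implementation that accepts a precomputed kernel.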

Manuscript (PDF)
Supplementary results (PDF)


From the set of all 6355 yeast genes, we select the following 3588 genes with known function. The following list is an overview of all data used to construct the results in the PSB 2004 paper. All of the files are gzipped, tab-delimited text.
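Since the files are gzipped, tab-delimited text, they can be read without decompressing them first. The sketch below uses only the Python standard library. The column layout of the real files is not described here, so the helper `read_gzipped_tsv`, the file name `demo_kernel.txt.gz`, and the demo rows are assumptions for illustration only.

```python
import csv
import gzip
import os
import tempfile


def read_gzipped_tsv(path):
    """Read a gzipped, tab-delimited text file into a list of rows of strings."""
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t")]


# Round-trip demo on a throwaway file; the gene names and values are made up.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_kernel.txt.gz")
    with gzip.open(demo_path, mode="wt", encoding="utf-8", newline="") as handle:
        handle.write("GENE_A\t1.0\t0.5\nGENE_B\t0.5\t1.0\n")
    rows = read_gzipped_tsv(demo_path)
```

Every field comes back as a string, so numeric columns need an explicit `float(...)` conversion once the layout of a given file is known.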

A Matlab implementation of the algorithm proposed by Francis Bach and Gert Lanckriet to solve the multiple kernel learning problem, courtesy of Guillaume Obozinski. (See Bach, F.R., Lanckriet, G.R.G. & Jordan, M.I. (2004). Fast Kernel Learning using Sequential Minimal Optimization. Technical Report CSD-04-1307, Division of Computer Science, University of California, Berkeley.)