Learning kernels from biological networks by maximizing entropy

Koji Tsuda and William Stafford Noble
Bioinformatics (Proceedings of the ISMB/ECCB), 2004. To appear.
Abstract
The diffusion kernel is a general method for computing pairwise distances among all nodes in a graph, based upon the sum of weighted paths between each pair of nodes. This technique has been used successfully, in conjunction with kernel-based learning methods, to draw inferences from several types of biological networks. We show that computing the diffusion kernel is equivalent to maximizing the von Neumann entropy, subject to a global constraint on the sum of the Euclidean distances between nodes. This global constraint allows for high variance in the pairwise distances. Accordingly, we propose an alternative, locally constrained diffusion kernel, and we demonstrate that the resulting kernel allows for more accurate support vector machine prediction of protein functional classifications from metabolic and protein-protein interaction networks. Supplementary results are available at noble.gs.washington.edu/proj/maxent.
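As an illustration of the diffusion kernel mentioned in the abstract, the following minimal Python sketch computes the kernel in its standard form, K = exp(-beta * L), where L = D - A is the graph Laplacian of the network. The toy three-node graph, the value of beta, the function names, and the truncated Taylor series are all assumptions made for illustration. This is not the software used in the paper, and it does not implement the locally constrained kernel.

```python
def graph_laplacian(adjacency):
    """Return L = D - A for a symmetric adjacency matrix given as nested lists."""
    n = len(adjacency)
    return [
        [(sum(adjacency[i]) if i == j else 0.0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]


def matmul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def diffusion_kernel(adjacency, beta=0.5, terms=40):
    """Approximate K = exp(-beta * L) with a truncated Taylor series.

    Adequate only for small toy graphs; the number of terms is an
    illustrative choice, not a tuned value.
    """
    n = len(adjacency)
    laplacian = graph_laplacian(adjacency)
    scaled = [[-beta * x for x in row] for row in laplacian]
    # Start from the identity matrix (the k = 0 term of the series).
    kernel = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in kernel]
    for k in range(1, terms):
        # term becomes (-beta * L)^k / k!
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        for i in range(n):
            for j in range(n):
                kernel[i][j] += term[i][j]
    return kernel


# Hypothetical toy network: a path graph 0 - 1 - 2.
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
kernel = diffusion_kernel(adjacency, beta=0.5)
for row in kernel:
    print(["%.4f" % value for value in row])
```

In this toy example the kernel value between the directly connected nodes 0 and 1 is larger than the value between nodes 0 and 2, which are connected only through node 1. For networks of realistic size, a library routine such as `scipy.linalg.expm` would be a more appropriate choice than the hand-written series.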
Supplementary results

Data

The data sets used in the paper are listed below. Each matrix is in tab-delimited text format, with gene names in the first row and column. The random train/test splits are represented by a separate label matrix, in which each element of the matrix is one of the following:
  1: training data (positive)
  2: training data (negative)
  3: test data (positive)
  4: test data (negative)
  0: not used in classification (no information in CYGD)
Ligand data
Ligand labels
Protein-protein interaction data
Protein-protein interaction labels
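
The label codes described above can be decoded with a few lines of Python. The sketch below is only an illustration under stated assumptions: it assumes that each row of a label matrix corresponds to a gene (name in the first column), that the first row holds column names, and that each remaining column holds one set of label codes. The gene names, column names, and function names are hypothetical, and the inline toy data stands in for a downloaded file.

```python
import csv
import io

# Meaning of the label codes; code 0 (not used in classification) is skipped.
LABEL_MEANING = {
    1: ("train", +1),
    2: ("train", -1),
    3: ("test", +1),
    4: ("test", -1),
}


def read_tab_matrix(handle):
    """Read a tab-delimited matrix with names in the first row and first column."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    column_names = header[1:]
    row_names, rows = [], []
    for record in reader:
        if not record:  # tolerate blank lines
            continue
        row_names.append(record[0])
        rows.append(record[1:])
    return row_names, column_names, rows


def split_by_label(row_names, rows, column_index):
    """Group genes into training and test sets for one column of label codes."""
    split = {"train": [], "test": []}
    for gene, row in zip(row_names, rows):
        code = int(float(row[column_index]))
        if code == 0:
            continue
        subset, label = LABEL_MEANING[code]
        split[subset].append((gene, label))
    return split


# Hypothetical toy label matrix; real files would be opened with open(path, newline="").
toy_labels = (
    "\tsplit1\tsplit2\n"
    "geneA\t1\t3\n"
    "geneB\t2\t0\n"
    "geneC\t3\t1\n"
    "geneD\t0\t4\n"
)
row_names, column_names, rows = read_tab_matrix(io.StringIO(toy_labels))
split = split_by_label(row_names, rows, column_index=0)
print(split["train"])
print(split["test"])
```

The same `read_tab_matrix` helper could be applied to the data matrices, converting the entries to floating-point numbers instead of label codes.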

Software is available by request from Koji Tsuda (koji.tsuda@tuebingen.mpg.de).