Kernel methods for predicting protein-protein interactions

Asa Ben-Hur and William Stafford Noble

Bioinformatics (Proceedings of the Intelligent Systems for Molecular Biology Conference). 21(Suppl 1):i38-i46, 2005.


Despite advances in high-throughput methods for discovering protein-protein interactions, the interaction networks of even well-studied model organisms are sketchy at best, highlighting the continued need for computational methods to help direct experimentalists in the search for novel interactions. We present a kernel method for predicting protein-protein interactions using a combination of data sources, including protein sequences, Gene Ontology annotations, local properties of the network, and homologous interactions in other species. Whereas protein kernels proposed in the literature provide a similarity between single proteins, prediction of interactions requires a kernel between pairs of proteins. We propose a pairwise kernel that converts a kernel between single proteins into a kernel between pairs of proteins, and we illustrate the kernel's effectiveness in conjunction with a support vector machine classifier. Furthermore, we obtain improved performance by combining several sequence-based kernels based on k-mer frequency, motif, and domain content, and by further augmenting the pairwise sequence kernel with features based on other sources of data.
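The abstract does not spell out the pairwise kernel's formula, so the following is only a minimal sketch of one common way to lift a kernel on single proteins to a kernel on unordered pairs: sum the products of base-kernel values over both ways of matching the members of the two pairs. The function names, the toy feature vectors, and the choice of a linear base kernel are all assumptions made for illustration, not details taken from the abstract.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical representation: each protein is a small numeric feature vector.
Protein = Sequence[float]
Pair = Tuple[Protein, Protein]


def linear_kernel(a: Protein, b: Protein) -> float:
    """Toy base kernel between two single proteins (plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def pairwise_kernel(base: Callable[[Protein, Protein], float],
                    pair1: Pair, pair2: Pair) -> float:
    """Lift a single-protein kernel to a kernel between protein pairs.

    Both matchings of (a, b) against (c, d) are summed, so the value does
    not depend on the order in which the proteins of a pair are listed.
    """
    (a, b), (c, d) = pair1, pair2
    return base(a, c) * base(b, d) + base(a, d) * base(b, c)


# Toy data (illustrative only).
p1, p2 = (1.0, 0.0), (0.0, 2.0)
p3, p4 = (1.0, 1.0), (2.0, 1.0)

k_ab = pairwise_kernel(linear_kernel, (p1, p2), (p3, p4))
k_ba = pairwise_kernel(linear_kernel, (p2, p1), (p3, p4))
print(k_ab, k_ba)  # both orderings of the first pair give the same value
```

Symmetrizing over the two matchings matters because an interaction between proteins A and B is the same example as one between B and A; a kernel that distinguished the two orderings would treat one interaction as two unrelated data points.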

We apply our method to predict physical interactions in yeast using data from the BIND database. At a false positive rate of 1%, the classifier retrieves close to 80% of a set of trusted interactions. We thus demonstrate the ability of our method to make accurate predictions despite the sizeable fraction of false positives that are known to exist in interaction databases.
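A figure such as "sensitivity at a 1% false positive rate" can be read off a ranked list of classifier scores. The sketch below shows one way to compute it; the function name, the tie handling, and the toy scores and labels are assumptions for illustration and do not reproduce the paper's evaluation or its data.

```python
from typing import Sequence


def sensitivity_at_fpr(scores: Sequence[float], labels: Sequence[int],
                       max_fpr: float) -> float:
    """Highest true positive rate reachable with false positive rate <= max_fpr.

    labels use 1 for a positive example and 0 for a negative one; a higher
    score means the classifier is more confident the example is positive.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Walk down the ranking, lowering the decision threshold one score at a time.
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        j = i
        # Examples with tied scores fall on the same side of any threshold.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        if fp / neg > max_fpr:
            break
        best = tp / pos
        i = j
    return best


# Toy example: 3 positives and 7 negatives (illustrative values only).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

tpr_strict = sensitivity_at_fpr(toy_scores, toy_labels, max_fpr=0.0)
tpr_loose = sensitivity_at_fpr(toy_scores, toy_labels, max_fpr=0.2)
print(tpr_strict, tpr_loose)
```

With only seven negatives, the toy data cannot resolve a 1% false positive rate; a meaningful estimate at that operating point needs at least on the order of a hundred negative examples.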

Supplementary data