Nonstationary kernel combination

Darrin P. Lewis, Tony Jebara, William Stafford Noble

Proceedings of the International Conference on Machine Learning, June 25-29, 2006, Pittsburgh, PA.


The power and popularity of kernel methods stem in part from their ability to handle diverse forms of structured inputs, including vectors, graphs and strings. Recently, several methods have been proposed for combining kernels from heterogeneous data sources. However, all of these methods produce stationary combinations; i.e., the relative weights of the various kernels do not vary among input examples. This article proposes a method for combining multiple kernels in a nonstationary fashion. The approach uses a large-margin latent-variable generative model within the maximum entropy discrimination (MED) framework. Latent parameter estimation is rendered tractable by variational bounds and an iterative optimization procedure. The classifier we use is a log-ratio of Gaussian mixtures, in which each component is implicitly mapped via a Mercer kernel function. We show that the support vector machine is a special case of this model. In this approach, discriminative parameter estimation is feasible via a fast sequential minimal optimization algorithm. Empirical results are presented on synthetic data, several benchmarks, and on a protein function annotation task.
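To make the log-ratio-of-mixtures classifier concrete, the following is a minimal sketch in plain Euclidean space, not the paper's kernelized MED formulation: each class density is a small Gaussian mixture, and the decision is the sign of the log-ratio of the two class likelihoods. The component means, variances, and weights here are toy values chosen for illustration; in the paper each component would instead be implicitly mapped through a Mercer kernel and the parameters estimated discriminatively.

```python
import numpy as np

def log_gauss(x, mu, var):
    """Log density of an isotropic Gaussian (one mixture component)."""
    d = x.shape[-1]
    return -0.5 * (d * np.log(2 * np.pi * var)
                   + np.sum((x - mu) ** 2, axis=-1) / var)

def log_mixture(x, mus, var, weights):
    """Log density of a Gaussian mixture, computed via log-sum-exp."""
    comp = np.stack([np.log(w) + log_gauss(x, mu, var)
                     for w, mu in zip(weights, mus)], axis=0)
    m = comp.max(axis=0)
    return m + np.log(np.exp(comp - m).sum(axis=0))

# Toy two-component mixtures for the positive and negative classes.
mus_pos = [np.array([0.0, 2.0]), np.array([2.0, 0.0])]
mus_neg = [np.array([0.0, -2.0]), np.array([-2.0, 0.0])]
w = [0.5, 0.5]

def classify(x):
    """Sign of the log-ratio log p(x | +) - log p(x | -)."""
    return np.sign(log_mixture(x, mus_pos, 1.0, w)
                   - log_mixture(x, mus_neg, 1.0, w))
```

Note how the posterior responsibility of each mixture component depends on the input x; when components correspond to different kernels, this is what makes the effective kernel combination nonstationary rather than a single fixed weighting.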