Dealing with large diagonals in kernel matrices

Jason Weston, Bernhard Schoelkopf, Eleazar Eskin, Christina Leslie and William Stafford Noble

Principles of Data Mining and Knowledge Discovery. Springer Lecture Notes in Computer Science 243.


In kernel methods, all the information about the training data is contained in the Gram matrix. If this matrix has large diagonal values, a situation that arises for many types of kernels, kernel methods tend to perform poorly. We propose and test several methods for dealing with this problem. They reduce the dynamic range of the matrix while preserving the positive definiteness of the Hessian of the quadratic programming problem that must be solved when training a support vector machine (SVM), a common kernel approach to pattern recognition.
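To make the problem concrete, here is a minimal NumPy sketch of one way to shrink the dynamic range of a Gram matrix while keeping it positive semidefinite. It is an illustration under stated assumptions, not necessarily the exact procedure evaluated in this paper. The entries are raised element-wise to a power 0 < p < 1, which compresses large values (typically the diagonal) more than small ones. The resulting matrix need not be positive semidefinite, so each row is then treated as a feature vector (an "empirical kernel map") and the matrix of their inner products is used instead. The function names `diag_dominance` and `subpolynomial_empirical_map`, the mean-diagonal over mean-off-diagonal measure, and the sparse binary toy data are all hypothetical choices made for this example. If the Gram matrix K is positive semidefinite, then so is the SVM dual Hessian with entries y_i y_j K_ij, because it equals diag(y) K diag(y).

```python
import numpy as np


def diag_dominance(K):
    """Hypothetical measure: mean diagonal entry divided by mean off-diagonal entry."""
    n = K.shape[0]
    off_diagonal = K[~np.eye(n, dtype=bool)]
    return K.diagonal().mean() / off_diagonal.mean()


def subpolynomial_empirical_map(K, p=0.5):
    """Illustrative transform (an assumption, not the paper's confirmed method).

    1. Element-wise power K_ij ** p with 0 < p < 1 compresses the dynamic range,
       but the result is not guaranteed to be positive semidefinite.
    2. Treating each row of the powered matrix as a feature vector and taking
       inner products (Kp @ Kp.T) yields a Gram matrix, which is PSD by construction.
    """
    if not 0 < p < 1:
        raise ValueError("p must lie in the open interval (0, 1)")
    if np.any(K < 0):
        raise ValueError("element-wise fractional power needs nonnegative entries")
    Kp = K ** p        # shrinks large entries more than small ones
    return Kp @ Kp.T   # empirical kernel map: inner products of rows, PSD


# Usage on toy data: a linear kernel on sparse binary vectors has a diagonal
# (squared norms) far larger than the off-diagonal overlaps.
rng = np.random.default_rng(0)
X = (rng.random((20, 1000)) < 0.05).astype(float)
K = X @ X.T
K_new = subpolynomial_empirical_map(K, p=0.5)
print("diagonal dominance before:", diag_dominance(K))
print("diagonal dominance after: ", diag_dominance(K_new))
print("smallest eigenvalue after:", np.linalg.eigvalsh(K_new).min())
```

The exponent p trades off range compression against fidelity to the original kernel. Smaller values flatten the matrix more aggressively. The second step matters because an indefinite matrix would make the SVM quadratic program nonconvex.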