svmvia
Description: An optimized C++ program for finding the full regularization path for the support vector machine. This implementation is derived from the R code accompanying this article: T. Hastie, S. Rosset, R. Tibshirani and J. Zhu. "The entire regularization path for the support vector machine." Journal of Machine Learning Research, 5:1391-1415, 2004.
svmvia consists of two programs:
svmvia-train [options] -train <training file> -val <validation file> -model <model file name>
Required arguments:
- -train <training file>: A file containing paired label values and attribute values. The format is similar to the libsvm data format, but is prefaced with the number of examples and the number of attributes.
- -val <validation file>: A file containing paired label values and attribute values. The format is similar to the libsvm data format, but is prefaced with the number of examples and the number of attributes.
- -model <model file name>: The name of the file where the model with the best performance on the validation set will be stored.
Options:
- -K poly|rbf: Specifies the type of kernel to be used.
  rbf uses the kernel exp(-gamma * ||U-V||^2), and
  poly uses the kernel (bias + U.V)^degree.
- -gamma <floating point value>: The value of gamma to be used in the rbf kernel. Default=1.0.
- -bias <floating point value>: The value of bias to be used in the poly kernel. Default=1.0.
- -degree <floating point value>: The value of degree to be used in the poly kernel. Default=1.0.
- -cache true|false: Indicates whether the entire kernel matrix should be cached (true) or whether only the diagonal should be cached (false). Default=false.
- -verbose true|false: Indicates whether progress information ("operator entertainment") should be displayed during training. While the QP problem is being solved, each cycle number is printed, followed by the value of the objective function being maximized. While the regularization path is being fitted, each cycle number is printed, followed by the current value of the parameter C and the validation error. Default=false.
- -primal-hyperplane: If specified with a polynomial kernel of degree 1, the model file will also contain the primal form of the optimal hyperplane.
- -errmode misclass|weighted|roc: Selects the measure used to evaluate model performance on the validation data.
  misclass chooses the model that minimizes the number of incorrect classifications;
  weighted chooses the model that minimizes the number of incorrect classifications weighted by the number of training examples of each label type, and
  roc chooses the model that maximizes the area under the ROC curve. Default=misclass.
- -maxC <floating point value>: The upper bound of the search space for C. The regularization path visits all of the critical points for C in the range (0, maxC]. Default=10000.
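To make the three -errmode measures concrete, here is a minimal Python sketch, not the code used by svmvia. The function names, the convention that a score of 0 or more is predicted as class 1, and the reading of "weighted" as one error on a class counting 1 / (number of training examples of that class) are all assumptions made for illustration; the actual weighting used by svmvia-train may differ.

```python
def predict_label(score):
    """Map a decision value to a class; treating 0 as class 1 is an assumption."""
    return 1 if score >= 0 else -1


def misclass_error(labels, scores):
    """misclass: the number of incorrectly classified examples."""
    return sum(1 for y, s in zip(labels, scores) if predict_label(s) != y)


def weighted_error(labels, scores, class_counts):
    """weighted (assumed reading): each error on class y counts
    1 / class_counts[y], where class_counts holds the number of
    training examples of each label."""
    return sum(1.0 / class_counts[y]
               for y, s in zip(labels, scores) if predict_label(s) != y)


def roc_auc(labels, scores):
    """roc: area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == -1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical validation labels and decision values.
labels = [1, 1, -1, -1]
scores = [2.0, -0.5, -1.0, 0.3]
print(misclass_error(labels, scores))
print(weighted_error(labels, scores, {1: 4, -1: 1}))
print(roc_auc(labels, scores))
```

Note that misclass and weighted are minimized while roc is maximized, so a model-selection loop would compare them in opposite directions.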
svmvia-predict <model file> <source data file> <destination for predictions file>
- <model file>: A model file created and stored by svmvia-train (see the -model argument).
- <source data file>: A file containing paired label values and attribute values. The format is similar to the libsvm data format, but is prefaced with the number of examples and the number of attributes. Example data is provided.
- <destination for predictions file>: The name of the file where the predicted labels for the input file will be stored.
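The two kernels selected by the -K option of svmvia-train follow directly from the formulas given in the options list. The sketch below is a plain Python illustration of those formulas, not svmvia's C++ code; the function names are hypothetical, and the default parameter values mirror the documented defaults for -gamma, -bias and -degree.

```python
import math


def rbf_kernel(u, v, gamma=1.0):
    """rbf: exp(-gamma * ||U-V||^2) for two dense attribute vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of attributes")
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def poly_kernel(u, v, bias=1.0, degree=1.0):
    """poly: (bias + U.V)^degree for two dense attribute vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of attributes")
    dot = sum(a * b for a, b in zip(u, v))
    return (bias + dot) ** degree


# Hypothetical attribute vectors.
print(rbf_kernel([0.0, 0.0], [1.0, 0.0], gamma=1.0))
print(poly_kernel([1.0, 2.0], [3.0, 4.0], bias=1.0, degree=2.0))
```

With the default degree of 1.0 the poly kernel reduces to bias + U.V, which is the linear case that the -primal-hyperplane option applies to.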
Data file format
These files are specified using the -train and -val arguments. The first line of svmvia data files has the number of examples in the file and the number of attributes in each example. The remaining lines each describe a particular training example. Each of these lines starts with the class of the example, a value of either 1 or -1. The remainder of the line defines the attribute values for that example. An attribute is specified by its integer index (starting with 0), a colon and then the value of that attribute. A sample line is shown below with class -1, attribute 0 set to .5 and attribute 1 set to 3:
    -1 0:.5 1:3
Although the class must be the first item on the line, attributes may be set out of order, as shown by the following equivalent line:
    -1 1:3 0:.5
Attributes not initialized for an example will be set to 0.
Model file format
A model file is created where specified by the -model argument. This file first contains the kernel type, 0 for radial basis and 1 for polynomial. If the kernel type is radial basis, the gamma parameter comes next. If the kernel type is polynomial, the bias and degree come next instead. Next, a C-style boolean value specifies whether caching was used, followed by the number of training examples and the number of attributes for each example. The next line contains the optimal parameter λ = 1/C that was chosen, which defines the hardness of the margin. Each following line gives the Lagrange multiplier for one training example, the class of that example, and its attribute values. The number of these lines is equal to the number of examples specified. After the last of these lines, the bias is stored. If the -primal-hyperplane option is used with a polynomial kernel of degree 1, the file has one more line, describing the hyperplane in the primal form. This line defines a coefficient for each of the attributes. The bias is the same for the primal form.
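The data file format described above (a header line, then one example per line) can be read with a few lines of Python. This is a minimal sketch rather than svmvia's own parser; the function name is hypothetical, and it assumes that the two header numbers are separated by whitespace and that blank lines can be ignored.

```python
def parse_data_file(text):
    """Parse svmvia-style data into (labels, rows) with dense attribute rows."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Header: number of examples, then number of attributes (assumed
    # to be whitespace-separated on the first line).
    n_examples, n_attributes = (int(tok) for tok in lines[0].split())
    labels, rows = [], []
    for ln in lines[1:]:
        tokens = ln.split()
        label = int(tokens[0])  # the class must be the first item
        if label not in (1, -1):
            raise ValueError("class must be 1 or -1, got %r" % tokens[0])
        row = [0.0] * n_attributes  # unset attributes default to 0
        for tok in tokens[1:]:
            index, value = tok.split(":")
            row[int(index)] = float(value)  # indices start at 0, any order
        labels.append(label)
        rows.append(row)
    if len(rows) != n_examples:
        raise ValueError("header promises %d examples, found %d"
                         % (n_examples, len(rows)))
    return labels, rows


# The two equivalent sample lines from the format description,
# in a file declaring 2 examples with 3 attributes each.
sample = "2 3\n-1 0:.5 1:3\n-1 1:3 0:.5\n"
labels, rows = parse_data_file(sample)
print(labels)
print(rows)
```

Both sample lines parse to the same row, and attribute 2, which neither line sets, is filled in as 0.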