*SVM ^{multiclass}* uses the multi-class formulation described in [1], but optimizes it with an algorithm that is very fast in the linear case. For a training set (x

_{1},y

_{1}) ... (x

_{n},y

_{n}) with labels y

_{i}in [1..k], it finds the solution of the following optimization problem during training.

min 1/2 Σ_{i=1..k} w_{i}*w_{i} + C/nΣ_{i = 1..n} ξ_{i}

s.t. for all y in [1..k]: [ x_{1}• w_{yi} ] >= [ x_{1}• w_{y} ] + 100*Δ(y_{1},y) - ξ_{1 ... }for all y in [1..k]: [ x_{n}• w_{yn} ] >= [ x_{n}• w_{y} ] + 100*Δ(y_{n},y) - ξ_{n}

C is the usual regularization parameter that trades off margin size and training error. Δ(y_{n},y) is the loss function that returns 0 if y_{n} equals y, and 1 otherwise.

To solve this optimization problem, *SVM ^{multiclass}* uses an algorithm that is different from the one in [1]. The algorithm is based on Structural SVMs [2] and it is an instance of SVMstruct. For linear kernels,

*SVM*V2.20 is very fast and runtime scales linearly with the number of training examples. Non-linear kernels are not (really) supported. It also serves as a easy tutorial example of how to use the

^{multiclass}*SVM*programming interface. More information on

^{struct}*SVM*is available here.

^{struct}Please send me email and let me know that you got it. The archive contains the source code of the most recent version of *SVM ^{multiclass}*, which includes the source code of SVMstruct and the SVMlight quadratic optimizer. Unpack the archive using the shell command:

gunzip –c svm_multiclass.tar.gz | tar xvf –

## How to Use

SVM* ^{multiclass}* consists of a learning module (

`svm_multiclass_learn`) and a classification module (

`svm_multiclass_classify`). The classification module can be used to apply the learned model to new examples. See also the examples below for how to use

`svm_multiclass_learn`and

`svm_multiclass_classify`.

*SVM*. You call it like

^{light}svm_multiclass_learn -c 1.0 example_file model_filewhich trains an SVM on the training set

`example_file`and outputs the learned rule to

`model_file`using the regularization parameter

`C`set to

`1.0`. Note that the

`C`parameters is scaled differently from SVM

*. Other options are:*

^{light}General Options: -? -> this help -v [0..3] -> verbosity level (default 1) -y [0..3] -> verbosity level for svm_light (default 0)Learning Options: -c float -> C: trade-off between training error and margin (default 0.01) -p [1,2] -> L-norm to use for slack variables. Use 1 for L1-norm, use 2 for squared slacks. (default 1) -o [1,2] -> Rescaling method to use for loss. 1: slack rescaling 2: margin rescaling (default 2) -l [0..] -> Loss function to use. 0: zero/one loss ?: see below in application specific options (default 0)Optimization Options (see [2][4]): -w [0,..,9] -> choice of structural learning algorithm (default 4): 0: n-slack algorithm described in [2] 1: n-slack algorithm with shrinking heuristic 2: 1-slack algorithm (primal) described in [4] 3: 1-slack algorithm (dual) described in [4] 4: 1-slack algorithm (dual) with constraint cache [4] 9: custom algorithm in svm_struct_learn_custom.c -e float -> epsilon: allow that tolerance for termination criterion (default 0.100000) -k [1..] -> number of new constraints to accumulate before recomputing the QP solution (default 100) (-w 0 and 1 only) -f [5..] -> number of constraints to cache for each example (default 5) (used with -w 4) -b [1..100] -> percentage of training set for which to refresh cache when no epsilon violated constraint can be constructed from current cache (default 100%) (used with -w 4) -n [2..q] -> number of new variables entering the working set in each svm-light iteration (default n = q). Set n < q to prevent zig-zagging. -m [5..] -> size of svm-light cache for kernel evaluations in MB (default 40) (used only for -w 1 with kernels) -h [5..] -> number of svm-light iterations a variable needs to be optimal before considered for shrinking (default 100) -# int -> terminate svm-light QP subproblem optimization, if no progress after this number of iterations. (default 100000)Kernel Options: -t int -> type of kernel function: 0: linear (default) 1: polynomial (s a*b+c)^d 2: radial basis function exp(-gamma ||a-b||^2) 3: sigmoid tanh(s a*b + c) 4: user defined kernel from kernel.h -d int -> parameter d in polynomial kernel -g float -> parameter gamma in rbf kernel -s float -> parameter s in sigmoid/poly kernel -r float -> parameter c in sigmoid/poly kernel -u string -> parameter of user defined kernelOutput Options: -a string -> write all alphas to this file after learning (in the same order as in the training set)Application-Specific Options: noneFor more details on the meaning of these options consult the description of SVMstruct, the description of SVMlight, and references [2][4].

The input file `example_file` contains the training examples. The file format is the same as for *SVM ^{light}*, just that the target value is now a positive integer that indicates the class. The first lines may contain comments and are ignored if they start with #. Each of the following lines represents one training example and is of the following format:

`<line> .=. <target> <feature>:<value> <feature>:<value> ... <feature>:<value> # <info>`

`<target> .=. <integer>`

`<feature> .=. <integer>`

`<value> .=. <float>`

`<info> .=. <string>`

The target value and each of the feature/value pairs are separated by a space character. Feature/value pairs MUST be ordered by increasing feature number. Features with value zero can be skipped. The target value denotes the class of the example via a positive (non-zero) integer. So, for example, the line

3 1:0.43 3:0.12 9284:0.2 # abcdef

specifies an example of class 3 for which feature number 1 has the value 0.43, feature number 3 has the value 0.12, feature number 9284 has the value 0.2, and all the other features have value 0. In addition, the string `abcdef` is stored with the vector, which can serve as a way of providing additional information when adding user defined kernels.

The result of `svm_multiclass_learn` is the model which is learned from the training data in `example_file`. The model is written to `model_file`. To make predictions on test examples, `svm_multiclass_classify` reads this file. `svm_multiclass_classify` is called as follows:

svm_multiclass_classify [options] test_example_file model_file output_file

For all test examples in `test_example_file` the predicted classes (and the values of x• w_{i} for each class) are written to `output_file`. There is one line per test example in `output_file` in the same order as in `test_example_file`.The first value in each line is the predicted class, and each of the following numbers are the discriminant values for each of the k classes.

## Getting started: an Example Problem

You will find an example problem with 7 classes at

https://download.joachims.org/svm_multiclass/examples/example4.tar.gz

Download this file into your svm_multiclass directory and unpack it with

gunzip -c example4.tar.gz | tar xvf -

This will create a subdirectory `example4`. There are 300 examples in the file `train.dat` and 2000 in the file `test.dat`. To run the example, execute the commands:

svm_multiclass_learn -c 5000 example4/train.dat example4/model `svm_multiclass_classify example4/test.dat example4/model example4/predictions`

The accuracy on the test set is printed to stdout.

## Extensions and Additions

- Ruby Interface: interface to call SVM
^{multiclass}from Ruby, written by Vicente Bosch.

## Disclaimer

This software is free only for non-commercial use. It must not be distributed without prior permission of the author. The author is not responsible for implications from the use of this software.

## Known Problems

- Training with kernels is not really supported, since it is very very slow in the current version.

## History

#### V2.12 - V2.20

- Includes version V3.10 of
*SVMstruct*, which makes it substantially faster. - Now writes linear model files in a more compact format (i.e. stores only the weight vectors, not the support vectors).
- Kernels now actually work correctly, but using a non-linear kernel is still very slow.
- Fixed bug in RBF Kernel.
- Fixed precision issues on 64-bit AMD and Intel machines.
- Source code for SVMmulticlass V2.12.

#### V1.01 - V2.12

- New training algorithm based on equivalent 1-slack reformulation of the training problem. This makes training on the linear kernel several orders of magnitude faster than in V1.01. See changes introduced bySVMstruct V3.00 for a general properties of the new structural SVM training algorithm.
- New IO routines that are faster for reading large data and model files than in
- Source code for SVMmulticlass V1.01.

#### V1.00 - V1.01

## References

**[1]** K. Crammer and Y. Singer. On the Algorithmic Implementation of Multi-class SVMs, JMLR, 2001.

**[2]** I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Support Vector Learning for Interdependent and Structured Output Spaces, ICML, 2004. [Postscript]] [PDF]

**[3]** T. Joachims, Making Large-Scale SVM Learning Practical. Advances in Kernel Methods - Support Vector Learning, B. Schölkopf and C. Burges and A. Smola (ed.), MIT Press, 1999. [Postscript (gz)] [PDF]

**[4] **T. Joachims, T. Finley, Chun-Nam Yu, Cutting-Plane Training of Structural SVMs, Machine Learning Journal, to appear. [Draft PDF]