Lab 5 - SVMs
Thursday, July 15, 2010
12:46 PM
PURPOSE
Goal: By the end of this lab, you will understand how to build and test an SVM model.
SETUP
Note: these instructions are for Linux boxes only. They have not been tested and are not supported on non-Linux laptop configurations.
1. Verify that the SVMlab folder is installed in your local working directory. (If not, please download it from /usr/ccrma/courses/mir2010/Toolboxes / Utilities / SVMlab folder.)
2. Add the entire SVMlab folder (with subfolders) to your Matlab path.
3. Now, we need to reset the permissions of the files in this folder:
Open a Terminal window and cd to the directory where you placed your SVMlab folder.
cd to the folder /Libsvm-2.86/tools/
In the Terminal, type:
chmod 755 grid.py
chmod 755 svmtrain
chmod 755 svmpredict
chmod 755 svmscale
To verify that everything worked, type in grid.py. You should see something that says "Usage: grid.py [-log2c begin,end,step] etc."
If you see "permission denied", then something went wrong. Repeat the above process.
SECTION 1: BUILDING AN SVM
FEATURE EXTRACT
Since we need data that we can assign LABELS to, feature extract a collection of instrument samples (as many features as you want).
Labels in SVM format take the values {-1, 1} for negative and positive examples, respectively.
For variety, choose instruments based on samples in:
You can choose to classify based on artists or instrument examples for files posted in:
/audio/Miscellaneous Loops Samples and SFX/Instrument Samples
Don't forget to scale the feature data, and save the scaling coefficients so that we can scale the test data to the same range.
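A minimal sketch of what "scale and save the coefficients" can look like. The lab's own scaling helper is not shown here, so this assumes a simple per-column linear mapping of the training data to [-1, 1]. The names mf (multiplicative factor) and sf (additive shift) are borrowed from later in this lab, but their exact definition here is an assumption, and the feature values are made up.

```matlab
% Hypothetical features: one row per frame, one column per feature.
trainFeatures = [0 10; 5 20; 10 40];
testFeatures  = [2.5 25];

% Assumed mapping: scaled = x .* mf + sf, sending each training column to [-1, 1].
colMin = min(trainFeatures, [], 1);
colMax = max(trainFeatures, [], 1);
mf = 2 ./ (colMax - colMin);   % multiplicative factor (assumes colMax > colMin)
sf = -1 - colMin .* mf;        % additive shift

% Apply the SAME mf and sf to both the training and the test data.
trainScaled = bsxfun(@plus, bsxfun(@times, trainFeatures, mf), sf);
testScaled  = bsxfun(@plus, bsxfun(@times, testFeatures,  mf), sf);

disp(trainScaled)
disp(testScaled)
```

The point of keeping mf and sf around is that the test data must be mapped with the coefficients computed from the training data, not with coefficients recomputed from the test data.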
Create a label vector using class labels {1,-1}.
A useful command for creating label vectors with "-1" is:
repmat(-1, size(features.frames,1), 1)
% This command generates a column vector that repeats the number -1 once for every row of features.frames.
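As a rough illustration of how the label vector lines up with the feature data, here is a sketch for a two-class problem. The matrices featuresPos and featuresNeg are hypothetical stand-ins for whatever your feature extraction produced (one row per frame), and the values are random.

```matlab
% Hypothetical feature matrices: one row per frame, three features each.
featuresPos = rand(4, 3);   % frames from the positive-class examples
featuresNeg = rand(6, 3);   % frames from the negative-class examples

% One label per row: +1 for positive examples, -1 for negative examples.
labelsPos = repmat( 1, size(featuresPos, 1), 1);
labelsNeg = repmat(-1, size(featuresNeg, 1), 1);

% Stack features and labels in the same order so that row i of
% features always corresponds to labels(i).
features = [featuresPos; featuresNeg];
labels   = [labelsPos;   labelsNeg];

disp(size(features))
disp(labels')
```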
SECTION 2: USING AN SVM
Find the best parameters (C and gamma)
EXPORT DATA
We are using an external program (a python script) to run a grid search and look for the best values of C and gamma. To export your data so that libSVM can read it, I've created a helper function, mat2libSVMFormat.m:
mat2libSVMFormat(data,label,filename)
For example:
mat2libSVMFormat(features, labels, '~/Matlab/libsvm-2.86/tools/myData.txt')
This file needs to be saved in the same directory where the libSVM tools (grid.py etc.) are located.
Open up the file to see the particular format that libSVM prefers, and also to verify that the data was written out correctly.
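When you open the exported file, each line should look like "label index:value index:value ...", with feature indices starting at 1. The sketch below only illustrates that text format on a tiny made-up matrix. It is not the code of mat2libSVMFormat.m, and it writes every feature densely (libSVM also accepts files where zero-valued features are omitted).

```matlab
% Hypothetical data: two samples with two features each.
features = [0.5 -1; 0 0.25];
labels   = [1; -1];

% Build one text row per sample: "<label> 1:<v1> 2:<v2> ..."
rows = cell(size(features, 1), 1);
for i = 1:size(features, 1)
    row = sprintf('%d', labels(i));
    for j = 1:size(features, 2)
        row = [row sprintf(' %d:%g', j, features(i, j))]; %#ok<AGROW>
    end
    rows{i} = row;
    disp(row)
end
```

If the lines in your exported file do not have this general shape, the export probably failed.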
GRID SEARCH
Now that the data has been exported from Matlab, open a Terminal window.
In the Terminal, type:
> grid.py myData.txt
You'll see the following sample text output to the screen...
[...]
[local] 13 -13 61.9048 (best c=512.0, g=0.001953125, rate=66.6667)
[local] 13 1 52.381 (best c=512.0, g=0.001953125, rate=66.6667)
[local] 13 -11 61.9048 (best c=512.0, g=0.001953125, rate=66.6667)
[local] 13 -5 57.1429 (best c=512.0, g=0.001953125, rate=66.6667)
[local] 13 -15 57.1429 (best c=512.0, g=0.001953125, rate=66.6667)
[local] 13 3 57.1429 (best c=512.0, g=0.001953125, rate=66.6667)
[local] 13 -9 42.8571 (best c=512.0, g=0.001953125, rate=66.6667)
[local] 13 -3 61.9048 (best c=512.0, g=0.001953125, rate=66.6667)
512.0 0.001953125 66.6667
These numbers show the parameters that were tried during the grid search, with the corresponding cross-validation accuracy for each model. In effect, the grid search has built dozens of SVM models with various parameter settings and chosen the parameters that it believes have the best chance of success given your current training data.
The numbers at the bottom are what we are most interested in. The first number (512) is C, and the second number (0.001953125) is gamma.
Note that the current cross-validation accuracy is 66.6667%. (Since I had a small number of samples in this collection, there are many possible best choices that also have 66.7% accuracy.)
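One detail that can be confusing when reading the output: as far as I understand grid.py's output format, the first two numbers on each [local] line are base-2 logarithms of the C and gamma being tried, while the "best c=..., g=..." values and the final line are the actual parameter values. Under that assumption, the best values in the sample output correspond to exponents 9 and -9, which this small check illustrates:

```matlab
% Exponents assumed to correspond to the best (C, gamma) in the sample output.
log2c = 9;
log2g = -9;

C     = 2^log2c;   % 512
gamma = 2^log2g;   % 0.001953125

fprintf('C = %g, gamma = %.10g\n', C, gamma);
```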
See the README in libSVM\libsvm-2.85\libsvm-2.85\tools for additional information on the easy.py and grid.py scripts, if you are the curious sort and find yourself interested.
Write down these values of C and gamma. We'll use them in Matlab to build our SVM model.
To build an SVM:
Type svmtrain in Matlab to review its many options. (If you cannot find svmtrain, then make sure to add the folder libsvm-mat-2.86 to your Matlab path.)
An example of how to train it on your feature data using the parameters returned by the grid search:
model = svmtrain(labels,features,'-t 2 -g 0.001953125 -c 512')
The "-t 2" specifies the RBF kernel. "-g" specifies the value of gamma and "-c" specifies C.
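If you would rather not retype the numbers by hand, one option is to build the options string from variables. This is only a sketch: the variable names are my own, and the svmtrain call is left as a comment because it needs the libsvm Matlab interface on your path plus your own labels and features.

```matlab
% Values copied from the grid search output.
C     = 512;
gamma = 0.001953125;

% '%.10g' keeps enough digits that gamma is not rounded in the string
% (plain '%g' would shorten it to 6 significant digits).
options = sprintf('-t 2 -g %.10g -c %.10g', gamma, C);
disp(options)

% With libsvm on the path, the training call would then be:
% model = svmtrain(labels, features, options);
```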
To test with your SVM:
Feature extract some examples, and don't forget to rescale the data using the same mf and sf (scale factors) as before.
Now, to evaluate, all you do is:
[predict_label, accuracy] = svmpredict(testlabels, features, model)
RE: input labels
"Labels?", you ask. Yes, if you know the labels for your testing data, insert them into this vector. So, for example, you can insert the training labels and training feature data into this function, and svmpredict will automatically calculate the accuracy for you.
If you do not know the labels for your test data (likely the case), then insert a vector of zeros with one entry per test sample.
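To make the reported accuracy less of a black box, here is a sketch of the same kind of calculation done by hand: the percentage of test samples whose predicted label matches the known label. The two vectors are made-up examples, not real svmpredict output.

```matlab
% Hypothetical known and predicted labels for five test samples.
testlabels    = [ 1;  1; -1; -1;  1];
predict_label = [ 1; -1; -1; -1;  1];

% Percentage of samples where the prediction matches the known label.
numCorrect  = sum(predict_label == testlabels);
accuracyPct = 100 * numCorrect / numel(testlabels);

fprintf('Accuracy = %g%% (%d/%d)\n', accuracyPct, numCorrect, numel(testlabels));
```

This also shows why the accuracy is meaningless when you pass a vector of zeros as labels: none of the predictions (1 or -1) can match, so only predict_label is useful in that case.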
SECTION 3: HAVING FUN
Try redoing some of the previous labs' instrument classifiers or artist/genre classifiers using an SVM.
Recommended reading
A Practical Guide to Support Vector Classification
http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
It provides an introduction to the libsvm tools and the motivations for why they were developed. It also highlights common mistakes.
Additional Resources
SVM Practical (How to get good results without cheating)
http://www.kyb.tuebingen.mpg.de/bs/people/weston/svmpractical/
Libsvm and Libsvm Tools
http://www.csie.ntu.edu.tw/~cjlin/libsvm/
http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/
The interactive Matlab SVM Demo that I demonstrated on Day 5:
http://homepages.cae.wisc.edu/~ece539/matlab/
http://homepages.cae.wisc.edu/~ece539/matlab/svmdemo.m
HELP!
Troubleshooting
If you are experiencing really bizarre results, it's sometimes worthwhile to double-check that the labels are set correctly ("1" for a positive example and "-1" for a negative example). Incorrect labels will gravely affect the model.
Also, try deleting the temporary output feature file. Sometimes the file isn't updated by Matlab, and the failure is silent.
OPTIONAL: How to build libSVM from source code
Hopefully, you do NOT need to do this step; I've done the work for you. But for the curious...
The following steps will build the libsvm executables from their source. This is necessary to run them on our Linux machines.
1. Download the folder libsvm to your local Matlab folder.
2. Within the libsvm folder, open the file Makefile with a text editor.
3. On the 2nd line, change /usr/local/matlab to /opt/matlabR2006b (or whatever your version of Matlab is).
4. Save the file.
5. Open a Terminal window and cd to the folder containing the Makefile.
6. Type make
Copyright 2010 Jay LeBoeuf
Portions can be re-used for educational purposes with consent of copyright owner.