Lab 1 - Basic feature extraction and classification (2013)

Sunday, June 23, 2013

11:37 PM



 This lab will introduce you to the practice of analyzing, segmenting, feature extracting, and applying basic classifications to audio files.  Our future labs will build upon this essential work - but will use more sophisticated training sets, features, and classifiers. 


We'll first need to setup some additional Matlab folders, toolboxes, and scripts that we'll use later. 



Course related code, toolboxes, and audio are stored at: /usr/ccrma/courses/mir2013



1.       Launch Matlab

2.       Configure your Path:  Add the folder /usr/ccrma/courses/mir2013/Toolboxes to your local Matlab path (including all subfolders).

3.       Set the "Java Heap Memory" to 900 MB via : File>Preferences>General> Java Heap Memory

This allows us to load large audio files and feature vectors into memory. 

Click on "OK"

Click Apply.

4.       Restart Matlab.


Why are the Paste / Save keys different?   Why does Paste default to Control-Y?  

On Linux, Matlab defaults to using Emacs key bindings.  If you want Mac or Windows bindings, go:

File menu > Preferences > Keyboard

Switch the Editor/Debugger key bindings to "Windows".


You can easily comment and uncomment code by hitting Cntr-R,  Cntrl-T. 



To read MP3 files into Matlab, we have a function called mp3read.  It is used just like wavread. 


A large collection of audio files for your experimentation are located at  /usr/ccrma/courses/mir2013/audio



Purpose: We'll experiment with the different features for known frames and see if we can build a basic understanding of what they are doing.  


1.       Make sure to save all of your development code in an .m file.  You can build upon and reuse much of this code over the workshop.  


To create a new .m file, chose:

File> New > Script...

Save the file as Lab1.m

You can execute the code in Lab1.m  via any of the below options:

·  Type Lab1.m in the command window


press F5 in the Editor to execute the current selected script.


·  You can execute 1 or more commands selected in the Editor window at a time.  Select the code and press F9.  Note the Command Window will update. 


2.       Tab Completion. 

Tab Completion works in Command Window and the Editor.   After you type a few letters, hit the Tab key and a popup will appear and show you all of the possible completions, including variable names and functions.    This prevents you to mistyping the names of variables - a big time (and aggravation) saver! 


For example, in the Command Line or Editor , try typing wavr   and then hitting Tab!    ("wavread" should appear)


3.       Load the audio file simpleLoop.wav into Matlab, storing it in the variable x and sampling rate in fs. 



4.       In this course, we will convert all stereo files to mono.  Include this code after your read in WAV files to automatically detect if a file is stereo and convert it to mono.



% If your audio files (x) are stereo, here's how to make them mono:

if size(x,2) == 2

x= (x(:,1)+x(:,2) ) ./ max(abs(x(:,1)+x(:,2))) ;

disp('Making your file mono…');



5.       You can play the audio file by typing using typing



To stop listening to a long audio file, press Control-C.    Audio snippets less than ~8000 samples will often not play out Matlab.  (known bug on Linux machines)


6.       Run an onset detector to determine the approximate onsets in the audio file. 

[onsets] = onset_times(x,fs);  % leighs onset detector with signal and  sample_rate as input

onsets=round(fs*onsets);    % convert onset times in seconds to to samples - round to nearest integer sample

numonsets = length(onsets);


For debugging, we have function which generates mixes of the original audio file and onset times.

This is demonstrated with test_onsets.m


One of Matlab's greatest features is its rich and easy visualization functions.  Visualizing your data at every possible step in the algorithm development process not only builds a practical understanding of the variables, parameters and results, but it greatly aids debugging. 


8.       Plot the audio file in a figure window. 


9.       Now, add a marker showing the position of each onset on top of the waveforms. 

plot(x); hold on; plot(onsets,0.2,'rx')


10.   Adding text markers to your plots can further aid in debugging or visualizing problems.  Label each onset with it's respective onset number with the following simple loop:

for i=1:numonsets

    text(onsets(i),0.2,num2str(i));  % num2st converts an number to a string for display purposes



Labeling the data is crucial.   Add a title and axis to the figures.  (ylabel, xlabel, title.)



title('my onset plot')


11.   Now that we can view the various onsets, try out the onset detector and visualization on a variety of other audio examples located in /usr/ccrma/courses/mir2013/audio.   Continue to load the various audio files and run the onset detector - does it seem like it works well?    If not, yell at Leigh.


Segmenting audio in Frames

As we learned in lecture, it's common to chop up the audio into fixed-frames.  These frames are then further analyzed, processed, or feature extracted.  We're going to analyze the audio in 100 ms frames starting at each onset. 


12.   Create a loop which carves up the audio in fixed-size frames (100ms), starting at the onsets.

13.   Inside of your loop, plot each frame, and play the audio for each frame. 


% Loop to carve up audio into onset-based frames

frameSize = 0.100 *fs;        % sec

for i=1:numonsets

    frames{i}= x(onsets(i):onsets(i)+frameSize);


    plot(frames{i}); title(['frame ' num2str(i)]);   

    sound(frames{i}  ,fs);




Feature extract your frames

14.   Create a loop which extracts the Zero Crossing Rate for each frame, and stores it in an array.   

Your loop will select 100ms (in samples, this value is =  fs * 0.1) , starting at the onsets, and obtain the number of zero crossings in that frame. 


The command   [z] = zcr(x)   returns the number of zero crossings for a vector x.

Don't forget to store the value of z in a feature array for each frame.


clear features

% Extract Zero Crossing Rate from all frames and store it in "features(i,1)"

for i=1:numonsets

    features(i,1) = zcr(frames{i})



For simpleLoop.wav, you should now have a feature array of 5 x 1 - which is the 5 frames (one at each detected onset) and 1 feature (zcr) for each frame. 


Sort the audio file by its feature array. 

Let's test out how well our features characterize the underlying audio signal. 

To build intuition, we're going to sort the feature vector by it's zero crossing rate, from low value to highest value. 


15.   If we sort and re-play the audio that corresponds with these sorted frames, what do you think it will sound like?  (e.g., same order as the loop, reverse order of the loop, snares followed by kicks, quiet notes followed by loud notes, or ??? )   Pause and think about this. 


16.   Now, we're going to play these sorted audio frames, from lowest to highest.  (The pause command will be quite useful here, too.)  How does it sound?  Does it sort them how you expect them to be sorted? 


 [y,index] = sort(features);


for i=1:numonsets


    figure(1); plot(frames{index(i)});title(i);




You'll notice how trivial this drum loop is - always use familiar and predictable audio files when you're developing your algorithms. 


17.   Now that you have this file loading, playing , and sorting working, try this with out files, such as:

/usr/ccrma/courses/mir2013/audio/CongaGroove-mono.wav, and /usr/ccrma/courses/mir2013/audio/ 125BOUNC-mono.WAV.



SECTION 2 - Spectral Features & k-NN



My first audio classifier: introducing K-NN!  We can now appreciate why we need additional intelligence in our systems - heuristics can't very far in the world of complex audio signals.  We'll be using Netlab's implementation of the k-NN for our work here.  It proves be a straight-forward and easy to use implementation.  The steps and skills of working with one classifier will scale nicely to working with other, more complex classifiers. 


We're also going to be using the new features in our arsenal: cherishing those "spectral moments" (centroid, bandwidth, skewness, kurtosis) and also examining other spectral statistics. 



First off, we want to analyze and feature extract a small collection of audio samples - storing their feature data as our "training data".  The below commands read all of the .wav files in a directory into a structure, snareFileList.  


1.       Use these commands to read in a list of filenames (samples) in a directory, replacing the path with the actual directory that the audio \ drum samples are stored in.


snareDirectory = ['/usr/ccrma/courses/mir2013/audio/drum samples/snares/'];

snareFileList = getFileNames(snareDirectory ,'wav')


kickDirectory = ['/usr/ccrma/courses/mir2013/audio/drum samples/kicks/'];

kickFileList = getFileNames(kickDirectory ,'wav')


2.       To access the filenames contained in the cell array, use the brackets { }  to get to the element that you want to access. 


For example, to access the text file name of the 1st file in the list, you would type  snareFileList{1}

Try it out:



When we feature extract a sample collection, we need to sequentially access audio files, segment them (or not), and feature extract them.  Loading a lot of audio files into memory is not always a feasible or desirable operation, so you will create a loop which loads an audio file, feature extracts it, and closes  the audio file.  Note that the only information that we retain in memory are the features that are extracted.


3.       Create a loop which reads in an audio file, extracts the zero crossing rate, and some spectral statistics.  The feature information for each audio file (the "feature vector") should be stored as a feature array, with columns being the features and rows for each file. 


Or in Matlab, for example:

featuresSnare =


  1.0e+003 *


    0.5730    1.9183    2.9713    0.0004 0.0002

    0.4750    1.4834    2.4463    0.0004  0.0012

    0.5900    2.2857    3.1788    0.0003  0.0041

    0.5090    1.6622    2.6369    0.0004  0.0051

    0.4860    1.4758    2.2085    0.0004  0.0021

    0.6060    2.2119    3.2798    0.0004  0.0651

    0.4990    2.0607    2.7654    0.0004  0.0721

    0.6360    2.3153    3.0256    0.0003  0.0221

    0.5490    2.0137    3.0342    0.0004  0.0016

    0.5900    2.2857    3.1788    0.0003  0.0012


In your loop, here's how to read in your wav files, using a structure of file names:

    [x,fs]=wavread([snareDirectory snareFileList{i}]);     %note the use of brackets for snareFileList


Here's an example of how to feature extract for the current audio file..

frameSize = 0.100 * fs;   % 100ms

currentFrame = x(1:frameSize)

                                 featuresSnare(i,1)   = zcr(currentFrame);

[centroid, bandwidth, skew, kurtosis]=spectralMoments(currentFrame,fs,8192)

           featuresSnare(i,2:5) = [centroid, bandwidth, skew, kurtosis];


4.       First, extract all of the feature data for the kick drums and store it in a feature array.  (For my example, above, I'd put it in "featuresKick")

5.       Next, extract all of the feature data for the snares, storing them in a different array. 

Again, the kick and snare features should be separated in two different arrays!


OK, no more help.  The rest is up to you! 



Building Models

1.       Examine the feature array for the various snare samples.  What do you notice? 


2.       Since the features are different scales, we will want to normalize each feature vector to a common range - storing the scaling coefficients for later use.  Many techniques exist for scaling your features.  We'll use linear scaling, which forces the features into the range -1 to 1.


For this, we'll use a custom-created function called scale.  Scale returns an array of scaled values, as well as the multiplication and subtraction values which were used to conform each column into -1 to 1.  Use this function in your code. 


    [trainingFeatures,mf,sf]=scale([featuresSnare; featuresKick]);


Building a k-NN

3.       Build a k-NN model for the snare drums in Netlab, using the function knn. 


We'll the implementation of from the Matlab toolbox "netlab":

>help knn

NET = KNN(NIN, NOUT, K, TR_IN, TR_TARGETS) creates a KNN model NET

         with input dimension NIN, output dimension NOUT and K neighbours.

         The training data is also stored in the data structure and the

         targets are assumed to be using a 1-of-N coding.


         The fields in NET are

           type = 'knn'

           nin = number of inputs

           nout = number of outputs

           tr_in = training input data

           tr_targets = training target data


Here's an example...


labels=[[ones(10,1) zeros(10,1)]; [zeros(10,1) ones(10,1) ]];


Which is an array of ones and zeros to correspond to the 10 snares and 10 kicks in our training sample set:


     1     0

     1     0

     1     0

     1     0

     1     0

     1     0

     1     0

     1     0

     1     0

     1     0

     0     1

     0     1

     0     1

     0     1

     0     1

     0     1

     0     1

     0     1

     0     1

     0     1


[trainingFeatures,mf,sf]=scale([featuresSnare; featuresKick]);


model_snare = knn(5,2,1,trainingFeatures,labels);        


This k-NN model uses 5 features,  2 classes for output (the label), uses k-NN = 1, and takes in the feature data via a feature array called trainingFeatures.


These labels indicate which sample in our feature data is a snare, vs. a non-snare.  The k-NN model uses this information to build a means of comparison and classification.  It is really important that you get these labels correct - because they are the crux of all future classifications that are made later on.  (Trust me, I've made many mistakes in this area - training models with incorrect label data.)



4.       Create a script which extracts features for a single file, re-scales its feature values, and evaluates them with your kNN classifier. 

Evaluating samples with your k-NN

Now that the hard part is done, it's time to throw some feature data through the trained k-NN and see what it outputs. 



In evaluating a new audio file, we need to extract it's features, re-scale them to the same range as the trained feature values, and then send them through the knn.


Some helpful commands:

featuresScaled = rescale(features,mf,sf) ;   % This uses the previous calculated linear scaling parameters to adjust the incoming features to the same range.  



[voting,model_output]=knnfwd(model_snare , featuresScaled )


The  output voting gives you a breakdown of how many nearest neighbors were closest to the test feature vector.  


The model_output provides a list of whether output is Class 1 or Class 2.


output = zeros(size(model_output),2)




Now you can visually compare the output to trainlabels


Once you have completed function, first, test it with your training examples.  Since a k-NN model has exact representations of the training data, it will have 100% training accuracy - meaning that every training example should be predicted correctly, when fed back into the trained model. 


Now, test out with the examples in the folder "test kicks" and "test snares", located in the drum samples folder.  These are real-world testing samples…


If the output labels "1" or "0" aren't insightful for you, you can add an if statement to display them as strings "snare" and "kick".




Tricks of the trade

Select code in Matlab editor and then press F9.  This will execute the currently selected code.

To run a Matlab "cell" (multiline block of code),  press Control-Enter with the text cursor in the current cell.


The clear command re-initializes a variable.  To avoid confusion, you mind find it helpful to clear arrays and structures at the beginning of your scripts.


Common Errors

>??? Index exceeds matrix dimensions.

Are you trying to access, display, plot, or play past the end of the file / frame? 

For example, if an audio file is 10,000 samples long, make sure that the index is not greater than this maximum value.   If the value is > than the length of your file, use an if statement to catch the problem.



Copyright 2013 Jay LeBoeuf

Portions can be re-used by for educational purposes with consent of copyright owner.