356-winter-2023/hw2
Programming Project #2: "Featured Artist"
Music and AI (Music356/CS470) | Winter 2023 | by Ge Wang
In this programming project, we will learn to work with audio features for both supervised and unsupervised tasks: a real-time genre classifier (supervised) and a feature-based audio mosaic tool (unsupervised). Using the latter, create a feature-based musical statement or performance!
Due Dates
- Coding tutorial: Thursday evening (time TBD) | attendance is highly recommended (it will save you a lot of time later)
- Milestone (Phase One complete + Phase Two prototype): webpage due Wednesday (2/1, 11:59pm) | in-class critique Thursday (2/2)
- Final Deliverable: webpage due Wednesday (2/8, 11:59pm)
- In-class Presentation: Thursday (2/9)
Discord Is Our Friend
- direct any questions, ruminations, outputs, and interesting mistakes to our class Discord
Things to Think With
- read/skim the classic article "Musical Genre Classification of Audio Signals" (Tzanetakis and Cook, 2002)
- don't worry about the details yet; first get a general sense of what audio features are and how they can be used
Tools to Play With
- get the latest bleeding-edge secret chuck build (2023.01.23 or later!)
- macOS: this will install both the command-line chuck and the graphical IDE miniAudicle, and replace any previous ChucK installation.
- Windows: you will need to download and use the bleeding-edge command-line chuck (for now, there is no bleeding-edge miniAudicle for Windows); you can either use the default cmd command prompt or consider downloading a terminal emulator.
- Linux: you will need to build from source, provided in the linux directory.
- all platforms: for this project, you will be using the command-line version of chuck.
- NOTE: to return your chuck back to a pre-bleeding-edge state, you can always install the latest official ChucK release
GTZAN Dataset
- next, you'll need to download the GTZAN dataset
- 1000 30-second music clips, labeled by humans into ten genre categories
Phase One: Extract, Classify, Validate
- understanding audio, audio features, FFT, feature extraction
- extract different sets of audio features from GTZAN dataset
- run real-time classifier using different feature sets
- run cross-validation to evaluate the quality of the classifier based on different features
- you can find relevant code here
- start playing with these and reading through them to get a sense of what the code is doing
- example-centroid.ck -- a basic example of using ChucK's unit analyzer framework (things connected using the upchuck operator =^) to extract an audio feature:
- generate an input (a 440 Hz sine wave) -- this can be any audio source, e.g., adc for the microphone
- take a Fast Fourier Transform (FFT) on a frame of audio (the frame size is determined by the FFT size)
- use the output of the FFT's analysis to compute the Spectral Centroid for that frame of audio
- note how ChucK timing is used to precisely control how often to do a frame of analysis
- calling .upchuck() triggers an analysis, automatically cascading up the =^ chain
- example-mfcc.ck -- this is like the previous example, but now we compute a multi-dimensional feature, Mel Frequency Cepstral Coefficients (MFCCs)
- feature-extract.ck -- in a "real-world" scenario, we would extract multiple features. A FeatureCollector is used to aggregate multiple features into a single vector (see comments in the file for more details)
- genre-classify.ck -- using the output of feature-extract.ck, do real-time classification by performing the same feature extraction and using k-NN to predict the likelihood of each genre category (see comments in the file for more details)
- x-validate.ck -- using the output of feature-extract.ck, do cross-validation to get a sense of the classifier's quality
- experiment by choosing different features and different number of features, extracting them on GTZAN, try the real-time classifier, and perform cross-validation
- available features: Centroid, Flux, RMS, RollOff, ZeroX, MFCC (the # of coefficients can be set), etc.
- try at least five different feature configurations and evaluate the resulting classifier
- briefly report on your experiments
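To make the analysis chain above concrete outside of ChucK, here is a minimal sketch in standard-library Python of what example-centroid.ck and feature-extract.ck do conceptually: window a frame, take its spectrum, compute the Spectral Centroid, and concatenate it with RMS and zero-crossing rate into one vector (the role FeatureCollector plays). Averaging the per-frame vectors into one vector per clip is one simple option assumed here, not necessarily what feature-extract.ck does. The 22050 Hz sample rate, 512-sample non-overlapping frames, Hann window, and all function names are assumptions; a naive DFT stands in for the FFT, and ChucK's own Centroid, RMS, and ZeroX analyzers may scale or normalize their outputs differently.

```python
import cmath
import math

SR = 22050   # assumed sample rate (GTZAN clips are 22050 Hz mono)
FRAME = 512  # assumed analysis frame size; frames here do not overlap


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 of a Hann-windowed frame (naive O(N^2) DFT)."""
    n = len(frame)
    windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * i / n)) for i, x in enumerate(frame)]
    return [abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def spectral_centroid(mags, sr):
    """Magnitude-weighted mean frequency of the spectrum, in Hz."""
    n = 2 * (len(mags) - 1)  # original (even) frame length
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * sr / n * m for k, m in enumerate(mags)) / total


def rms(frame):
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)


def frame_features(frame, sr=SR):
    """Concatenate several per-frame features into one vector (like FeatureCollector)."""
    return [spectral_centroid(magnitude_spectrum(frame), sr), rms(frame), zero_crossing_rate(frame)]


def clip_features(signal, frame_size=FRAME, sr=SR):
    """Average the per-frame vectors over all full frames: one vector per clip."""
    vectors = [frame_features(signal[i:i + frame_size], sr)
               for i in range(0, len(signal) - frame_size + 1, frame_size)]
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(len(vectors[0]))]


if __name__ == "__main__":
    # Same input as example-centroid.ck: a 440 Hz sine (amplitude 0.5, four frames long)
    sine = [0.5 * math.sin(2 * math.pi * 440 * t / SR) for t in range(4 * FRAME)]
    centroid, level, zcr = clip_features(sine)
    print(f"centroid ~ {centroid:.1f} Hz, rms ~ {level:.3f}, zcr ~ {zcr:.4f}")
```

For a pure sine the centroid lands near 440 Hz (not exactly, because the window spreads energy into neighboring bins); swapping the sine for real audio is where the feature starts to distinguish sounds.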
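MFCC computation (as in example-mfcc.ck) begins by mapping the spectrum onto the mel scale, which spaces filters densely at low frequencies and sparsely at high ones; the filter energies are then log-compressed and passed through a DCT to give the coefficients. The sketch below covers only that first step, placing triangular filter edges evenly in mel using the common HTK-style formula mel(f) = 2595 log10(1 + f/700). It is an illustration under that assumption: ChucK's MFCC analyzer may use a different mel variant or filter layout, and the function names are hypothetical.

```python
import math


def hz_to_mel(f):
    """HTK-style mel scale (one common variant)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_filters, f_min, f_max):
    """n_filters + 2 frequencies in Hz, evenly spaced in mel.

    Triangular filter i rises from edges[i], peaks at edges[i + 1], and falls to edges[i + 2].
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_filters + 2)]


if __name__ == "__main__":
    # 10 filters spanning 0 Hz to the Nyquist frequency of a 22050 Hz signal
    for edge in mel_band_edges(10, 0.0, 11025.0):
        print(f"{edge:8.1f} Hz")
```

The printed edges get farther apart as frequency rises, which is why changing the number of coefficients changes how finely the low end of the spectrum is described.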
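The k-NN idea behind genre-classify.ck and x-validate.ck can also be sketched in plain Python: the likelihood of each genre is the fraction of the k nearest training vectors carrying that label, and k-fold cross-validation holds out each fold in turn and scores the predictions made from the rest. This is not ChucK's KNN/KNN2 API; Euclidean distance, k = 5, five folds, tie-breaking toward the first label, and the toy two-genre data are all assumptions for illustration.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_likelihoods(train, query, k, labels):
    """Fraction of the k nearest (vector, label) training pairs carrying each label."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return {label: votes[label] / k for label in labels}


def predict(train, query, k, labels):
    """Most likely label; ties go to whichever label comes first in `labels`."""
    probs = knn_likelihoods(train, query, k, labels)
    return max(labels, key=lambda label: probs[label])


def cross_validate(data, k=5, folds=5, seed=0):
    """Mean k-NN accuracy over `folds` held-out folds of the shuffled data."""
    labels = sorted({label for _, label in data})
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    accuracies = []
    for f in range(folds):
        test = shuffled[f::folds]  # every folds-th item, starting at index f
        train = [item for i, item in enumerate(shuffled) if i % folds != f]
        correct = sum(predict(train, x, k, labels) == y for x, y in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / folds


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy 2-D feature vectors for two hypothetical genre clusters
    data = [([rng.gauss(0, 0.5), rng.gauss(0, 0.5)], "blues") for _ in range(20)]
    data += [([rng.gauss(5, 0.5), rng.gauss(5, 0.5)], "metal") for _ in range(20)]
    print(knn_likelihoods(data, [0.2, -0.1], 5, ["blues", "metal"]))
    print("5-fold accuracy:", cross_validate(data))
```

Because Euclidean distance mixes dimensions directly, a feature with a large range (a centroid in Hz next to an RMS value below 1) will dominate the neighbor search unless the vectors are normalized first, which is worth keeping in mind when comparing feature configurations.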
Phase Two: Design Audio Mosaic Tool
- using what you've learned, build a database mapping sound frames (100::ms to 1::second) <=> feature vectors
- curate your own set of audio files; it can be a mixture of:
- short sound effects (~1 second)
- music (we will perform feature extraction on each short-time window)
- modify the feature-extract.ck code from Phase One to build your database mapping sound frames to feature vectors
- note that this does not require any labels; like word2vec, we want to situate each sound window in an N-dimensional feature space
- prototype a feature-based sound explorer to query your database and perform similarity retrieval (using KNN2)
- using your database and retrieval tool, design an interactive audio mosaic generator
- feature-based
- real-time
- takes any audio input (mic or any unit generator)
- can be used for performance
- (optional) do this in the audiovisual domain; build a GUI for exploring sounds by similarity; you will need to reduce the dimensions (using PCA or another technique) to 3 or 2 in order to visualize
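Phase Two's database and similarity retrieval can be prototyped the same way: slice each source into fixed windows, store (source, offset, feature vector) entries, then rebuild an input by replacing each of its windows with the nearest database window. A minimal sketch follows; the 8000 Hz rate, 100 ms windows, two-feature (RMS, zero-crossing rate) vectors, brute-force nearest-neighbor search, and every name in it are assumptions. In ChucK the lookup would go through KNN2, with playback from something like SndBuf, rather than through these hypothetical helpers.

```python
import math

SR = 8000     # assumed sample rate for this toy example
WINDOW = 800  # 100 ms at SR = 8000, the short end of the suggested range


def tone(freq, amp, n):
    """Toy signal generator (hypothetical helper): n samples of a sine."""
    return [amp * math.sin(2 * math.pi * freq * t / SR) for t in range(n)]


def features(frame):
    """Two cheap features per window: RMS level and zero-crossing rate."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    zcr = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)) / (len(frame) - 1)
    return [rms, zcr]


def build_database(sources, window):
    """Map every non-overlapping window of every source to a feature vector."""
    db = []
    for name, signal in sources.items():
        for start in range(0, len(signal) - window + 1, window):
            vec = features(signal[start:start + window])
            db.append({"source": name, "start": start, "vector": vec})
    return db


def nearest(db, query, k=1):
    """Brute-force k-nearest-neighbor lookup by Euclidean distance."""
    return sorted(db, key=lambda e: math.dist(e["vector"], query))[:k]


def mosaic(db, sources, target, window):
    """Rebuild `target` window by window from the most similar database windows."""
    out = []
    for start in range(0, len(target) - window + 1, window):
        best = nearest(db, features(target[start:start + window]))[0]
        out.extend(sources[best["source"]][best["start"]:best["start"] + window])
    return out


if __name__ == "__main__":
    sources = {"low": tone(110, 0.1, SR), "high": tone(1500, 0.8, SR)}
    db = build_database(sources, WINDOW)
    target = tone(1400, 0.7, 2 * WINDOW)
    print([e["source"] for e in nearest(db, features(target[:WINDOW]), k=3)])
    print(len(mosaic(db, sources, target, WINDOW)), "samples of mosaic output")
```

A real-time version would run the same lookup once per analysis frame of the live input and would likely need short crossfades between retrieved windows to avoid clicks at the boundaries.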
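For the optional similarity GUI, the N-dimensional feature vectors have to be reduced to 2 (or 3) dimensions before they can be plotted. The sketch below is one assumed way to do PCA in plain Python, using power iteration on the covariance matrix plus deflation to find the top two components; a linear-algebra library or a non-linear technique would work just as well. It assumes at least two feature dimensions and non-degenerate data (otherwise the normalization step would divide by zero), and the function name is hypothetical.

```python
import math
import random


def pca_2d(vectors, iterations=200, seed=0):
    """Project N-D vectors (N >= 2) onto their top two principal components.

    Power iteration on the covariance matrix finds the dominant direction;
    deflation removes it so the second pass finds the next one.
    Returns (projected_points, components).
    """
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    # Sample covariance matrix (d x d)
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        vec = [rng.random() + 0.1 for _ in range(d)]  # random positive start vector
        for _ in range(iterations):
            nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in nxt))
            vec = [x / norm for x in nxt]
        # Rayleigh quotient: the variance captured along this component
        eigval = sum(vec[i] * cov[i][j] * vec[j] for i in range(d) for j in range(d))
        components.append(vec)
        # Deflation: subtract this direction so the next pass finds the next component
        cov = [[cov[i][j] - eigval * vec[i] * vec[j] for j in range(d)] for i in range(d)]
    projected = [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
                 for c in centered]
    return projected, components


if __name__ == "__main__":
    rng = random.Random(3)
    # Toy 3-D "feature vectors" with most of their spread along the first axis
    points = [[rng.gauss(0, 5), rng.gauss(0, 1), rng.gauss(0, 0.1)] for _ in range(200)]
    xy, comps = pca_2d(points)
    print("first component:", [round(x, 3) for x in comps[0]])
    print("first projected point:", xy[0])
```

The projected points can be drawn as a 2-D scatter plot, with nearby points being windows that sound similar according to the chosen features.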
Phase Three: Make a Musical Mosaic!
- use your prototype from Phase Two to create a feature-based musical mosaic in the form of a musical statement or performance
- (optional) do this in the audiovisual domain
Reflections
- write ~300 words of reflection on your project. It can be about your process or the product. What were the limitations, and how did you try to get around them?
Deliverables
- create a CCRMA webpage for this etude
- the URL should live at https://ccrma.stanford.edu/~YOURUSERID/356/hw2 or https://ccrma.stanford.edu/~YOURUSERID/470/hw2
- alternately, you may use Medium or another publishing platform (but please still link to that page from your CCRMA webpage)
- your webpage is to include
- a title and description of your project (feel free to link to this wiki page)
- all relevant chuck code from all three phases
- phase 1: all code used (extraction, classification, validation)
- phase 2: your mosaic generator, and database query/retrieval tool
- phase 3: code used for your musical statement
- video recording of your musical statement (please start early!)
- your 300-word reflection
- any acknowledgements (people, code, or other things that helped you through this)
- submit to Canvas only your webpage URL