Revision as of 16:22, 5 February 2023

Programming Project #2: "Featured Artist"

Music and AI (Music356/CS470) | Winter 2023 | by Ge Wang

In this programming project, we will learn to work with audio features for both supervised and unsupervised tasks: a real-time genre classifier and a feature-based audio mosaic tool. Using the latter, create a feature-based musical statement or performance!

Due Dates

Coding tutorial: Thursday evening
Milestone (Phase One complete + Phase Two prototype): webpage due Monday (2/6, 11:59pm) | in-class critique Tuesday (2/7)
Final Deliverable: webpage due Monday (2/13, 11:59pm)
In-class Presentation: Tuesday (2/14)

Discord Is Our Friend

direct any questions, ruminations, outputs, or interesting mistakes to our class Discord

Things to Think With

read/skim the classic article "Musical Genre Classification of Audio Signals" (Tzanetakis and Cook, 2002)
- don't worry about the details yet; first get a general sense of what audio features are and how they can be used

Tools to Play With

get the latest bleeding-edge secret chuck build (2023.01.23 or later!)
- macOS: this will install both command-line chuck and the graphical IDE miniAudicle, and will replace any previous ChucK installation.
- Windows: you will need to download and use the bleeding-edge command-line chuck (for now, there is no bleeding-edge miniAudicle for Windows); you can either use the default cmd command prompt or consider downloading a terminal emulator.
- Linux: you will need to build from source, provided in the linux directory.
- all platforms: for this project, you will be using the command-line version of chuck.
NOTE: to return chuck to a pre-bleeding-edge state, you can always install the latest official ChucK release
sample code for all phases (including optional video starter code)

GTZAN Dataset

next, you'll need to download the GTZAN dataset
- 1000 30-second music clips, labeled by humans into ten genre categories

Phase One: Extract, Classify, Validate

understanding audio, audio features, FFT, feature extraction
extract different sets of audio features from GTZAN dataset
run real-time classifier using different feature sets
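run cross-validation to evaluate the quality of the classifier based on different features
you can find relevant code here: https://ccrma.stanford.edu/courses/356/code/featured-artist/ (Phase One files are in the phase-1-classify/ subdirectory)
- start playing with these, and read through them to get a sense of what the code is doing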
- example-centroid.ck -- a basic example of using ChucK's unit analyzer framework (things connected using the upchuck operator =^) to extract an audio feature:
  - generate an input (a 440 Hz sine wave) -- this can be any audio source, e.g., adc for the microphone
  - take a Fast Fourier Transform (FFT) of a frame of audio (the frame length is determined by the FFT size)
  - use the output of the FFT analysis to compute the Spectral Centroid for that frame of audio
  - note how ChucK timing is used to precisely control how often a frame of analysis is done
  - calling .upchuck() triggers an analysis, automatically cascading up the =^ chain
- example-mfcc.ck -- this is like the previous example, but now we compute a multi-dimensional feature, Mel Frequency Cepstral Coefficients (MFCCs)
- feature-extract.ck -- in a "real-world" scenario, we would extract multiple features; a FeatureCollector is used to aggregate them into a single vector (see comments in the file for more details)
- genre-classify.ck -- using the output of feature-extract.ck, do real-time classification by performing the same feature extraction and using k-NN to predict the likelihood of each genre category (see comments in the file for more details)
- x-validate.ck -- using the output of feature-extract.ck, do cross-validation to get a sense of the classifier's quality
experiment by choosing different features and different numbers of features: extract them on GTZAN, try the real-time classifier, and perform cross-validation
- available features: Centroid, Flux, RMS, RollOff, ZeroX, MFCCs, Chroma, Kurtosis
- try at least five different feature configurations and evaluate the resulting classifier using cross-validation
  - keep in mind that the baseline score is 0.1 (a random classifier for 10 genres), and the maximum is 1
  - how do different--and different numbers of--features affect the classification results?
  - in your experiment, what configuration yielded the highest score in cross-validation?
briefly report on your experiments
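To make the centroid step in example-centroid.ck concrete outside ChucK, here is a small Python sketch (an illustration, not the ChucK implementation): it computes a magnitude spectrum with a naive DFT and takes its magnitude-weighted mean frequency. Assumptions: an 8000 Hz sample rate, a 256-sample frame, a rectangular window, and a 437.5 Hz tone instead of 440 Hz so that the tone lands exactly on one FFT bin and the result is exact. ChucK's Centroid analyzer may report its value in normalized units rather than Hz, so check example-centroid.ck for the conversion it uses.

```python
import math

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2; fine for a small demo frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        mags.append(math.hypot(re, im))
    return mags

def spectral_centroid(mags, sample_rate, fft_size):
    """Magnitude-weighted mean frequency of one frame, in Hz."""
    bin_hz = sample_rate / fft_size
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * bin_hz * m for k, m in enumerate(mags)) / total

SR = 8000            # assumed sample rate
N = 256              # assumed FFT size (= frame length)
freq = 14 * SR / N   # 437.5 Hz: exactly bin 14, so there is no spectral leakage
frame = [math.sin(2 * math.pi * freq * t / SR) for t in range(N)]
centroid = spectral_centroid(magnitude_spectrum(frame), SR, N)
print(round(centroid, 3))  # 437.5
```

With real music the spectrum is spread across many bins, so the centroid tracks "brightness" rather than a single pitch; adding energy at higher bins pulls it upward.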
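The idea of aggregating several features into one vector (the role of the FeatureCollector in feature-extract.ck) can be sketched in Python as follows. Two simple per-frame features, RMS and a zero-crossing rate, are concatenated into one vector per frame, and the frame vectors are then averaged into a single vector per file. The averaging step, the frame and hop sizes, and the helper names are assumptions for illustration; feature-extract.ck may aggregate differently.

```python
import math

def rms(frame):
    """Root-mean-square level of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def frame_features(frame):
    """Concatenate per-frame features into one vector (the FeatureCollector idea)."""
    return [rms(frame), zero_crossing_rate(frame)]

def file_vector(signal, frame_size=256, hop=128):
    """Average the per-frame vectors into one vector for the whole file (assumed aggregation)."""
    frames = [signal[i:i + frame_size] for i in range(0, len(signal) - frame_size + 1, hop)]
    vectors = [frame_features(f) for f in frames]
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(len(vectors[0]))]

buzzy = [1.0 if t % 2 == 0 else -1.0 for t in range(1024)]  # flips sign every sample
steady = [0.5] * 1024                                      # constant level, no crossings
print(file_vector(buzzy), file_vector(steady))  # [1.0, 1.0] [0.5, 0.0]
```

One design point this exposes: features live on very different scales (a centroid in Hz versus an RMS level near 0..1), so without some normalization the largest-valued feature dominates any Euclidean distance used by k-NN. Whether and how the course code normalizes is worth checking in its source.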
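genre-classify.ck uses k-NN to predict a likelihood for each genre. A common way to get such likelihoods is to let the k nearest training vectors vote and report each label's share of the votes; the Python sketch below does exactly that with brute-force Euclidean distance. This is an assumed scheme for illustration, and the training vectors and labels are made up; ChucK's KNN2 may compute its probabilities differently.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_likelihoods(train_x, train_y, query, k, labels):
    """Return {label: fraction of the k nearest training vectors with that label}."""
    ranked = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return {label: votes[label] / k for label in labels}

LABELS = ["blues", "disco", "metal"]
train_x = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.2], [5.0, 5.0], [5.1, 4.9], [9.0, 0.0]]
train_y = ["blues", "blues", "blues", "disco", "disco", "metal"]
probs = knn_likelihoods(train_x, train_y, [0.05, 0.05], k=4, labels=LABELS)
print(probs)  # {'blues': 0.75, 'disco': 0.25, 'metal': 0.0}
```

In a real-time setting the query would be the feature vector of the current analysis frame (or an average over recent frames), so the likelihoods change as the music does.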
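Cross-validation estimates how well a classifier generalizes by repeatedly training on part of the data and scoring on the held-out rest. The Python sketch below runs k-fold cross-validation with a 1-nearest-neighbor classifier on synthetic data (10 "genres" x 10 clips, 2-D feature vectors) standing in for GTZAN. The fold scheme and the 1-NN choice are assumptions; x-validate.ck may split and score differently. It also shows why 0.1 is the baseline: with scrambled labels the features carry no genre information and accuracy should fall near chance.

```python
import math
import random

def nearest_label(train, query):
    """1-NN: label of the training vector closest to `query` (Euclidean distance)."""
    return min(train, key=lambda item: math.dist(item[0], query))[1]

def k_fold_accuracy(data, folds=5, seed=0):
    """Shuffle (vector, label) pairs, hold out each of `folds` parts once as the
    test set, train on the rest, and return the mean accuracy over all folds."""
    items = data[:]
    random.Random(seed).shuffle(items)
    scores = []
    for f in range(folds):
        test = items[f::folds]
        train = [item for i, item in enumerate(items) if i % folds != f]
        correct = sum(1 for vec, label in test if nearest_label(train, vec) == label)
        scores.append(correct / len(test))
    return sum(scores) / folds

# Synthetic stand-in for GTZAN: well-separated clusters, one per "genre".
rng = random.Random(1)
separable = [([g * 10 + rng.uniform(-1, 1), rng.uniform(-1, 1)], g)
             for g in range(10) for _ in range(10)]
# Same vectors with labels scrambled: the features no longer predict the genre.
labels = [label for _, label in separable]
rng.shuffle(labels)
scrambled = [(vec, label) for (vec, _), label in zip(separable, labels)]

print(k_fold_accuracy(separable), k_fold_accuracy(scrambled))
```

Real GTZAN features overlap far more than these clusters, which is why comparing feature configurations by their cross-validation score is informative.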

Phase Two: Designing an Audio Mosaic Tool

you can find phase 2 sample code here
using what you've learned, build a database mapping sound frames (100::ms to 1::second) <=> feature vectors
- curate your own set of audio files; it can be a mixture of
  - songs or song snippets; we will perform feature extraction on audio windows from beginning to end (in essence, each audio window is a short sound fragment with its own feature vector)
  - (optional) short sound effects (~1 second); you may wish to extract a single vector per sound effect
- modify the feature-extract.ck code from Phase One to build your database of sound frames to feature vectors:
  - instead of generating one feature vector for the entire file, output a trajectory of audio windows and associated feature vectors
  - instead of outputting labels (e.g., "blues", "disco", etc.), output information to identify each audio window (e.g., filename and windowStartTime)
  - see reference implementation mosaic-extract.ck
- note that this does not require any labels; as with word2vec, we want to situate each sound window in an N-dimensional feature space
play with mosaic-similar.ck: a feature-based sound explorer to query your database and perform similarity retrieval (using KNN2)
using your database and retrieval tool, concatenative synthesis, and the mosaic-synth-mic.ck and mosaic-synth-doh.ck sample code, design an interactive audio mosaic generator that is
- feature-based
- real-time
- able to take any audio input (mic or any unit generator)
- usable for expressive audio mosaic creation
there are many functionalities you can choose to incorporate into your mosaic synthesizer
- keyboard or mouse control of mosaic parameters such as synthesis window length, pitch shift (through SndBuf.rate), and which subsets of sounds to use
- a key to making this expressive is to try different sound sources; play with them A LOT, gain understanding of the code and experiment!
(optional) do this in the audiovisual domain
- (idea) build an audiovisual mosaic instrument or music creation tool/toy
- (idea) build a GUI for exploring sounds by similarity; will need to reduce dimensions (using PCA or another technique) to 3 or 2 in order to visualize
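The database-building step (a trajectory of windows, each identified by filename and window start time rather than a genre label) can be sketched in Python as below. The two features, the 0.25 s window and hop (within the 100 ms to 1 s range above), the clip, and the output line format are all assumptions for illustration; mosaic-extract.ck is the reference for the actual features and file format.

```python
import math

def rms(frame):
    """Root-mean-square level of one window."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)) / (len(frame) - 1)

def extract_trajectory(filename, signal, sample_rate, window_sec=0.25, hop_sec=0.25):
    """One database row per analysis window: (filename, window start in seconds, features)."""
    win = int(round(window_sec * sample_rate))
    hop = int(round(hop_sec * sample_rate))
    rows = []
    for start in range(0, len(signal) - win + 1, hop):
        frame = signal[start:start + win]
        rows.append((filename, start / sample_rate, [rms(frame), zero_crossing_rate(frame)]))
    return rows

SR = 8000
# Hypothetical one-second clip: a loud 440 Hz half followed by a quiet 1900 Hz half.
clip = ([0.8 * math.sin(2 * math.pi * 440 * t / SR) for t in range(SR // 2)] +
        [0.1 * math.sin(2 * math.pi * 1900 * t / SR) for t in range(SR // 2)])
database = extract_trajectory("clip.wav", clip, SR)
for name, start, vec in database:
    print(name, f"{start:.2f}", " ".join(f"{v:.4f}" for v in vec))
```

Using a hop smaller than the window (overlapping windows) gives the retrieval step more candidate fragments to choose from, at the cost of a larger database.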
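The core of a feature-based mosaic is a lookup: for each frame of live input, find the stored window whose feature vector is closest, then play that window back. The Python sketch below uses brute-force Euclidean search in place of KNN2 and returns the (filename, start time) pairs a concatenative synth would play in order. The database rows, file names, and input vectors are hypothetical, and picking randomly among the k nearest windows (for variety) is an assumed design option, not necessarily what mosaic-synth-mic.ck does.

```python
import math
import random

def nearest_windows(database, query, k=1):
    """The k rows (filename, start_sec, vector) whose vectors are closest to `query`."""
    return sorted(database, key=lambda row: math.dist(row[2], query))[:k]

def mosaic_plan(database, input_vectors, k=1, seed=0):
    """For each input frame's feature vector, choose one of its k nearest stored
    windows and return the (filename, start_sec) pairs to play back in order."""
    rng = random.Random(seed)
    plan = []
    for vec in input_vectors:
        filename, start, _ = rng.choice(nearest_windows(database, vec, k))
        plan.append((filename, start))
    return plan

# Hypothetical database rows: (filename, window start in seconds, feature vector).
database = [
    ("rain.wav", 0.0, [0.10, 0.90]),
    ("rain.wav", 0.5, [0.12, 0.85]),
    ("kick.wav", 0.0, [0.90, 0.05]),
    ("voice.wav", 1.0, [0.40, 0.30]),
]
# Feature vectors from successive frames of the live input (e.g. the mic).
live_frames = [[0.88, 0.10], [0.11, 0.88], [0.45, 0.25]]
print(mosaic_plan(database, live_frames))
# [('kick.wav', 0.0), ('rain.wav', 0.0), ('voice.wav', 1.0)]
```

With k=1 the output always mirrors the input as closely as the database allows; raising k trades fidelity for variety, which is one of the expressive parameters worth exposing.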
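For the pitch-shift parameter, the arithmetic behind a playback-rate control is short. Assuming SndBuf.rate acts as a plain playback-speed multiplier (1.0 = original speed, as in varispeed playback), rate r shifts pitch by 12 * log2(r) semitones and reads through the source r times as fast, so a fixed-length synthesis window covers more or less of the original audio. The helper names below are hypothetical.

```python
import math

def rate_for_semitones(semitones):
    """Playback rate that shifts pitch by `semitones` (equal temperament)."""
    return 2.0 ** (semitones / 12.0)

def semitones_for_rate(rate):
    """Pitch shift, in semitones, produced by a given playback rate."""
    return 12.0 * math.log2(rate)

def source_seconds_covered(synthesis_window_sec, rate):
    """Seconds of source audio read during one synthesis window at `rate`."""
    return synthesis_window_sec * rate

for st in (-12, -5, 0, 7, 12):
    print(st, round(rate_for_semitones(st), 4))
```

Because pitch and speed are coupled this way, shifting up an octave also halves how long a stored window lasts when played to completion.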
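For a similarity-exploration GUI, PCA reduces each N-dimensional feature vector to 2 coordinates along the directions of greatest variance. The pure-Python sketch below builds the covariance matrix, finds the top two eigenvectors by power iteration with deflation, and projects the data onto them. It is an illustration for small dimensions only (in practice a linear-algebra library would do this), and the grid of 3-D points is made-up data chosen so the answer is easy to check.

```python
import math
import random

def pca_2d(vectors, iterations=200, seed=0):
    """Project vectors onto their top two principal components using the covariance
    matrix, power iteration, and deflation (fine for a handful of dimensions)."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(d)]
    centered = [[v[j] - mean[j] for j in range(d)] for v in vectors]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        vec = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iterations):
            nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in nxt)) or 1.0  # guard against a zero matrix
            vec = [x / norm for x in nxt]
        # Rayleigh quotient: the variance captured along this direction.
        eigval = sum(vec[i] * cov[i][j] * vec[j] for i in range(d) for j in range(d))
        components.append(vec)
        # Deflation: remove this component so the next pass finds the runner-up.
        cov = [[cov[i][j] - eigval * vec[i] * vec[j] for j in range(d)] for i in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in centered]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Hypothetical 3-D feature vectors: wide spread on axis 0, medium on axis 1, tiny on axis 2.
points = [[x, y, z] for x in range(-3, 4) for y in (-1, 0, 1) for z in (-0.1, 0.1)]
xy = pca_2d(points)
print(round(variance([p[0] for p in xy]), 4), round(variance([p[1] for p in xy]), 4))
```

Features should usually be normalized before PCA; otherwise the component directions mostly reflect which feature has the largest numeric range.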

Phase Three: Make a Musical Mosaic!

- use your prototype from Phase Two to create a feature-based musical mosaic in the form of a musical statement or performance
- (optional) do this in the audiovisual domain

Reflections

write ~300 words of reflection on your project. It can be about your process or the product. What were the limitations, and how did you try to get around them?

Deliverables

create a CCRMA webpage for this etude
- the URL should live at https://ccrma.stanford.edu/~YOURUSERID/356/hw2 or https://ccrma.stanford.edu/~YOURUSERID/470/hw2
- alternatively, you may use Medium or another publishing platform (but please still link to that page from your CCRMA webpage)
your webpage is to include
- a title and description of your project (feel free to link to this wiki page)
- all relevant chuck code from all three phases
  - phase 1: all code used (extraction, classification, validation)
  - phase 2: your mosaic generator, and database query/retrieval tool
  - phase 3: code used for your musical statement
- video recording of your musical statement (please start early!)
- your 300-word reflection
- any acknowledgements (people, code, or other things that helped you through this)
submit to Canvas only your webpage URL
