356-winter-2023/hw2

From CCRMA Wiki

Revision as of 16:22, 5 February 2023

Programming Project #2: "Featured Artist"

Music and AI (Music356/CS470) | Winter 2023 | by Ge Wang


In this programming project, we will learn to work with audio features for both supervised and unsupervised tasks. These include a real-time genre classifier and a feature-based audio mosaic tool. Using the latter, create a feature-based musical statement or performance!

Due Dates

  • Coding tutorial: Thursday evening
  • Milestone (Phase One complete + Phase Two prototype): webpage due Monday (2/6, 11:59pm) | in-class critique Tuesday (2/7)
  • Final Deliverable: webpage due Monday (2/13, 11:59pm)
  • In-class Presentation: Tuesday (2/14)

Discord Is Our Friend

  • direct any questions, ruminations, outputs, and interesting mistakes to our class Discord

Things to Think With

Tools to Play With

  • get the latest bleeding-edge secret chuck build (2023.01.23 or later!)
    • macOS: this will install both command-line chuck and the graphical IDE miniAudicle, and will replace any previous ChucK installation.
    • Windows: you will need to download and use the bleeding-edge command-line chuck (for now, there is no bleeding-edge miniAudicle for Windows); you can either use the default cmd command prompt or consider downloading a terminal emulator.
    • Linux: you will need to build from source, provided in the linux directory.
    • all platforms: for this project, you will be using the command-line version of chuck.
  • NOTE: to return your chuck back to a pre-bleeding-edge state, you can always install the latest official ChucK release
  • sample code for all phases (including optional video starter code)

GTZAN Dataset

  • next, you'll need to download the GTZAN dataset
    • 1000 30-second music clips, labeled by humans into ten genre categories

Phase One: Extract, Classify, Validate

  • understanding audio, audio features, FFT, feature extraction
  • extract different sets of audio features from the GTZAN dataset
  • run the real-time classifier using different feature sets
  • run cross-validation to evaluate the quality of the classifier based on different features
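  • you can find relevant code here (https://ccrma.stanford.edu/courses/356/code/featured-artist/; the Phase One files are in the phase-1-classify/ subdirectory)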
    • start playing with these, and reading through these to get a sense of what the code is doing
    • example-centroid.ck -- a basic example of using ChucK's unit analyzer framework (things connected using the upchuck operator =^) to extract an audio feature:
      • generate an input (a 440 Hz sine wave) -- this can be any audio source, e.g., adc for the microphone
      • take a Fast Fourier Transform (FFT) on a frame of audio (the frame size is determined by the FFT size)
      • use the output of the FFT analysis to compute the Spectral Centroid for that frame of audio
      • note how ChucK's timing is used to precisely control how often to do a frame of analysis
      • .upchuck() triggers an analysis, automatically cascading up the =^ chain
    • example-mfcc.ck -- this is like the previous example, but now we compute a multi-dimensional feature, Mel Frequency Cepstral Coefficients (MFCCs)
    • feature-extract.ck -- in a "real-world" scenario, we would extract multiple features. A FeatureCollector is used to aggregate multiple features into a single vector (see comments in the file for more details)
    • genre-classify.ck -- using the output of feature-extract.ck, do real-time classification by performing the same feature extraction and using k-NN to predict the likelihood of each genre category (see comments in the file for more details)
    • x-validate.ck -- using the output of feature-extract.ck, do cross-validation to get a sense of the classifier quality
  • experiment by choosing different features and different numbers of features, extracting them from GTZAN, trying the real-time classifier, and performing cross-validation
    • available features: Centroid, Flux, RMS, RollOff, ZeroX, MFCCs, Chroma, Kurtosis
    • try at least five different feature configurations and evaluate the resulting classifier using cross-validation
      • keep in mind that the baseline score is 0.1 (a random classifier over 10 genres), and 1 is the maximum
      • how do different--and different numbers of--features affect the classification results?
      • in your experiment, what configuration yielded the highest score in cross-validation?
  • briefly report on your experiments
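The example-centroid.ck pipeline (window a frame, take its FFT, compute the centroid from the spectrum) can also be sketched outside ChucK. The Python below is a minimal, standard-library-only illustration and is not the course code. It makes these assumptions:

- a naive DFT stands in for ChucK's FFT;
- the window is a periodic Hann window;
- the spectral centroid uses the common definition, the magnitude-weighted mean frequency of the bins;
- the half-frame hop is a hypothetical choice, standing in for how ChucK's timing controls how often a frame is analyzed.

ChucK's Centroid unit analyzer may report its value on a different scale, so compare trends rather than exact numbers.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2 via a naive O(N^2) DFT (fine for a demo)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency, in Hz, of one Hann-windowed frame."""
    n = len(frame)
    windowed = [x * w for x, w in zip(frame, hann(n))]
    mags = magnitude_spectrum(windowed)
    freqs = [k * sample_rate / n for k in range(len(mags))]
    total = sum(mags)
    if total == 0.0:
        return 0.0  # silent frame: centroid is undefined, so report 0
    return sum(f * m for f, m in zip(freqs, mags)) / total


if __name__ == "__main__":
    sr = 44100
    n = 1024      # frame size = FFT size
    hop = n // 2  # hypothetical hop: analyze every half frame
    # 440 Hz sine, like the sine input in example-centroid.ck
    signal = [math.sin(2 * math.pi * 440 * t / sr) for t in range(n * 2)]
    for start in range(0, len(signal) - n + 1, hop):
        c = spectral_centroid(signal[start:start + n], sr)
        print(f"frame at {start / sr:.3f} s: centroid {c:.1f} Hz")
```

A brighter input (more energy at higher frequencies) moves the centroid up. This is what makes it useful as a one-number timbre feature.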

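genre-classify.ck and x-validate.ck build on two ideas:

- k-nearest-neighbor voting;
- scoring a classifier only on data it was not trained on.

The Python sketch below illustrates both on toy 2-D vectors. It assumes Euclidean distance, unweighted majority voting and a simple k-fold split, any of which may differ from what ChucK's KNN2 and the course's x-validate.ck actually do. The score is the fraction of held-out items labeled correctly. That is why guessing at random over 10 balanced genres lands around 0.1.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Fraction of the k nearest training vectors (Euclidean) voting for each label."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return {label: count / k for label, count in votes.items()}


def cross_validate(xs, ys, folds=5, k=3, seed=0):
    """Mean held-out accuracy over `folds` folds (each item is tested exactly once)."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    accuracies = []
    for f in range(folds):
        test = idx[f::folds]  # every folds-th shuffled item forms this fold
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        correct = 0
        for i in test:
            probs = knn_predict(tx, ty, xs[i], k)
            guess = max(probs, key=probs.get)  # ties go to the label seen first (closest)
            correct += guess == ys[i]
        accuracies.append(correct / len(test))
    return sum(accuracies) / folds


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy 2-D "feature vectors": two well-separated clusters standing in for two genres
    xs = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(20)]
    xs += [(rng.gauss(3, 0.3), rng.gauss(3, 0.3)) for _ in range(20)]
    ys = ["blues"] * 20 + ["disco"] * 20
    print("cross-validation accuracy:", cross_validate(xs, ys))
```

Real GTZAN feature vectors overlap far more than these toy clusters, which is exactly what comparing feature configurations in cross-validation exposes.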
Phase Two: Designing an Audio Mosaic Tool

  • you can find phase 2 sample code here
  • using what you've learned, build a database mapping sound frames (100::ms to 1::second) <=> feature vectors
    • curate your own set of audio files, which can be a mixture of:
      • songs or song snippets (we will perform feature extraction on audio windows from beginning to end; in essence, each audio window is a short sound fragment with its own feature vector)
      • (optional) short sound effects (~1 second); for these, you may wish to extract a single feature vector per sound effect
    • modify the feature-extract.ck code from Phase One to build your database of sound frames to feature vectors:
      • instead of generating one feature vector for the entire file, output a trajectory of audio windows and associated feature vectors
      • instead of outputting labels (e.g., "blues", "disco", etc.), output information to identify each audio window (e.g., filename and windowStartTime)
      • see reference implementation mosaic-extract.ck
    • note that this does not require any labels; like word2vec, we want to situate each sound window in an N-dimensional feature space
  • play with mosaic-similar.ck: a feature-based sound explorer to query your database and perform similarity retrieval (using KNN2)
  • using your database and retrieval tool, concatenative synthesis, and the mosaic-synth-mic.ck and mosaic-synth-doh.ck sample code, design an interactive audio mosaic generator:
    • feature-based
    • real-time
    • takes any audio input (mic or any unit generator)
    • can be used for expressive audio mosaic creation
  • there are many functionalities you can choose to incorporate into your mosaic synthesizer
    • using a keyboard or mouse control to affect mosaic parameters: synthesis window length, pitch shift (through SndBuf.rate), selecting subsets of sounds to use, etc.
    • a key to making this expressive is to try different sound sources; play with them A LOT, gain an understanding of the code, and experiment!
  • (optional) do this in the audiovisual domain
    • (idea) build an audiovisual mosaic instrument or music creation tool / toy
    • (idea) build a GUI for exploring sounds by similarity; you will need to reduce the dimensionality (using PCA or another technique) to 3 or 2 in order to visualize
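The change from one feature vector per file to a trajectory of windows can be sketched as follows. This Python is illustrative only; mosaic-extract.ck is the reference implementation. It makes these assumptions:

- the audio is already decoded into a list of float samples;
- the window and hop sizes are hypothetical values;
- two of the listed features (RMS and zero-crossing rate) are computed by hand rather than with ChucK's RMS and ZeroX unit analyzers.

Each row pairs identifying information (filename and window start time) with a feature vector; no labels are involved.

```python
import math


def rms(window):
    """Root-mean-square level of a window of samples."""
    return math.sqrt(sum(x * x for x in window) / len(window))


def zero_crossing_rate(window):
    """Fraction of adjacent sample pairs whose sign changes (assumes len >= 2)."""
    crossings = sum(1 for a, b in zip(window, window[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(window) - 1)


def extract_trajectory(filename, samples, sample_rate, win_sec=0.25, hop_sec=0.25):
    """One (filename, window_start_sec, feature_vector) row per analysis window."""
    win = int(win_sec * sample_rate)
    hop = int(hop_sec * sample_rate)
    rows = []
    for start in range(0, len(samples) - win + 1, hop):
        w = samples[start:start + win]
        rows.append((filename, start / sample_rate, [rms(w), zero_crossing_rate(w)]))
    return rows


if __name__ == "__main__":
    sr = 8000
    # Hypothetical "file": half a second of a 200 Hz tone, then half a second of silence
    tone = [math.sin(2 * math.pi * 200 * t / sr) for t in range(sr // 2)]
    silence = [0.0] * (sr // 2)
    for row in extract_trajectory("demo.wav", tone + silence, sr):
        print(row)
```

Writing these rows to a text file, one per line, gives a database that a retrieval tool can load later. Overlapping windows (hop smaller than window) are an easy variation that yields a denser set of fragments.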

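Similarity retrieval over such a database reduces to a nearest-neighbor search in feature space. The Python sketch below is a hypothetical stand-in for what mosaic-similar.ck does with KNN2. It assumes:

- rows of the form (filename, window_start_sec, feature_vector);
- plain Euclidean distance;
- unnormalized features.

Choosing at random among the k nearest windows is one illustrative way to add variety to a mosaic, not a prescribed step.

```python
import math
import random


def nearest_windows(database, query, k=3):
    """The k rows whose feature vectors are closest (Euclidean) to the query."""
    return sorted(database, key=lambda row: math.dist(row[2], query))[:k]


def choose_window(database, query, k=3, rng=random):
    """Pick one of the k nearest windows at random, for variety in the mosaic."""
    filename, start_sec, _features = rng.choice(nearest_windows(database, query, k))
    return filename, start_sec


if __name__ == "__main__":
    # Hypothetical rows: (filename, window_start_sec, feature_vector)
    database = [
        ("a.wav", 0.00, [0.10, 0.90]),
        ("a.wav", 0.25, [0.80, 0.20]),
        ("b.wav", 0.00, [0.12, 0.88]),
        ("b.wav", 0.50, [0.50, 0.50]),
    ]
    query = [0.1, 0.9]  # e.g., features of the current input frame
    print(nearest_windows(database, query, k=2))
    print(choose_window(database, query, k=2, rng=random.Random(0)))
```

In a real-time mosaic, the query would be the feature vector of the current input frame (mic or any unit generator). The returned filename and start time would then tell the synthesizer which segment to play, for example by setting a SndBuf's .pos. Features with very different numeric ranges may need normalizing so that one dimension does not dominate the distance.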
Phase Three: Make a Musical Mosaic!

  • use your prototype from Phase Two to create a feature-based musical mosaic in the form of a musical statement or performance
  • (optional) do this in the audiovisual domain

Reflections

  • write ~300 words of reflection on your project. It can be about your process or the product. What were the limitations, and how did you try to get around them?

Deliverables

  • create a CCRMA webpage for this etude
  • your webpage is to include
    • a title and description of your project (feel free to link to this wiki page)
    • all relevant chuck code from all three phases
      • phase 1: all code used (extraction, classification, validation)
      • phase 2: your mosaic generator, and database query/retrieval tool
      • phase 3: code used for your musical statement
    • video recording of your musical statement (please start early!)
    • your 300-word reflection
    • any acknowledgements (people, code, or other things that helped you through this)
  • submit to Canvas only your webpage URL