Programming Project #2: "Featured Artist"

Music and AI (Music356/CS470) | Winter 2023 | by Ge Wang

[image: Mosaiconastick.jpg]

In this programming project, we will learn to work with audio features for both supervised and unsupervised tasks. These include a real-time genre-classifier and a feature-based audio mosaic tool. Using the latter, create a feature-based musical statement or performance!

Due Dates

  • Coding tutorial: Thursday evening (time TBD) | attendance is highly recommended (it will save you time later)
  • Milestone (Phase One complete + Phase Two prototype): webpage due Wednesday (2/1, 11:59pm) | in-class critique Thursday (2/2)
  • Final Deliverable: webpage due Wednesday (2/8, 11:59pm)
  • In-class Presentation: Thursday (2/9)

Discord Is Our Friend

  • direct any questions, ruminations, and outputs/interesting mistakes to our class Discord

Things to Think With

Tools to Play With

  • get the latest bleeding edge secret chuck build (2023.01.23 or later!)
    • macOS: this will install both the command-line chuck and the graphical IDE miniAudicle, replacing any previous ChucK installation
    • Windows: you will need to download and use the bleeding-edge command-line chuck (for now, there is no bleeding-edge miniAudicle for Windows); you can either use the default cmd command prompt or consider downloading a terminal emulator
    • Linux: you will need to build from source, provided in the linux directory
    • all platforms: for this project, you will be using the command-line version of chuck
  • NOTE: to return your chuck back to a pre-bleeding-edge state, you can always install the latest official ChucK release

GTZAN Dataset

  • next, you'll need to download the GTZAN dataset
    • 1000 30-second music clips, labeled by humans into ten genre categories

Phase One: Extract, Classify, Validate

  • understanding audio, audio features, FFT, feature extraction
  • extract different sets of audio features from GTZAN dataset
  • run real-time classifier using different feature sets
  • run cross-validation to evaluate the quality of the classifier based on different feature sets
  • you can find relevant code here
    • start playing with these, and read through them to get a sense of what the code is doing
    • example-centroid.ck -- a basic example of using ChucK's unit analyzer framework (analyzers are connected using the upchuck operator =^) to extract an audio feature:
      • generate an input (a 440 Hz sine wave) -- this can be any audio source, e.g., adc for the microphone
      • take a Fast Fourier Transform (FFT) on a frame of audio (the frame size is determined by the FFT size)
      • use the output of the FFT analysis to compute the Spectral Centroid for that frame of audio
      • note how ChucK timing is used to precisely control how often a frame of analysis is performed
      • .upchuck() is used to trigger an analysis, automatically cascading up the =^ chain (a minimal sketch of this whole pattern appears after this list)
    • example-mfcc.ck -- this is like the previous example, but now we compute a multi-dimensional feature, Mel Frequency Cepstral Coefficients (MFCCs)
    • feature-extract.ck -- in a "real-world" scenario, we would extract multiple features. A FeatureCollector is used to aggregate multiple features into a single vector (see comments in the file for more details, and the FeatureCollector sketch after this list)
    • genre-classify.ck -- using the output of feature-extract.ck, do real-time classification by performing the same feature extraction and using k-NN to predict the likelihood of each genre category (see comments in the file for more details)
    • x-validate.ck -- using the output of feature-extract.ck, do cross-validation to get a sense of the classifier's quality
  • experiment by choosing different features and different numbers of features, extracting them on GTZAN, trying the real-time classifier, and performing cross-validation
    • available features: Centroid, Flux, RMS, RollOff, ZeroX, MFCCs, Chroma, Kurtosis
    • try at least five different feature configurations and evaluate the resulting classifier using cross-validation
      • keep in mind that the baseline score is 0.1 (a random classifier over 10 genres), and 1 is the max
      • how do different--and different numbers of--features affect the classification results?
      • in your experiment, what configuration yielded the highest score in cross-validation?
  • briefly report on your experiments
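
To make the example-centroid.ck walkthrough above concrete, here is a minimal ChucK sketch of the unit-analyzer pattern (written for this page as an illustration, not the course file itself; the input and analysis parameters are arbitrary choices):

  // minimal sketch of a unit-analyzer chain: source => FFT, upchucked (=^) into Centroid
  SinOsc s => FFT fft =^ Centroid centroid => blackhole;

  // the input can be any audio source, e.g., adc for the microphone
  440 => s.freq;
  0.5 => s.gain;

  // analysis parameters: FFT size and window
  1024 => fft.size;
  Windowing.hann(1024) => fft.window;

  while( true )
  {
      // trigger one frame of analysis; this cascades up the =^ chain
      centroid.upchuck() @=> UAnaBlob blob;
      // Centroid outputs one value per frame (normalized; multiply by sr/2 for Hz)
      <<< "spectral centroid:", blob.fval(0) >>>;
      // advance time by one FFT frame -- this controls how often analysis happens
      fft.size()::samp => now;
  }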

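In the same spirit as feature-extract.ck, here is a minimal sketch of aggregating several features into one vector with a FeatureCollector (the particular features and parameters are illustrative choices, not the assignment's required set). The same pattern, driven by a SndBuf hopping through a sound file window by window, is essentially what Phase Two's database building asks for:

  // minimal FeatureCollector sketch (see feature-extract.ck for the full version)
  // one audio source feeds one FFT; several analyzers hang off that FFT
  // and are gathered into a single feature vector by a FeatureCollector
  adc => FFT fft => blackhole;
  FeatureCollector combo => blackhole;

  // spectral features, each upchucked from the same FFT into the collector
  fft =^ Centroid centroid =^ combo;
  fft =^ Flux flux =^ combo;
  fft =^ RMS rms =^ combo;
  fft =^ MFCC mfcc =^ combo;

  // analysis parameters
  4096 => fft.size;
  Windowing.hann( fft.size() ) => fft.window;

  while( true )
  {
      // one upchuck on the collector triggers all upstream analyzers
      combo.upchuck() @=> UAnaBlob blob;
      // the aggregated feature vector for this frame
      blob.fvals() @=> float features[];
      <<< "feature vector dimensions:", features.size() >>>;
      // hop by half a frame (50% overlap)
      (fft.size()/2)::samp => now;
  }
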
Phase Two: Designing an Audio Mosaic Tool

  • here is the sample code: https://ccrma.stanford.edu/courses/356/code/featured-artist/
  • using what you've learned, build a database mapping sound frames (100::ms to 1::second) <=> feature vectors
    • curate your own set of audio files; these can be a mixture of
      • short sound effects (~1 second)
      • music (we will perform feature extraction on all short-time windows from beginning to end; in essence each short-time window becomes a short sound fragment with its own feature vector)
    • modify the feature-extract.ck code from Phase One to build your database of sound frames to feature vectors
    • note that this does not require any labels; like word2vec, we want to situate each sound window in an N-dimensional feature space
  • play with mosaic-similar.ck: a feature-based sound explorer that queries your database and performs similarity retrieval (using KNN2; a brute-force version of the same idea is sketched after this list)
  • using your database, your retrieval tool, concatenative synthesis (https://en.wikipedia.org/wiki/Concatenative_synthesis), and the mosaic-synth-mic.ck and mosaic-synth-doh.ck examples, design an interactive audio mosaic generator (see the playback sketch after this list)
    • feature-based
    • real-time
    • takes any audio input (mic or any unit generator)
    • can be used for expressive audio mosaic creation
  • (optional) do this in the audiovisual domain; build a GUI for exploring sounds by similarity; you will need to reduce the dimensionality (using PCA or another technique) to 3 or 2 in order to visualize
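
The retrieval step in mosaic-similar.ck is handled by the KNN2 object; as a conceptual stand-in, here is a brute-force sketch of the same idea written for this page -- a toy in-memory feature database searched by Euclidean distance (sizes and values are made up purely for illustration):

  // conceptual sketch of similarity retrieval by brute force
  // (mosaic-similar.ck uses ChucK's KNN2 object instead)

  // toy "database": NUM_FRAMES sound frames, each with a NUM_DIMS-dimensional feature vector
  // (random stand-in values; in practice these come from your feature extraction)
  8 => int NUM_FRAMES;
  3 => int NUM_DIMS;
  float db[NUM_FRAMES][NUM_DIMS];
  for( 0 => int i; i < NUM_FRAMES; i++ )
  {
      for( 0 => int j; j < NUM_DIMS; j++ )
          Math.random2f( 0.0, 1.0 ) => db[i][j];
  }

  // a query feature vector (in practice: extracted from live audio input)
  float query[NUM_DIMS];
  for( 0 => int j; j < NUM_DIMS; j++ )
      Math.random2f( 0.0, 1.0 ) => query[j];

  // squared Euclidean distance between two vectors
  fun float distance( float a[], float b[] )
  {
      0.0 => float sum;
      for( 0 => int j; j < a.size(); j++ )
      {
          a[j] - b[j] => float diff;
          sum + diff*diff => sum;
      }
      return sum;
  }

  // return the index of the frame nearest to q
  fun int nearest( float q[], float frames[][] )
  {
      0 => int best;
      distance( q, frames[0] ) => float bestDist;
      for( 1 => int i; i < frames.size(); i++ )
      {
          distance( q, frames[i] ) => float d;
          if( d < bestDist ) { d => bestDist; i => best; }
      }
      return best;
  }

  <<< "nearest frame index:", nearest( query, db ) >>>;

In the real tool, KNN2 plays the role of nearest() and retrieves the k nearest frames rather than just one.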

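On the synthesis side, here is a minimal sketch of the concatenative idea (mosaic-synth-mic.ck and mosaic-synth-doh.ck do this properly, with real retrieval driving the frame choice; the filename below is a placeholder and the frame choice is random, purely for illustration):

  // minimal concatenative-playback sketch (see the mosaic-synth examples for the real versions)
  // idea: a retrieved "frame" is just a position in a source file;
  // jump the playhead there, let it sound for one window, then repeat

  // placeholder source file -- substitute one of your curated sounds
  me.dir() + "my-sound.wav" => string filename;

  SndBuf buf => dac;
  filename => buf.read;

  // window length for each fragment (the assignment suggests 100::ms to 1::second)
  100::ms => dur WINDOW;

  while( true )
  {
      // pick a frame start; in the real tool this comes from similarity retrieval
      // (assumes the file is longer than one window)
      Math.random2( 0, buf.samples() - ((WINDOW/samp) $ int) ) => int startPos;
      startPos => buf.pos;
      // let this fragment sound for one window, then move on
      WINDOW => now;
  }
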
Phase Three: Make a Musical Mosaic!

  • use your prototype from Phase Two to create a feature-based musical mosaic in the form of a musical statement or performance
  • (optional) do this in the audiovisual domain

Reflections

  • write ~300 words of reflection on your project. It can be about your process or the product. What were the limitations, and how did you try to get around them?

Deliverables

  • create a CCRMA webpage for this etude
  • your webpage is to include
    • a title and description of your project (feel free to link to this wiki page)
    • all relevant chuck code from all three phases
      • phase 1: all code used (extraction, classification, validation)
      • phase 2: your mosaic generator, and database query/retrieval tool
      • phase 3: code used for your musical statement
    • video recording of your musical statement (please start early!)
    • your 300-word reflection
    • any acknowledgements (people, code, or other things that helped you through this)
  • submit to Canvas only your webpage URL