Programming Project #2: "Featured Artist"
Music and AI (Music356/CS470) | Winter 2023 | by Ge Wang
In this programming project, we will learn to work with audio features for both supervised and unsupervised tasks. These include a real-time genre-classifier and a feature-based audio mosaic tool using similarity retrieval. Create a feature-driven musical statement or performance!
Due Dates
- Milestone 1: webpage due Monday (1/30, 11:59pm) | in-class check-in Tuesday (1/31)
- Final Deliverable: webpage due Wednesday (2/8, 11:59pm)
- In-class Presentation: Thursday (2/9)
Discord Is Our Friend
- direct any questions, ruminations, and outputs/interesting mistakes to our class Discord
Things to Think With
- read/skim the classic article "Musical Genre Classification of Audio Signals" (Tzanetakis and Cook, 2002)
- don't worry about the details yet; first get a general sense of what audio features are and how they can be used
Tools to Play With
- get the latest bleeding-edge secret `chuck` build (2023.01.23 or later!)
  - macOS: this will install both the command-line `chuck` and the graphical IDE miniAudicle, and replace any previous ChucK installation
  - Windows: you will need to download and use the bleeding-edge command-line `chuck` (for now, there is no bleeding-edge miniAudicle for Windows); you can either use the default `cmd` command prompt, or consider downloading a terminal emulator
  - Linux: you will need to build from source, provided in the `linux` directory
  - all platforms: for this project, you will be using the command-line version of chuck
- NOTE: to return your chuck to a pre-bleeding-edge state, you can always install the latest official ChucK release
GTZAN Dataset
- next, you'll need to download the GTZAN dataset
- 1000 30-second music clips, labeled by humans into ten genre categories
HW2 Sample Code
- you can find sample code here
- start playing with these, and reading through these to get a sense of what the code is doing
- word2vec-basic.ck -- basic example that... (a minimal usage sketch follows this list)
  - loads a word vector model
  - prints the # of words and # of dimensions in the model
  - shows how to get the vector associated with a word (using `getVector()`)
  - shows how to retrieve the K most similar words to a given word (using `getSimilar()`)
  - shows how to retrieve the K most similar words to a given vector (using `getSimilar()`)
  - uses the W2V helper class to evaluate an expression like "puppy - dog + cat" (using `W2V.eval()`)
  - uses the W2V helper class to evaluate a logical analogy like dog:puppy::cat:?? (using `W2V.analog()`)
- word2vec-prompt.ck -- interactive prompt word2vec explorer
  - this keeps a model loaded while allowing you to play with it
  - type `help` to get started
- starter-prompt.ck -- minimal starter code for those wishing to include an interactive prompt in chuck, with sound
- word2vec-basic.ck -- basic example that...
- example poems:
  - "i feel" -- a stream of unconsciousness poem (dependency: glove-wiki-gigaword-50-tsne-2 or any other model)
    - usage: `chuck poem-i-feel.ck`
  - "Random Walk" -- another stream of unconsciousness poem
    - usage: `chuck poem-randomwalk.ck` or `chuck poem-randomwalk.ck:START_WORD` (to provide the starting word)
  - "Spew" -- yet another stream of unconsciousness poem
    - usage: `chuck poem-spew.ck` or `chuck poem-spew.ck:START_WORD` (to provide the starting word)
  - "Degenerate" -- a prompt-based example (run in command-line chuck)
    - usage: `chuck poem-ungenerate.ck`
Phase One: Feature Extract, Classify, Validate
- understand audio, FFT, and feature extraction
- extract different sets of audio features from the GTZAN dataset
- run a real-time classifier using different feature sets
- run cross-validation to evaluate the quality of the classifier based on different feature sets (a sketch of the extraction/classification plumbing follows this list)
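As a concrete starting point, here is a hedged sketch of the per-window extraction loop, assuming the bleeding-edge build's analysis objects (FFT, Centroid, Flux, RMS, MFCC, FeatureCollector) and ChAI's KNN2 classifier. The training arrays are hypothetical placeholders, and the KNN2 calls are left as comments since their exact signatures should be confirmed against the sample code.

```chuck
// sketch: compute a feature vector per analysis window from the mic
// (placeholder training data; see the sample code for the full pipeline)

// analysis network: audio -> FFT; features hang off the FFT into a collector
adc => FFT fft;
FeatureCollector combo => blackhole;
fft =^ Centroid centroid =^ combo;
fft =^ Flux flux =^ combo;
fft =^ RMS rms =^ combo;
fft =^ MFCC mfcc =^ combo;

// analysis parameters
4096 => fft.size;
Windowing.hann( fft.size() ) => fft.window;
20 => mfcc.numCoeffs;

// ChAI KNN classifier; to be trained on features extracted from GTZAN
KNN2 knn;
// hypothetical: float trainX[NUM_EXAMPLES][NUM_DIMS]; int trainY[NUM_EXAMPLES];
// knn.train( trainX, trainY );

while( true )
{
    // compute all collected features for the current window
    combo.upchuck() @=> UAnaBlob blob;
    // the concatenated feature vector for this window
    blob.fvals() @=> float features[];
    <<< "feature vector dimensions:", features.size() >>>;
    // classify, e.g.: knn.predict( features, K, probabilities );
    // hop by half a window
    (fft.size()/2)::samp => now;
}
```

Cross-validation reuses the same extracted vectors offline: partition them into folds, train on all but one fold, measure prediction accuracy on the held-out fold, and rotate.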
Phase Two: Curate Feature Database, Design Audio Mosaic Tool
- build a database mapping sound frames (100::ms to 1::second) <=> feature vectors
  - curate your own set of audio files; these can be a mixture of
    - short sound effects (~1 second)
    - music (we will perform feature extraction on each short-time window)
- prototype a feature-based sound explorer to query your database and perform similarity retrieval (a retrieval sketch follows this list)
- using your database and retrieval tool, design an interactive audio mosaic generator
  - feature-based
  - real-time
  - takes any audio input (mic or any unit generator)
  - can be used for performance
- (optional) do this in the audiovisual domain
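For the retrieval step, a brute-force nearest-neighbor search over the frame <=> feature-vector table is enough for a first prototype. Below is a sketch under that assumption; `db`, `NUM_FRAMES`, `NUM_DIMS`, and `source.wav` are hypothetical placeholders to be filled by your own extraction pass, and ChAI's KNN-style objects can later replace the hand-rolled linear search.

```chuck
// sketch: tiny feature database + linear nearest-neighbor retrieval,
// driving frame-level "mosaic" playback

// database frame size (within the suggested 100::ms to 1::second range)
100::ms => dur FRAME;
(FRAME / samp) $ int => int frameSamps;

// hypothetical database dimensions -- fill db[][] from your extraction pass;
// row i holds the feature vector for the audio starting at i * FRAME
500 => int NUM_FRAMES;
24 => int NUM_DIMS;
float db[NUM_FRAMES][NUM_DIMS];

// index of the database frame nearest to query q (squared Euclidean distance)
fun int nearest( float q[] )
{
    -1 => int best;
    Math.FLOAT_MAX => float bestDist;
    for( 0 => int i; i < db.size(); i++ )
    {
        0.0 => float d;
        for( 0 => int j; j < q.size(); j++ )
        {
            db[i][j] - q[j] => float diff;
            diff * diff +=> d;
        }
        if( d < bestDist ) { d => bestDist; i => best; }
    }
    return best;
}

// playback: jump a SndBuf to the retrieved frame and play one frame's worth
SndBuf buf => dac;
me.dir() + "source.wav" => buf.read;  // placeholder: your curated audio file

fun void playFrame( int index )
{
    index * frameSamps => buf.pos;
    FRAME => now;
}

// example query loop: each hop, fill query[] (e.g., from the Phase One
// analysis chain on live input), retrieve, and play the matching frame
float query[NUM_DIMS];
while( true )
{
    nearest( query ) => int index;
    if( index >= 0 ) playFrame( index );
    else FRAME => now;
}
```

A linear search over a few thousand frames is easily fast enough at a 100::ms hop; it only becomes worth optimizing once your database grows much larger.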
Phase Three: Make a Musical Statement
- use your prototype from Phase Two to create a musical statement
- (optional) do this in the audiovisual domain
Reflections
- write ~300 words of reflection on your project; it can be about your process or the product. What were the limitations, and how did you try to get around them?
Deliverables
- create a CCRMA webpage for this etude
  - the URL should live at https://ccrma.stanford.edu/~YOURUSERID/356/hw2 or https://ccrma.stanford.edu/~YOURUSERID/470/hw2
  - alternately, you may use Medium or another publishing platform (but please still link to that page from your CCRMA webpage)
- your webpage should include
  - a title and description of your project (feel free to link to this wiki page)
  - all relevant chuck code from all three phases
    - phase 1: all code used (extraction, classification, validation)
    - phase 2: your mosaic generator and your database query/retrieval tool
    - phase 3: code used for your musical statement
  - a video recording of your musical statement (please start early!)
  - your ~300-word reflection
  - any acknowledgements (people, code, or other things that helped you through this)
- submit to Canvas only your webpage URL