This page demonstrates a real-time music tagging visualizer named SAFVY! (Sparse Audio Feature Visualizer, Yeah!) which was implemented as part of my thesis and also collaboration work with my CCRMAmate, Jorge Herrera.
SAFVY! displays various audio representations including waveforms (green), log-frequency spectrogram (brown) and a sparse feature representation by an unsupervised learning algorithm (purple). We regard the learned feature representation as ''musical neuron activation''.
It also displays different categories of semantic tags (genre, emotion, instrument, voice quality, song and usage) predicted from the feature representation by supervised leanring. The tag size changes depending on the confidence level (each tag has a corresponding classifier and it returns the confidence level, i.e., distance from the decision boundary ).
While the classifiers were trained with song-level features summarized from an audio track, the real-time visualizer summarizes local features only up to current time and makes predictions from it. Thus, the tagging can make no sense initially but will progressively get more accurate as the song is played more.