We call it SAFVY! (Sparse Audio Feature Visualizer, Yeah!). It displays the learned feature representation as ''musical neuron activation''. Not only that but it also shows semantic tags predicted from the played song.
The visualizer shows waveform (green), a log-frequency spectrogram (brown) and the feature representation (purple) along the data processing pipeline. In addition, it displays different categories of semantic tags (genre, emotion, instrument, voice quality, song and usage) at edges and corners.
Note that the tag size changes depending on the confidence level (each tag has a corresponding classifier and it returns the confidence level, i.e., distance from the decision boundary ). While the classifiers were trained with song-level features summarized from the whole songs, this real-time visualizer can summarize local features only up to current time. Thus, the predictions can be not correct initially but they will progressively become more accurate as the song is played more.