In /usr/ccrma/media/databases/hiphop-gene/ are the following files:

Data Files

  • artists.json

A list of each artist in the dataset. Rather than being extracted from tags in the mp3 files, the names are entered by hand to ensure consistent normalization.

  • compressed

Loosely organized directory of mp3/m4a/etc. files for the base data set. New data set examples go in here, to be picked up by the metadata-entry tool described under Utility Tools.

  • genres.json

A list of each possible genre in the dataset. Handwritten and used for manual genre entry.

  • meta.json

The main catalogue of metadata associated with each WAV file. Currently includes genre and artist(s) info, in addition to file paths of compressed/WAV versions of the audio data.

  • wav

Directory of uncompressed audio data files. Automatically populated by the conversion tool described under Utility Tools.

Utility Tools


Convert files in compressed/ to WAV format, and place them in wav/.
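
The path mapping behind this conversion step might look like the sketch below. It is only an illustration: the function names are hypothetical, the assumption that wav/ mirrors the subdirectory layout of compressed/ is not confirmed by this page, and ffmpeg is assumed as the decoder even though the actual tool may use something else.

```python
from pathlib import Path


def wav_target(src, compressed_dir="compressed", wav_dir="wav"):
    """Map a file under compressed/ to its counterpart under wav/.

    Assumption: wav/ mirrors the subdirectory layout of compressed/.
    """
    rel = Path(src).relative_to(compressed_dir)
    return Path(wav_dir) / rel.with_suffix(".wav")


def conversion_command(src, compressed_dir="compressed", wav_dir="wav"):
    """Build (but do not run) a decoder command for one file.

    Assumption: ffmpeg is installed and used as the decoder.
    -n tells ffmpeg not to overwrite an existing output file.
    """
    dst = wav_target(src, compressed_dir, wav_dir)
    return ["ffmpeg", "-n", "-i", str(src), str(dst)]
```

The returned list can be passed to `subprocess.run` once the target directory exists.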


Add any new artists in meta.json to artists.json (normally not necessary, as this should happen automatically).
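
A minimal sketch of this synchronization step, assuming meta.json is a list of entries that each carry a list of artist names under an `"artists"` key. That key and the function name `merge_artists` are hypothetical, since the actual schema is not shown here.

```python
def merge_artists(known_artists, meta_entries):
    """Return known_artists plus any artists found only in meta_entries.

    New names are appended at the end so that the positions of existing
    artists do not change.
    """
    merged = list(known_artists)
    seen = set(merged)
    for entry in meta_entries:
        # "artists" is a hypothetical key name for the per-song artist list.
        for name in entry.get("artists", []):
            if name not in seen:
                seen.add(name)
                merged.append(name)
    return merged
```

Appending rather than re-sorting keeps existing positions in artists.json stable, which matters if other files refer to artists by position.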


Search for new files in compressed/ and request genre and artist information. Stores all of this in meta.json.
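
The search for new files could be sketched as follows. This assumes each meta.json entry records the path of its compressed file under a `"compressed"` key and that only a fixed set of extensions counts as audio. Both the key name and the extension set are assumptions made for illustration.

```python
import os

# Assumption: the source only says "mp3/m4a/etc.", so this set is a guess.
AUDIO_EXTENSIONS = {".mp3", ".m4a"}


def find_new_files(compressed_dir, meta_entries):
    """List audio files under compressed_dir that no meta entry refers to."""
    # "compressed" is a hypothetical key holding each entry's source path.
    known = {entry["compressed"] for entry in meta_entries}
    new_files = []
    for root, _dirs, files in os.walk(compressed_dir):
        for name in files:
            path = os.path.join(root, name)
            extension = os.path.splitext(name)[1].lower()
            if extension in AUDIO_EXTENSIONS and path not in known:
                new_files.append(path)
    return sorted(new_files)
```

Each returned path would then be the subject of a prompt for genre and artist information before a new entry is written to meta.json.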


Export meta.json data into a format convenient for use in Matlab. Writes file paths to files.dat and genre and artist info to meta.dat. Each row in these files is one training example. Column 1 of meta.dat is the genre (an index into the list of genres in genres.json), and the subsequent columns indicate the presence or absence of a particular artist on that song (where column N+1 corresponds to the N-th artist in artists.json).
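
As an illustration of the meta.dat row layout, here is a small sketch in which the first column holds the genre index and the remaining columns flag each artist. The entry keys `"genre"` and `"artists"`, the function name `meta_row`, and the choice of a 1-based genre index (to suit Matlab) are all assumptions. The actual export tool may differ.

```python
def meta_row(entry, genres, artists):
    """Build one meta.dat row: genre index, then one 0/1 flag per artist."""
    # Assumption: the genre index is 1-based, as is conventional in Matlab.
    genre_index = genres.index(entry["genre"]) + 1
    present = set(entry["artists"])
    # One column per artist, in artists.json order: 1 if on the song, else 0.
    flags = [1 if name in present else 0 for name in artists]
    return [genre_index] + flags


def format_row(row):
    """Render a row as space-separated text, one training example per line."""
    return " ".join(str(value) for value in row)
```

For example, with genres `["east", "west"]` and artists `["A", "B", "C"]`, a "west" song featuring A and C would produce the row `2 1 0 1`.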