HipHop Genealogy

In /usr/ccrma/media/databases/hiphop-gene/ are the following files:

Data Files

artists.json

A list of every artist in the dataset. Artist names are not extracted from tags in the mp3 files; they are hand-entered via categorize.py to ensure consistent normalization.

compressed

Loosely organized directory of mp3, m4a, and other compressed audio files for the base data set. New data set examples go in here, to be sorted out by categorize.py.

genres.json

List of each possible genre in the dataset. Handwritten and used by categorize.py for manual genre entry.

meta.json

The main catalogue of metadata associated with each WAV file. Currently includes genre and artist(s) info, in addition to file paths of compressed/WAV versions of the audio data.

wav

Directory of uncompressed audio data files. Automatically populated by decompress.py.

Utility Tools

decompress.py

Convert files in compressed/ to WAV format, and place them in wav/.

build_artists.py

Add any new artists in meta.json to artists.json (normally not necessary as categorize.py should do this automatically).
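As a rough illustration of what such a merge might look like, the sketch below assumes that artists.json holds a flat list of names and that each meta.json record carries an "artists" list. Neither schema is documented on this page, and the function name merge_artists is hypothetical, so treat this as a sketch rather than the script's actual logic.

```python
def merge_artists(known_artists, meta_entries):
    """Return known_artists plus any names that appear only in meta_entries.

    Assumption: each meta entry is a dict with an optional "artists" list.
    Existing names keep their positions, so artist column indices used by
    earlier exports would not shift.
    """
    merged = list(known_artists)  # copy so the caller's list is not modified
    seen = set(merged)
    for entry in meta_entries:
        for name in entry.get("artists", []):
            if name not in seen:
                seen.add(name)
                merged.append(name)
    return merged
```

Appending new names at the end, rather than re-sorting the list, is a design choice made here so that any artist indices already written elsewhere remain valid.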

categorize.py

Search for new files in compressed/ and request genre and artist information. Stores this all in meta.json.
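The "search for new files" step could be sketched roughly as follows. This assumes each meta.json record stores the path of its compressed source under a "compressed" key, relative to the compressed/ directory; that key name and the function find_uncategorized are hypothetical and not taken from the actual script.

```python
from pathlib import Path


def find_uncategorized(compressed_dir, meta_entries):
    """List files under compressed_dir that no meta entry refers to yet.

    Assumption: entries record their source file under a "compressed" key,
    as a POSIX-style path relative to compressed_dir.
    """
    known = {entry["compressed"] for entry in meta_entries if "compressed" in entry}
    root = Path(compressed_dir)
    new_files = []
    # rglob walks subdirectories too, since the directory is only loosely organized
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            if rel not in known:
                new_files.append(rel)
    return new_files
```

Each returned path would then be presented to the user for genre and artist entry before a new record is appended to meta.json.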

export_matlab.py

Export meta.json data into a format convenient for use in Matlab. Writes file paths to files.dat and genre and artist info to meta.dat. Each row in these files is one training example. Column 1 of meta.dat is the genre (an index into the list of genres in genres.json) and the subsequent columns indicate the presence or absence of a particular artist on that song (where column N is the N+1-th artist in artists.json).
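To make the row layout concrete, here is a minimal sketch of how one meta.dat row might be assembled: a genre index followed by one 0/1 flag per artist, in artists.json order. The function name encode_row is hypothetical, and the zero-based genre index is an assumption; the real export may well use one-based indices to suit Matlab.

```python
def encode_row(genre, song_artists, genres, artists):
    """Build one meta.dat-style row for a single song.

    genres and artists are the lists loaded from genres.json and artists.json.
    Assumption: the genre index is zero-based; the actual exporter may differ.
    """
    genre_index = genres.index(genre)  # raises ValueError for an unknown genre
    credited = set(song_artists)
    # One flag per known artist: 1 if the artist appears on the song, else 0
    flags = [1 if name in credited else 0 for name in artists]
    return [genre_index] + flags
```

Every row has the same length, one plus the number of artists, which is what lets Matlab load the file as a plain numeric matrix.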
