Difference between revisions of "HipHop Genealogy"

From CCRMA Wiki

Revision as of 02:06, 10 November 2013

In /usr/ccrma/media/databases/hiphop-gene/ are the following files:

Data Files

  • artists.json

A list of each artist in the dataset. Rather than being extracted from tags in the mp3 files, the names are hand-entered via categorize.py to ensure they are normalized correctly.

  • compressed

Loosely organized directory of mp3/m4a/etc. files for the base data set. New files for the data set go in here, to be sorted out by categorize.py.

  • genres.json

A list of each possible genre in the dataset. Handwritten, and used by categorize.py for manual genre entry.

  • meta.json

The main catalogue of metadata associated with each WAV file. Currently includes genre and artist(s) info, in addition to file paths of compressed/WAV versions of the audio data.

  • wav

Directory of uncompressed audio data files. Automatically populated by decompress.py.
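
The layout of meta.json is not documented on this page, so the following Python sketch assumes a hypothetical schema: a list of records with artists, genre, compressed_path, and wav_path keys. The genre and artist values are placeholders. It only illustrates how such a catalogue might be loaded and grouped by genre; the real field names may differ.

```python
import json
from collections import defaultdict

# Hypothetical record layout and placeholder values;
# the real meta.json keys and contents may differ.
SAMPLE_META = [
    {"artists": ["Artist A"], "genre": "genre-x",
     "compressed_path": "compressed/a/track1.mp3", "wav_path": "wav/track1.wav"},
    {"artists": ["Artist A", "Artist B"], "genre": "genre-y",
     "compressed_path": "compressed/b/track2.m4a", "wav_path": "wav/track2.wav"},
    {"artists": ["Artist C"], "genre": "genre-x",
     "compressed_path": "compressed/c/track3.mp3", "wav_path": "wav/track3.wav"},
]


def load_meta(path):
    """Read the metadata catalogue from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def group_by_genre(records):
    """Map each genre to the WAV paths of the records tagged with it."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["genre"]].append(rec["wav_path"])
    return dict(groups)
```

Under these assumptions, group_by_genre(load_meta("meta.json")) would return a dictionary keyed by genre, which is a convenient starting point for building per-genre training sets.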

Utility Tools

  • decompress.py

Convert files in compressed/ to WAV format, and place them in wav/.

  • build_artists.py

Add any new artists in meta.json to artists.json (normally not necessary as categorize.py should do this automatically).

  • categorize.py

Search for new files in compressed/ and request genre and artist information. Stores this all in meta.json.

  • export_mat.py

Export meta.json data into a format convenient for use in Matlab.
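
The bookkeeping behind categorize.py and build_artists.py can be sketched in Python as below. This is not the actual code of those scripts: the function names, the compressed_path and artists keys, and the set of file extensions are all assumptions made for illustration.

```python
import os

# Assumed extension set; the page only says "mp3/m4a/etc."
AUDIO_EXTENSIONS = {".mp3", ".m4a"}


def find_new_files(compressed_dir, meta):
    """Return audio files under compressed_dir that meta does not list yet.

    meta is assumed to be a list of dicts with a "compressed_path" key.
    """
    known = {rec["compressed_path"] for rec in meta}
    new_files = []
    for root, _dirs, files in os.walk(compressed_dir):
        for name in files:
            path = os.path.join(root, name)
            extension = os.path.splitext(name)[1].lower()
            if extension in AUDIO_EXTENSIONS and path not in known:
                new_files.append(path)
    return sorted(new_files)


def merge_artists(meta, artists):
    """Return artists plus any artist named in meta but missing from it.

    meta is assumed to be a list of dicts with an "artists" list.
    Existing order is kept; new names are appended in order of appearance.
    """
    merged = list(artists)
    seen = set(artists)
    for rec in meta:
        for artist in rec["artists"]:
            if artist not in seen:
                seen.add(artist)
                merged.append(artist)
    return merged
```

A script in the style of categorize.py would then prompt for genre and artist information for each path returned by find_new_files, append a record to the catalogue, and write the result of merge_artists back to artists.json.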