i spent this week discussing ideas for the project. initially i wanted to develop software that automatically segments an existing song and clusters the segments based on spectral (and perhaps other) features. the user could then play back the song using only segments from selected clusters. in practice, though, i spent the majority of the week working on my real-time concatenative synthesis project, sour mash, to prepare it for the open house.
after CCRMA's open house, at bruno's suggestion, i checked out some of diemo schwarz's work. in 2006 he wrote a survey of concatenative synthesis methods. it turns out that not many systems have attempted to use large databases in real time. i read through some of the papers he discusses that did attempt to build large databases for real-time concatenative synthesis.
i re-read several papers by matt hoffman about his FeatSynth framework for resynthesis. his work focuses on selecting synthesizer parameters from a database so that the resulting synthesized audio minimizes the distance to an input feature vector. he discussed using locality sensitive hashing (LSH) for storing and querying the database. i went through the code that he used to build up the hash table.
i also checked out some work by malcolm slaney on locality sensitive hashing in the context of storing and querying audio.
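to make the LSH idea concrete, here's a minimal sketch using random hyperplanes, which is one common LSH family. it is not hoffman's or slaney's code, and the class name, parameters, and payload strings are all made up for illustration. each vector gets one bit per hyperplane (which side of the plane it falls on), and a query only scans the bucket it hashes into instead of the whole database.

```python
import random


def sq_dist(a, b):
    """squared euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class HyperplaneLSH:
    """toy random-hyperplane LSH index (hypothetical, for illustration)."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = random.Random(seed)
        # each hyperplane is described by a random gaussian normal vector
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                       for _ in range(n_bits)]
        self.buckets = {}

    def _key(self, vec):
        # one bit per hyperplane: the sign of the dot product
        return tuple(
            sum(p * v for p, v in zip(plane, vec)) >= 0.0
            for plane in self.planes
        )

    def add(self, vec, payload):
        self.buckets.setdefault(self._key(vec), []).append((vec, payload))

    def query(self, vec):
        # only the matching bucket is scanned, so near neighbors that
        # landed in a different bucket can be missed
        candidates = self.buckets.get(self._key(vec), [])
        if not candidates:
            return None
        return min(candidates, key=lambda c: sq_dist(c[0], vec))[1]


index = HyperplaneLSH(dim=5, n_bits=8, seed=1)
index.add([0.3, -1.2, 0.7, 2.0, -0.4], "segment-a")
match = index.query([0.3, -1.2, 0.7, 2.0, -0.4])
```

a single table like this trades recall for speed. real systems usually keep several tables with different random planes and merge the candidates.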
here's a clip of diemo schwarz performing live with CataRT
i refactored my sour mash code and began analyzing 500 audio tracks. the goal of this build is to create a densely populated feature space. i then plan on resynthesizing some songs offline, to see how many entries in the database may be needed, and to see how compelling the system is with only 5 mfcc dimensions.
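the offline resynthesis step boils down to a nearest-neighbor lookup per target frame. here's a rough brute-force sketch, not the actual sour mash code: the function names and the `(segment_id, features)` layout are assumptions, and the feature vectors are just short lists of numbers standing in for 5 mfcc values.

```python
def nearest_segment(target, database):
    """return the id of the database entry closest to target (euclidean)."""
    best_id, best_dist = None, float("inf")
    for seg_id, feats in database:
        dist = sum((a - b) ** 2 for a, b in zip(target, feats))
        if dist < best_dist:
            best_id, best_dist = seg_id, dist
    return best_id


def resynthesize(target_frames, database):
    """map each target frame to the id of its closest database segment."""
    return [nearest_segment(frame, database) for frame in target_frames]


# tiny made-up database: (segment id, feature vector)
database = [("a", [0.0, 0.0]), ("b", [1.0, 1.0]), ("c", [5.0, 5.0])]
sequence = resynthesize([[0.1, 0.2], [4.0, 4.0], [0.9, 1.2]], database)
```

the scan is linear in the database size for every frame, which is exactly the cost that an approximate index would be meant to cut down.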
at the hearing seminar on friday, 4/13, philippe hamel talked about new MIR features. he was critical of MFCCs and proposed a new feature, principal mel-spectrum coefficients (PMSCs). he played a demo from dan ellis showing how poorly audio reconstructed from MFCCs holds up. i probably should incorporate chromagrams as well.
the database finished populating. i'm not sure how many audio segments are in it, because the number is large enough that the sql query to count them takes prohibitively long. the database file is 771 MB and stores features for around 75 GB of audio (wav).
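one cheap way to estimate the count, sketched under two loud assumptions: that the store is sqlite, and that rows were only ever appended and never deleted. the table name `segments` and its columns are made up for illustration. in sqlite a full `COUNT(*)` has to visit every row, whereas `MAX(rowid)` can be answered with a single seek to the end of the table.

```python
import sqlite3


def approx_row_count(conn, table="segments"):
    """estimate the number of rows from the largest rowid.

    this matches the true count only if rows were never deleted.
    the table name is interpolated directly, so only pass trusted names.
    """
    row = conn.execute(f"SELECT MAX(rowid) FROM {table}").fetchone()
    return row[0] or 0  # MAX() yields NULL on an empty table


# stand-in database with a hypothetical schema
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE segments (mfcc1 REAL, mfcc2 REAL)")
conn.executemany(
    "INSERT INTO segments VALUES (?, ?)",
    [(0.1 * i, 0.2 * i) for i in range(1000)],
)
estimate = approx_row_count(conn)
```

another option is to keep a running counter while populating, so the number never has to be recomputed.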
i got the resynthesizer 'working' with 5 MFCC dims. here are some short demos.
the resynthesized versions show some promise, but it's pretty clear that i need some chroma-type features, and, as always, more data. my next steps are to continue populating the database with more music, implement some sort of chroma feature and swap it in for the 5th mfcc dimension, and finally start transitioning over to LSH.
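as a rough idea of what a chroma feature could look like, here's a minimal sketch that folds spectral magnitudes into 12 pitch classes. it's only one of several ways to compute chroma. the function name, the 440 hz reference tuning, and the input layout (parallel lists of bin frequencies and magnitudes) are assumptions for illustration.

```python
import math


def chroma_from_spectrum(freqs_hz, magnitudes, a4_hz=440.0):
    """fold a magnitude spectrum into a normalized 12-bin chroma vector.

    bin 0 is C, bin 9 is A (midi note number modulo 12).
    """
    chroma = [0.0] * 12
    for freq, mag in zip(freqs_hz, magnitudes):
        if freq <= 0:
            continue  # skip the dc bin, log2 is undefined there
        midi = 69 + 12 * math.log2(freq / a4_hz)
        chroma[int(round(midi)) % 12] += mag
    total = sum(chroma)
    # normalize so the vector sums to 1 when there is any energy
    return [c / total for c in chroma] if total > 0 else chroma


# 440 hz and 880 hz are both A, 261.63 hz is middle C
chroma = chroma_from_spectrum([440.0, 880.0, 261.63], [1.0, 1.0, 2.0])
```

because octaves collapse onto the same bin, this captures pitch-class content that a handful of low-order MFCCs mostly throws away.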