Evaluation of a score-informed source separation system (ISMIR 2010)

This paper, accepted at the ISMIR 2010 conference, presents a detailed evaluation of the source separation system introduced in an earlier ICMC paper (which can be accessed here). Download the ISMIR paper here.

Source alignment and separation test database

The database can be downloaded at the bottom of this page. First, some explanation.

Besides the mix to be separated, our proposed source separation system also needs a score (i.e. symbolic information about what is being played). From the score, we have access to the individual parts, instruments, measures, or whatever else we want to extract from the mix. If those parts are rendered separately through a synthesizer, we obtain an "ideal" version of what we want to extract. This version is aligned to the real recording and then used as prior information for the PLCA (Probabilistic Latent Component Analysis) algorithm, which can then analyze the mix in such a way that different sets of components correspond to the different sources to be extracted.
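To make the role of the prior more concrete, here is a minimal sketch of a PLCA-style decomposition in plain Python. It is not the implementation evaluated in the paper: the function name `plca_with_priors`, the prior weight `kappa`, the use of a single component per source, and the toy 4-bin "spectrogram" are all assumptions chosen for illustration. The spectral shape of each synthesized source is used both to initialise a component and as a Dirichlet-style prior that is added in every update.

```python
def plca_with_priors(V, prior_fz, n_iter=50, kappa=1.0):
    """Fit P(f,t) ~ sum_z P(z) P(f|z) P(t|z) to a non-negative matrix V with EM.

    V        : F x T list of lists (e.g. a magnitude spectrogram of the mix)
    prior_fz : Z spectral shapes of length F, each summing to 1
               (hypothetically taken from the synthesized sources)
    kappa    : assumed weight of the prior in the spectral update
    """
    F, T, Z = len(V), len(V[0]), len(prior_fz)
    eps = 1e-12                                   # guards against division by zero
    p_z = [1.0 / Z] * Z
    p_fz = [list(p) for p in prior_fz]            # initialise spectra from the priors
    p_tz = [[1.0 / T] * T for _ in range(Z)]

    for _ in range(n_iter):
        acc_f = [[0.0] * F for _ in range(Z)]
        acc_t = [[0.0] * T for _ in range(Z)]
        # E-step: share each V[f][t] among components by their posterior.
        for f in range(F):
            for t in range(T):
                joint = [p_z[z] * p_fz[z][f] * p_tz[z][t] for z in range(Z)]
                total = sum(joint) + eps
                for z in range(Z):
                    w = V[f][t] * joint[z] / total
                    acc_f[z][f] += w
                    acc_t[z][t] += w
        # M-step: renormalise; the spectral update is pulled towards the prior.
        mass = [sum(acc_t[z]) for z in range(Z)]
        p_z = [m / (sum(mass) + eps) for m in mass]
        for z in range(Z):
            num = [acc_f[z][f] + kappa * prior_fz[z][f] for f in range(F)]
            p_fz[z] = [x / (sum(num) + eps) for x in num]
            p_tz[z] = [x / (mass[z] + eps) for x in acc_t[z]]
    return p_z, p_fz, p_tz


# Toy mix: source A lives in bins 0-1, source B in bins 2-3 (made-up data).
mix = [[1, 1, 0], [1, 1, 0], [0, 2, 2], [0, 2, 2]]
priors = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
p_z, p_fz, p_tz = plca_with_priors(mix, priors)
```

In this sketch each source is tied to exactly one component; a real system would typically use several components per source and real spectrograms, and would rebuild each source from the posterior-weighted mix.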

It proved rather difficult to find a test database of polyphonic music with corresponding scores; the scores were the problem. We therefore decided to create our own, and constructed it as follows:

  • The data is synthesized from randomly generated MIDI files. This is done mainly to avoid copyright issues and to keep the database small, but also to avoid assuming any particular musical style or structure. One file was generated for every MIDI instrument (MIDI program change). We did not generate any files with drum sounds from MIDI channel 10. Each file is exactly 10 seconds long and contains on average 20 notes, each lasting between 0.1 and 1 second. Pitches range from MIDI note number 36 to 96, and loudness varies from MIDI velocity 77 to 127. The source code of this generator is available.
  • Synthesized scores are different from real recordings. To simulate this difference, we provide 2 renderings of each MIDI file: one using the Timidity++ synthesizer (latest CVS version) with the Fluid R3 GM soundfont on Linux, and one using the DirectMusic synthesizer on Windows XP, with the files saved using WinAmp Media Player. In testing, one of these renderings plays the role of the real recording, while the files rendered with the other synth can be used as synthesized data. You'll find that in many cases there is a huge difference in timbre between renderings of the same instrument on different synths.
  • A synthesized file has computed, almost robotic timing, whereas in any real recording timing varies greatly and is part of the artist's expressivity. To match the recording to a score, alignment is therefore an unavoidable part of our separation system. Alignment information can be given manually, but (semi-)automated approaches also exist. To take this into account, the files were modified to contain tempo changes, in such a way that the total length remains 10 seconds. For all files, there are MIDI versions and renderings in which the tempo deviates by at most 10%, 20%, 30%, 40% and 50% from the original, and changes a couple of times every 2 seconds. To keep things simple, the mapping from the original to the "wobbled" file is the same curve for every file. More information about how this is structured is available in the readme file in the database.
  • There are 2 versions of the database. The full version contains all MIDI instruments (including whistles, clapping hands, helicopter sounds). Since the number of possible mixes of 2 different sources out of 128 is 16256 (128 × 127), we worked with a subset of the database for our ISMIR paper: we picked 20 instruments that are commonly present in orchestral and pop music. This reduced the number of possible mixes to 380 (20 × 19). Even then, all results presented in our ISMIR 2010 paper needed several nights on a dozen computers to be computed: they were averaged from repeated tests on this subset database in order to increase confidence in the results and eliminate outliers as much as possible.
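The construction steps in the list above can be sketched in a few lines of Python. This is not the generator shipped with the database: the function names, the fixed count of 20 notes (the text only gives an average), the uniform sampling, and the 0.5-second tempo segments are assumptions made for illustration. The sketch draws random notes within the stated ranges, builds a tempo "wobble" that keeps the total length at 10 seconds, and reproduces the mix counts quoted above.

```python
import random


def random_notes(rng, total_s=10.0, n_notes=20):
    """Draw notes within the ranges stated in the text (uniform sampling assumed)."""
    notes = []
    for _ in range(n_notes):
        duration = rng.uniform(0.1, 1.0)               # seconds
        onset = rng.uniform(0.0, total_s - duration)   # note ends inside the file
        notes.append({
            "onset": onset,
            "duration": duration,
            "pitch": rng.randint(36, 96),              # MIDI note number
            "velocity": rng.randint(77, 127),          # MIDI velocity
        })
    return sorted(notes, key=lambda n: n["onset"])


def make_warp(rng, max_dev=0.2, total_s=10.0, seg_s=0.5):
    """Return a piecewise-linear map from original time to "wobbled" time.

    Each segment gets a random tempo factor in [1 - max_dev, 1 + max_dev];
    segment lengths are then rescaled so the total stays at total_s.
    (The rescaling can push a segment slightly past max_dev; a real
    generator may handle this differently.)
    """
    n_seg = int(round(total_s / seg_s))
    factors = [rng.uniform(1.0 - max_dev, 1.0 + max_dev) for _ in range(n_seg)]
    lengths = [seg_s / f for f in factors]             # faster tempo = shorter segment
    scale = total_s / sum(lengths)
    lengths = [length * scale for length in lengths]
    starts = [0.0]
    for length in lengths:
        starts.append(starts[-1] + length)

    def warp(t):
        i = min(int(t // seg_s), n_seg - 1)            # segment containing t
        return starts[i] + (t - i * seg_s) / seg_s * lengths[i]

    return warp


def count_mixes(n_instruments):
    """Number of mixes of 2 different sources, counted as n * (n - 1)."""
    return n_instruments * (n_instruments - 1)


rng = random.Random(0)                                 # fixed seed for repeatability
notes = random_notes(rng)
warp = make_warp(rng, max_dev=0.5)                     # one curve, reused for every file
wobbled = [dict(n, onset=warp(n["onset"])) for n in notes]
print(count_mixes(128), count_mixes(20))
```

Creating the warp once and applying it to every file mirrors the choice described above of using the same curve throughout, which means the ground-truth alignment is known for every file. For brevity the example moves only the note onsets; a full version would warp the note ends as well.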

The databases are available for download in zip format. They contain folders with MIDI files, audio files rendered with FluidSynth, and audio files rendered with DirectMusic. More detailed information is available in the readme file inside. Source code is included where possible.

Please contact me with any questions or comments!