Source Separation Examples


Welcome! Below are the examples referenced in my dissertation, “Stereo Music Source Separation via Bayesian Modeling.” The sound files for each example are located in their own directory (links provided below).

Naming Conventions:

All sound files carry the .wav extension and follow these labeling conventions:

s1, s2, s3, s4: original (mono) source signals

x1x2: stereo mixture signal (x1 and x2 are just the mono left and right channel signals)

s1_est, s2_est, s3_est, s4_est: estimates produced by the Bayesian EMV system

test1bin, test2bin, test3bin, test4bin: estimates produced by a binary system
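For scripted listening or evaluation, the file names implied by these conventions can be generated per example. The following is only a minimal sketch: the function name `expected_files` and its parameters are hypothetical, and not every example directory contains binary-system estimates.

```python
def expected_files(n_sources, include_binary=True):
    """List the .wav file names implied by the labeling conventions.

    n_sources and include_binary are illustrative parameters; some example
    directories contain only Bayesian EMV estimates.
    """
    names = ["x1x2.wav"]  # stereo mixture (left channel x1, right channel x2)
    for k in range(1, n_sources + 1):
        names.append(f"s{k}.wav")        # original mono source
        names.append(f"s{k}_est.wav")    # Bayesian EMV estimate
        if include_binary:
            names.append(f"test{k}bin.wav")  # binary-system estimate
    return names


# File names for a two-source example.
print(expected_files(2))
```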

In the dissertation, we describe the use of input mixture phase and perfect target phase to enhance automatic evaluation of results. We have also included the following files for some of the examples to allow an auditory comparison:

s1mp, s2mp, s3mp, s4mp: original source files, but using STFT phase from their mixture signals

s1pp_est, s2pp_est, s3pp_est, s4pp_est: estimates produced by the Bayesian EMV system, but given perfect source STFT phase “for free.”

test1binpp, test2binpp, test3binpp, test4binpp: estimates produced by a binary system, but given perfect source STFT phase “for free.”
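The phase-substituted files above pair the STFT magnitude of one signal with the STFT phase of another. A minimal sketch of that combination for a single channel is given here, assuming the complex STFT arrays have already been computed with matching shapes. The name `swap_phase` is hypothetical, and the STFT settings used in the dissertation are not reproduced.

```python
import numpy as np


def swap_phase(magnitude_stft, phase_stft):
    """Keep the magnitude of magnitude_stft but take the phase of phase_stft."""
    magnitude_stft = np.asarray(magnitude_stft, dtype=complex)
    phase_stft = np.asarray(phase_stft, dtype=complex)
    if magnitude_stft.shape != phase_stft.shape:
        raise ValueError("STFT arrays must have the same shape")
    return np.abs(magnitude_stft) * np.exp(1j * np.angle(phase_stft))


# Toy one-frame, two-bin STFTs (made-up values, for illustration only).
source = np.array([[3 + 4j, 1 + 0j]])
mixture = np.array([[0 + 2j, -5 + 0j]])

# "mp" style: original source magnitude with mixture phase.
source_mp = swap_phase(source, mixture)
```

The “pp” files correspond to the opposite pairing, that is, `swap_phase(estimate, source)`, where `estimate` would be the STFT of a separated output.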

Examples:

Latino 1 Examples: Closely spaced sources: bass (left), drums (center left), guitar (center right), and keyboard (right)

(reference: E. Vincent, R. Gribonval, C. Fevotte, et al. BASS-dB: the Blind Audio Source Separation evaluation database. URL: http://www.irisa.fr/metiss/BASS-dB/)

Varying number of sources and spacing:

- Two source version, cross-training, full Bayesian system: Bayesian EMV and binary results (no plots).

- Three source version, cross-training, full Bayesian system: Bayesian EMV and binary results.

- Three source version, cross-training using four source system functions, full Bayesian system: Bayesian EMV and binary results.

- Four source version, cross-training, full Bayesian system (also below): Bayesian EMV and binary system results.

- Four source (but widely spaced as in “Groove” example below) version, cross-training, full Bayesian system: Bayesian EMV and binary system results.

Varying training method:

- Four source version, auto-training, full Bayesian system: Bayesian EMV and binary system results.

- Four source version, cross-training, full Bayesian system (also above): Bayesian EMV and binary system results.

- Four source version, bootstrapping training, full Bayesian system: Bayesian EMV results.

- Four source version, generic signal training, full Bayesian system: Bayesian EMV results.

Varying system implementation:

- Four source version, cross-training, panning-only Bayesian system: Bayesian EMV results.

- Four source version, cross-training, panning and phase offset Bayesian system: Bayesian EMV results.

- Four source version, cross-training, Bayesian system with only two-source combinations permitted: Bayesian EMV results.

- Four source version, cross-training, full Bayesian system (also above): Bayesian EMV and binary system results.

Groove Example: Widely spaced sources: bass (far left), distortion guitar (center left), clean guitar (center right), and drums (far right)

- Four source version, cross-training, full Bayesian system: Bayesian EMV and binary system results.

Beatles Example: Yellow Submarine (excerpt): Guitar and most drums (left); bass and kick drum (center right); “wave” sound effect (center right); vocals (far right)

- Four source version, cross-training, full Bayesian system: Bayesian EMV and binary system results.

Beatles Karaoke Example: Yellow Submarine (excerpt): Various files

- Basic Karaoke: the mono signal for “standard” weighted subtraction Karaoke

- Binary Karaoke: uses a binary system to eliminate the vocals but retain stereo data

- LarocheLikeKaraoke: uses the input phase, but discards original panning data.

(Similar to process described in: Laroche, “Process for Removing Voice from Stereo Recordings,” US Patent 6405163, 6/2002.)

- ProposedKaraoke: uses the input panning but weighted subtraction phase.

- x1x2: the input mixture (as always)
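The “standard” weighted subtraction behind the Basic Karaoke file can be sketched as follows. This is a generic illustration rather than the dissertation's code: the function name and the weight `w` are assumptions. The idea is that a source present in both channels with gains g1 and g2 cancels when w = g1 / g2, at the cost of producing a mono signal.

```python
import numpy as np


def weighted_subtraction(x1, x2, w=1.0):
    """Return the mono signal x1 - w * x2 (w is an illustrative weight)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return x1 - w * x2


# Toy mixture with made-up samples: the vocal appears in both channels with
# different gains, and the backing appears only in the left channel.
vocal = np.array([1.0, -2.0, 0.5])
backing = np.array([0.3, 0.1, -0.4])
g1, g2 = 0.2, 0.8
x1 = g1 * vocal + backing
x2 = g2 * vocal

# Choosing w = g1 / g2 cancels the vocal and leaves the backing.
mono = weighted_subtraction(x1, x2, w=g1 / g2)
```

Because the output is a single channel, the original panning is lost, which is the limitation the stereo-preserving variants listed above are meant to address.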