Next: Bayesian Framework Up: BAYESIAN TWO SOURCE MODELING Previous: Introduction

DUET and DASSS Review

We first review the DUET system [4,5,1] of Scott Rickard and other authors. The DUET system separates $N$ sound sources from two channels, where $N$ is in general greater than two. It assumes the following linear mixing model in the STFT domain for sources $S_1, \ldots, S_N$ in left channel $X_1$ and right channel $X_2$:

$\displaystyle X_1 = S_1 + S_2 + \cdots + S_N$ (1)

$\displaystyle X_2 = a_1 e^{-j\omega\delta_1}S_1 + a_2 e^{-j\omega\delta_2}S_2 + \cdots + a_N e^{-j\omega\delta_N}S_N$ (2)

where $a_i$ represents the scale parameter and $\delta_i$ the delay parameter, each from the left to the right channel, for source $i$. We refer to $a_i$ and $\delta_i$ together as the mixing parameters for source $i$. By assuming that only one source is active at any given point in time-frequency space (a near-realistic assumption for independent speech sources), we may estimate the mixing parameters at a particular time-frequency point $(\omega_k,\tau)$ via:

$\displaystyle (a_i,\delta_i) = \left(\frac{\vert X_2(\omega_k,\tau)\vert}{\vert X_1(\omega_k,\tau)\vert},\ \mathrm{Im}\left\{\log\left(\frac{X_1(\omega_k,\tau)}{X_2(\omega_k,\tau)}\right)\right\}/\omega_k\right).$ (3)
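As an illustration of equation (3), the following minimal sketch computes the per-point estimates with NumPy and checks them on a synthetic single-source mixture. The function name `duet_estimates`, the parameter values, and the frequency grid are assumptions made for this example, not part of the DUET system as published. The delay is recovered unambiguously only while $\vert\omega\delta\vert < \pi$, so the example keeps the frequencies small.

```python
import numpy as np

def duet_estimates(X1, X2, omega):
    """Per-point (a, delta) estimates in the form of equation (3).

    X1, X2: complex STFT values of the left and right channels.
    omega:  nonzero angular frequency of each point, same shape as X1.
    """
    a_hat = np.abs(X2) / np.abs(X1)
    # Im{log(X1/X2)} is the phase of X1/X2; np.angle returns it in (-pi, pi].
    delta_hat = np.angle(X1 / X2) / omega
    return a_hat, delta_hat

# Synthetic check with one active source (hypothetical values).
rng = np.random.default_rng(0)
omega = np.linspace(0.1, 1.0, 8)          # radians per sample, kept small
S = rng.standard_normal(8) + 1j * rng.standard_normal(8)
a_true, delta_true = 0.7, 2.0             # assumed scale and delay (samples)
X1 = S                                    # equation (1) with one source
X2 = a_true * np.exp(-1j * omega * delta_true) * S   # equation (2)

a_hat, delta_hat = duet_estimates(X1, X2, omega)
```

With these inputs, `a_hat` and `delta_hat` should match `a_true` and `delta_true` at every point up to floating-point error. When two sources overlap at a point, the estimates no longer correspond to either source's parameters.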

After collecting many such estimates, the DUET system prepares a two-dimensional histogram whose peaks in $(a_i,\delta_i)$ space should reveal the mixing parameters for each of the $N$ sources. To demix the sources, DUET revisits the per-point parameter estimates once the source mixing parameters have been read from the histogram. It then assigns each point in time-frequency space to the source whose mixing parameters are closest to those estimated for that point. A variety of matching schemes may be used for this. We have presented delay and scale subtraction scoring (DASSS) [2], which is similar to a method presented recently by the original DUET authors in [1]. In DASSS, we define a set of functions $Y_i$, $i\in \{1...N\}$, such that:

$\displaystyle Y_i \equiv X_1 - \frac{1}{a_i}e^{+j\omega\delta_i} X_2$ (4)

and the mixing parameters are always treated as known quantities. If in fact exactly one source, $g$, is active at a given frequency bin in a given frame, it may be shown that our model predicts:

$\displaystyle \hat{Y}_{i=g} = 0$ (5)

$\displaystyle \hat{Y}_{i\neq g} = \alpha_{i,g} S_g$ (6)

$\displaystyle \phantom{\hat{Y}_{i\neq g}} = \alpha_{i,g} X_1,$ (7)

where

$\displaystyle \alpha_{u,v} \equiv \left(1-\frac{a_v}{a_u}e^{j\omega(\delta_u - \delta_v)}\right).$ (8)
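The single-source predictions can be checked numerically. The sketch below evaluates the DASSS functions of equation (4) for a point where only source $g$ is active and compares them with $\alpha_{i,g} X_1$, using $\alpha$ as defined in equation (8), which is zero for $i = g$. The helper names `dasss` and `alpha`, the parameter values, and the use of the smallest $\vert Y_i\vert$ as the assignment score are assumptions for this example.

```python
import numpy as np

def dasss(X1, X2, a, delta, omega):
    """DASSS functions Y_i of equation (4) at one time-frequency point.

    a, delta: length-N arrays of the (known) mixing parameters.
    Returns a complex array of length N, one value per candidate source i.
    """
    return X1 - (1.0 / a) * np.exp(1j * omega * delta) * X2

def alpha(u, v, a, delta, omega):
    """alpha_{u,v} as defined in equation (8)."""
    return 1.0 - (a[v] / a[u]) * np.exp(1j * omega * (delta[u] - delta[v]))

# Hypothetical mixing parameters for N = 3 sources.
a = np.array([0.5, 1.0, 1.6])
delta = np.array([-1.0, 0.0, 2.0])
omega = 0.8
g = 1                                     # index of the single active source
S_g = 0.3 - 1.2j
X1 = S_g                                  # equation (1), one source active
X2 = a[g] * np.exp(-1j * omega * delta[g]) * S_g   # equation (2)

Y = dasss(X1, X2, a, delta, omega)
predicted = np.array([alpha(i, g, a, delta, omega) * X1 for i in range(3)])
# Assumed scoring rule: pick the source whose DASSS magnitude is smallest.
assigned = int(np.argmin(np.abs(Y)))
```

Here `Y` should agree with `predicted`, `Y[g]` should vanish up to rounding, and `assigned` should equal `g`.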

We now observe that we may similarly predict the DASSS function values $Y_i$ when two sources $u$ and $v$ are active:

$\displaystyle \hat{Y}_{i=u} = \alpha_{u,v} S_v$ (9)

$\displaystyle \hat{Y}_{i=v} = \alpha_{v,u} S_u$ (10)

$\displaystyle \hat{Y}_{i \neq (u\vert v)} = \alpha_{i,u} S_u + \alpha_{i,v} S_v$ (11)
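The two-source predictions follow from substituting the mixing model (1)-(2), with only $S_u$ and $S_v$ nonzero, into equation (4). A short numerical sketch of that substitution is given below; all parameter values and variable names are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical mixing parameters for N = 4 sources.
a = np.array([0.5, 0.8, 1.25, 2.0])
delta = np.array([-2.0, -0.5, 1.0, 3.0])
omega = 0.6
u, v = 0, 2                               # indices of the two active sources
S_u, S_v = 1.0 + 0.5j, -0.4 + 0.9j

def alpha(p, q):
    """alpha_{p,q} as defined in equation (8)."""
    return 1.0 - (a[q] / a[p]) * np.exp(1j * omega * (delta[p] - delta[q]))

# Mixing model (1)-(2) with only S_u and S_v nonzero.
X1 = S_u + S_v
X2 = (a[u] * np.exp(-1j * omega * delta[u]) * S_u
      + a[v] * np.exp(-1j * omega * delta[v]) * S_v)

# DASSS functions of equation (4), one per candidate source i.
Y = X1 - (1.0 / a) * np.exp(1j * omega * delta) * X2

# The general form (11) also reproduces (9) and (10), because alpha_{i,i} = 0.
Y_pred = np.array([alpha(i, u) * S_u + alpha(i, v) * S_v
                   for i in range(len(a))])
```

In this sketch `Y` should match `Y_pred` for every `i`, with `Y[u]` reducing to `alpha(u, v) * S_v` and `Y[v]` to `alpha(v, u) * S_u`.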

We now make an important observation. If we know how $S_u$ and $S_v$ are distributed, we then know how $\hat{Y}_{i=u}$, $\hat{Y}_{i=v}$, and $\hat{Y}_{i \neq (u\vert v)}$ are distributed. (In general, we will see that distributions on $\vert S_u\vert$ and $\vert S_v\vert$ may be practically estimated from knowledge about a musical or speech source, such as its range and loudness. Distributions on $\angle S_u$ and $\angle S_v$ are not informative, and thus we will use the set $\vert Y_i\vert$, $i\in \{1...N\}$, rather than the sets $Y_i$ or $\angle Y_i$, as our DASSS data.) Below, we will exploit our knowledge of the DASSS data in a Bayesian context to determine whether two sources are most likely active and, if so, which two. Much as we know how the DASSS functions $Y_i$ behave in the two-source case, it may also be shown [6] that we can predict the values of the DUET data given by equation (3) in the same case. It is not practical, however, to exploit this data, as logistics and computation quickly become prohibitive [6]. We therefore focus our efforts on DASSS data below.


Aaron S. Master 2003-10-30