Next: Bayesian Framework Up: BAYESIAN TWO SOURCE MODELING Previous: Introduction

DUET and DASSS Review

We first review the DUET system [4,5,1] of Scott Rickard and other authors. The DUET system performs sound source separation of $N$ sources from two channels, where $N$ is in general greater than two. The DUET system assumes the following STFT domain linear mixing model for sources $S_i$ in left channel $X_1$ and right channel $X_2$:
$\displaystyle X_1 = S_1 + S_2 + \cdots + S_N$ (1)
$\displaystyle X_2 = a_1 e^{-j\omega\delta_1}S_1 + a_2 e^{-j\omega\delta_2}S_2 + \cdots + a_N e^{-j\omega\delta_N}S_N$ (2)

where $a_i$ is the scale parameter and $\delta_i$ is the delay parameter for source $i$, each describing the path from the left channel to the right channel. We refer to $a_i$ and $\delta_i$ together as the mixing parameters of source $i$. By assuming that only one source is active at each point in time-frequency space (a near-realistic assumption for independent speech sources), we may estimate the mixing parameters at a particular time-frequency point $(\omega_k,\tau)$ via:
$\displaystyle (a_i,\delta_i)= \left(\frac{\vert X_2(\omega_k,\tau)\vert}{\vert X_1(\omega_k,\tau)\vert},\ \Im\left\{\log\left(\frac{X_1(\omega_k,\tau)}{X_2(\omega_k,\tau)}\right)\right\}/\omega_k\right).$ (3)
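
As a minimal sketch of equation (3), the following NumPy snippet estimates the mixing parameters bin by bin and checks them on a synthetic single-source mixture. The function name `duet_estimate` and all numeric values are illustrative assumptions, not part of the DUET system itself. The imaginary part of the complex logarithm is computed as a wrapped phase, so the delay estimate is only unambiguous while $\vert\omega_k\delta_i\vert < \pi$.

```python
import numpy as np

def duet_estimate(X1, X2, omega):
    """Per-bin estimates (a, delta) following equation (3).

    X1, X2 : complex STFT values of the left and right channels
    omega  : angular frequency of each bin (nonzero), in radians per sample
    """
    X1 = np.asarray(X1, dtype=complex)
    X2 = np.asarray(X2, dtype=complex)
    a = np.abs(X2) / np.abs(X1)
    # Im{log(X1/X2)} equals the phase of X1/X2, wrapped to (-pi, pi].
    delta = np.angle(X1 / X2) / omega
    return a, delta

# Synthetic single-source mixture built from equations (1) and (2).
# All values below are hypothetical and chosen only for illustration.
omega = np.array([0.2, 0.5, 1.0])
S = np.array([1.0 + 2.0j, -0.5 + 0.3j, 0.7 - 1.1j])
a_true, delta_true = 0.8, 1.5
X1 = S
X2 = a_true * np.exp(-1j * omega * delta_true) * S

a_hat, delta_hat = duet_estimate(X1, X2, omega)
print(a_hat, delta_hat)
```

When exactly one source is active, the estimates recover the true scale and delay in every bin. With several sources active at the same point, the estimates fall between the true parameter pairs.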

After collecting many such estimates, the DUET system builds a two-dimensional histogram in $(a_i,\delta_i)$ space, whose peaks should reveal the mixing parameters of each of the $N$ sources. To demix the sources, DUET makes a second pass over the per-point parameter estimates once the source mixing parameters have been read from the histogram. It assigns each point in time-frequency space to the source whose mixing parameters are closest to those estimated for that point. A variety of matching schemes may be used for this assignment. We have presented delay and scale subtraction scoring (DASSS) [2], which is similar to a method presented recently by the original DUET authors in [1]. In DASSS, we define a set of functions $Y_i$ such that:
$\displaystyle Y_i \equiv X_1 - \frac{1}{a_i}e^{+j\omega\delta_i} X_2$ (4)

and the mixing parameters are always treated as known quantities. If in fact exactly one source, $S_g$, is active at a given frequency bin in a given frame, it may be shown that our model predicts:
$\displaystyle \hat{Y}_{i=g} = 0$ (5)
$\displaystyle \hat{Y}_{i\neq g} = \alpha_{i,g} S_g$ (6)
$\displaystyle \phantom{\hat{Y}_{i\neq g}} = \alpha_{i,g} X_1.$ (7)

where
$\displaystyle \alpha_{u,v} \equiv \left(1-\frac{a_v}{a_u}e^{j\omega(\delta_u - \delta_v)}\right).$ (8)
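
The single-source prediction can be checked numerically. The sketch below evaluates equation (4) for every candidate source and compares the result with $\alpha_{i,g} X_1$, taking the first index of $\alpha$ as the candidate $i$ and the second as the active source $g$, as follows from substituting equations (1) and (2) into (4) and using definition (8). The helper names `dasss` and `alpha`, the parameter values, and the final minimum-magnitude assignment rule are assumptions made for illustration. The assignment rule is only one plausible matching scheme suggested by equation (5).

```python
import numpy as np

def dasss(X1, X2, a, delta, omega):
    """Y_i from equation (4) for every candidate source i at one bin."""
    a = np.asarray(a, dtype=float)
    delta = np.asarray(delta, dtype=float)
    return X1 - np.exp(1j * omega * delta) / a * X2

def alpha(u, v, a, delta, omega):
    """alpha_{u,v} from equation (8)."""
    return 1.0 - (a[v] / a[u]) * np.exp(1j * omega * (delta[u] - delta[v]))

# Hypothetical known mixing parameters for N = 3 sources.
a = np.array([0.8, 1.0, 1.3])
delta = np.array([1.5, 0.0, -2.0])
omega = 0.4

# Exactly one active source, g, at this time-frequency point.
g = 1
S_g = 0.6 - 0.9j
X1 = S_g
X2 = a[g] * np.exp(-1j * omega * delta[g]) * S_g

Y = dasss(X1, X2, a, delta, omega)
predicted = np.array([alpha(i, g, a, delta, omega) * X1 for i in range(len(a))])

# Assumed matching rule: pick the source whose Y_i has the smallest magnitude.
best = int(np.argmin(np.abs(Y)))
print(np.abs(Y), best)
```

Because $\alpha_{g,g} = 0$, the same expression covers both the $i = g$ and $i \neq g$ cases.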

We now observe that we may similarly predict the DASSS function values $Y_i$ when two sources $S_u$ and $S_v$ are active:
$\displaystyle \hat{Y}_{i=u} = \alpha_{u,v} S_v$ (9)
$\displaystyle \hat{Y}_{i=v} = \alpha_{v,u} S_u$ (10)
$\displaystyle \hat{Y}_{i \neq (u\vert v)} = \alpha_{i,u} S_u + \alpha_{i,v} S_v$ (11)
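
The two-source predictions can be verified in the same way. The sketch below, with hypothetical parameter values and a hypothetical helper `alpha`, builds a mixture of two active sources from equations (1) and (2), evaluates equation (4) for each candidate source, and compares the result with equations (9) through (11). Since $\alpha_{u,u} = 0$, the general form in equation (11) reduces to (9) and (10) when $i = u$ or $i = v$, so a single expression is used for all $i$.

```python
import numpy as np

def alpha(u, v, a, delta, omega):
    """alpha_{u,v} from equation (8)."""
    return 1.0 - (a[v] / a[u]) * np.exp(1j * omega * (delta[u] - delta[v]))

# Hypothetical known mixing parameters for N = 4 sources.
a = np.array([0.8, 1.0, 1.3, 0.6])
delta = np.array([1.5, 0.0, -2.0, 3.0])
omega = 0.4

# Two active sources, u and v, at this time-frequency point.
u, v = 0, 2
S_u, S_v = 0.6 - 0.9j, -1.1 + 0.2j
X1 = S_u + S_v
X2 = (a[u] * np.exp(-1j * omega * delta[u]) * S_u
      + a[v] * np.exp(-1j * omega * delta[v]) * S_v)

# DASSS functions from equation (4), one per candidate source.
Y = X1 - np.exp(1j * omega * delta) / a * X2

# Prediction from equation (11), which also covers (9) and (10).
Y_hat = np.array([alpha(i, u, a, delta, omega) * S_u
                  + alpha(i, v, a, delta, omega) * S_v
                  for i in range(len(a))])

# Magnitudes of the DASSS functions at this point.
dasss_magnitudes = np.abs(Y)
print(dasss_magnitudes)
```

Unlike the single-source case, no $Y_i$ vanishes here: each DASSS value depends on the source amplitudes, which is why distributions on the sources translate into distributions on the DASSS values.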

We now make an important observation. If we know how $S_v$ and $S_u$ are distributed, we then know how $\hat{Y}_{i=u}$, $\hat{Y}_{i=v}$, and $\hat{Y}_{i \neq (u\vert v)}$ are distributed. (In general, we will see that distributions on $\vert S_u\vert$ and $\vert S_v\vert$ may be practically estimated from knowledge about a musical or speech source, such as its range and loudness. Distributions on $\angle S_u$ and $\angle S_v$ are not informative, and thus we will use the set $\vert Y_i\vert$, $i\in \{1,\ldots,N\}$, rather than the sets $Y_i$ or $\angle Y_i$, as our DASSS data.) Below, we will exploit our knowledge of the DASSS data in a Bayesian context to determine whether two sources are active and, if so, which two are most likely. Just as we know how the DASSS functions $Y_i$ behave in the two-source case, it may also be shown [6] that we can predict the values of the DUET data given by equation 3 in the same case. It is not practical, however, to exploit this data, as logistics and computation quickly become prohibitive [6]. We therefore focus our efforts on DASSS data below.
Aaron S. Master 2003-10-30