Masking



Rather than assuming that the sources are disjoint in time alone, adding another dimension to the signal representation makes the assumption of disjoint sources more plausible. In [17], the signal is represented by the windowed STFT, which spreads signal energy across the time-frequency plane. Speech signals have been shown to be highly disjoint in this representation, with 95% of the energy in each STFT bin coming from a single source and the rest from interfering sources. However, determining which source each coefficient belongs to requires more information. In that work, two microphones provide the necessary spatial information. The relative delay and attenuation between the microphones are estimated for each STFT coefficient using Fourier theory, and a clustering technique is then used to determine the true delay and attenuation of each source relative to one of the microphones. A maximum likelihood approach to these estimates also exists [18], but the principle is the same. Each STFT coefficient is then assigned to the source whose estimated delay-attenuation pair is closest (using nearest neighbor, for example).
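The per-coefficient estimation and nearest-neighbor assignment described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of [17] or [18]: the function names are hypothetical, the sign convention for the delay is a choice made here, the delay-attenuation centers are taken as already known (in practice they would come from the clustering step), and the test data are synthetic STFT coefficients that are perfectly disjoint.

```python
import numpy as np

def delay_attenuation(X1, X2, omega, eps=1e-12):
    """Per-bin relative attenuation and delay of channel 2 w.r.t. channel 1.

    X1, X2: complex STFTs of shape (n_freq, n_frames).
    omega:  angular frequency of each row in rad/sample (nonzero).
    Assumes |omega * delay| < pi so the phase does not wrap.
    """
    ratio = X2 / (X1 + eps)                      # eps guards empty bins
    atten = np.abs(ratio)
    delay = -np.angle(ratio) / omega[:, None]    # phase slope gives the delay
    return atten, delay

def nearest_neighbor_masks(atten, delay, centers):
    """One binary mask per source: each bin goes to the closest center.

    centers: array of shape (n_sources, 2) holding (attenuation, delay).
    """
    feats = np.stack([atten, delay], axis=-1)                  # (F, T, 2)
    diff = feats[:, :, None, :] - centers[None, None, :, :]    # (F, T, K, 2)
    labels = np.argmin(np.linalg.norm(diff, axis=-1), axis=-1)
    return [labels == k for k in range(len(centers))]

# Synthetic example: two sources that never share a time-frequency bin.
rng = np.random.default_rng(0)
F, T = 64, 20
omega = np.pi * np.arange(1, F + 1) / (F + 1)        # strictly inside (0, pi)
S = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
owner = rng.integers(0, 2, size=(F, T))              # source occupying each bin
S1 = np.where(owner == 0, S, 0)
S2 = np.where(owner == 1, S, 0)

# Assumed (attenuation, delay in samples) of each source at microphone 2.
centers = np.array([[0.8, 0.5], [1.2, -0.5]])
X1 = S1 + S2
X2 = sum(a * np.exp(-1j * omega[:, None] * d) * Sk
         for (a, d), Sk in zip(centers, (S1, S2)))

atten, delay = delay_attenuation(X1, X2, omega)
masks = nearest_neighbor_masks(atten, delay, centers)
est1 = masks[0] * X1                                 # masked estimate of source 1
```

With real recordings the bins are only approximately disjoint, so the per-bin estimates scatter around the true centers instead of landing on them, and the delay estimate is only unambiguous while the phase difference stays within one cycle.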

An algorithm for separating an arbitrary number of sources from a stereo mixture was proposed in [17] [18]. The algorithm assumes that each coefficient of the STFT representation contains an energy contribution from only one source. The assumption was shown to hold on average, with more than 90% of the energy in each coefficient coming from a single source when the mixture is not too dense. The separated speech was intelligible but of rather poor quality, with distortion. Then again, only two microphones were used.
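One way to check the single-source assumption on known sources is to measure how much of the energy in each coefficient comes from the strongest source. The sketch below is an assumption made for illustration, not the measure used in [17] or [18]; the function name is hypothetical, and the inputs are the STFTs of the individual sources before mixing.

```python
import numpy as np

def dominant_energy_fraction(stfts, eps=1e-12):
    """Share of energy contributed by the strongest source.

    stfts: sequence of complex STFTs, one per source, all of shape
           (n_freq, n_frames).
    Returns (per_bin, overall): the fraction in each bin, and the
    energy-weighted fraction over the whole time-frequency plane.
    """
    power = np.abs(np.asarray(stfts)) ** 2       # (K, F, T)
    total = power.sum(axis=0)                    # energy of all sources per bin
    strongest = power.max(axis=0)                # energy of the dominant source
    per_bin = strongest / (total + eps)
    overall = strongest.sum() / (total.sum() + eps)
    return per_bin, overall

# Two toy sources that never overlap, then two identical sources.
A = np.array([[1.0, 0.0], [0.0, 2.0]], dtype=complex)
B = np.array([[0.0, 3.0], [0.0, 0.0]], dtype=complex)
_, disjoint_score = dominant_energy_fraction([A, B])
_, overlap_score = dominant_energy_fraction([A, A])
```

Perfectly disjoint sources score close to 1, while two identical sources score 0.5, so a value above 0.9 would correspond to the "more than 90%" figure quoted above.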

The statistical learning approaches have done well at exploiting spatial differences among the sources. Without such differences, as in the case of a single-microphone mixture, separation has been far less successful. Nevertheless, some one-channel source separation methods exist. Their low-quality output is intelligible enough for speech understanding but is not sufficient in itself for musical purposes; it can, however, serve as a fairly useful analysis tool toward a better reconstruction, and hence a better separation, of the sources.


Pamornpol Jinachitra
Tue Jun 17 16:27:28 PDT 2003