Rather than requiring the sources to be disjoint in time, adding another dimension to the signal representation makes
the notion of disjointness among sources more valid. In [17], the signal is represented by the windowed STFT, which disperses
signal energy across the time-frequency plane. Speech signals have been shown to be highly disjoint in this representation, with each
STFT bin containing 95% of its energy from one source only, while the rest comes from the interferences. However, in order to determine
which source each coefficient belongs to, more information is needed. In this work, two microphones are used to provide the spatial
information. The relative delay and attenuation between the microphones are estimated for each STFT coefficient using Fourier theory,
and a clustering technique is then used to determine the true delay and attenuation of each source relative to one of the microphones.
A maximum likelihood approach to these estimations also exists [18], but the principle is the same. Each STFT coefficient is then
assigned to the source corresponding to its estimated delay-attenuation pair by means of clustering (using nearest neighbor, for example).
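The per-coefficient estimation and nearest-neighbor assignment described above can be sketched as follows. This is only a
minimal illustration, not the implementation of [17] or [18]. It assumes the single-source model X2 = a exp(-j omega delta) X1
for each bin, with omega in radians per sample and |omega delta| < pi so that the phase does not wrap. The function names, the
plain Euclidean distance in the (attenuation, delay) plane, and the fact that the cluster centers are supplied by the caller
rather than learned are all assumptions made for the example.

```python
import cmath


def estimate_delay_attenuation(x1, x2, omega):
    """Estimate (attenuation, delay) from one pair of STFT coefficients.

    Assumes x2 = a * exp(-1j * omega * delay) * x1, i.e. a single source
    dominates this bin, and |omega * delay| < pi (no phase wrapping).
    """
    ratio = x2 / x1
    attenuation = abs(ratio)
    delay = -cmath.phase(ratio) / omega
    return attenuation, delay


def nearest_source(attenuation, delay, centers):
    """Index of the (attenuation, delay) center closest to the estimate.

    Uses an unweighted Euclidean distance, which is an assumption here.
    """
    def sq_dist(center):
        return (attenuation - center[0]) ** 2 + (delay - center[1]) ** 2

    return min(range(len(centers)), key=lambda k: sq_dist(centers[k]))


def label_bins(X1, X2, omegas, centers):
    """Assign every time-frequency bin to a source index.

    X1[f][t] and X2[f][t] are the complex STFT coefficients of the two
    microphones; omegas[f] is the angular frequency of row f.
    Bins where the delay cannot be observed are labeled None.
    """
    labels = []
    for omega, row1, row2 in zip(omegas, X1, X2):
        row_labels = []
        for x1, x2 in zip(row1, row2):
            if omega == 0 or x1 == 0:
                row_labels.append(None)
            else:
                a, d = estimate_delay_attenuation(x1, x2, omega)
                row_labels.append(nearest_source(a, d, centers))
        labels.append(row_labels)
    return labels
```

Each source can then be reconstructed by keeping only the coefficients that carry its label and inverting the STFT. In practice
the centers would come from clustering the per-bin estimates, and the two axes would likely need rescaling before distances
between them are compared.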
An algorithm for separating an arbitrary number of sources from a stereo mixture was proposed in [17]
[18]. The algorithm is based on the assumption that only one source contributes energy to each
coefficient of the STFT representation. The assumption was shown to hold on average, with more than 90% of the energy in each
coefficient coming from only one source for a mixture that is not too dense. The result on speech was intelligible but of rather poor
quality, with audible distortion. Then again, only two microphones were used.
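One way to check the single-source assumption on known sources is to measure, for each bin, the fraction of energy contributed
by the strongest source and then average over bins. The sketch below is a hypothetical measure written for illustration. It is
not necessarily the one used in [17] or [18], and it ignores the cross terms that appear when the sources are actually summed
in a mixture.

```python
def dominant_energy_fraction(source_coeffs):
    """Fraction of a bin's energy contributed by its strongest source.

    source_coeffs holds one complex STFT coefficient per source for the
    same time-frequency bin. Returns None for a bin with no energy.
    """
    energies = [abs(c) ** 2 for c in source_coeffs]
    total = sum(energies)
    if total == 0:
        return None
    return max(energies) / total


def mean_dominance(source_stfts):
    """Average dominant-energy fraction over all bins with nonzero energy.

    source_stfts[k][i] is the i-th (flattened) STFT coefficient of
    source k. Returns None if every bin is empty.
    """
    fractions = []
    for bin_coeffs in zip(*source_stfts):
        fraction = dominant_energy_fraction(bin_coeffs)
        if fraction is not None:
            fractions.append(fraction)
    if not fractions:
        return None
    return sum(fractions) / len(fractions)
```

A value close to 1 means that almost every bin is dominated by a single source, which is the regime in which assigning whole
coefficients to one source loses little energy.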
The statistical learning approaches have done well in exploiting spatial
differences among the sources. Without spatial information, as in the case
of a single-microphone mixture, separation has been far less successful.
Nevertheless, there exist some one-channel source separation methods whose
output, although of low quality, serves as a fairly useful analysis tool
for a better reconstruction and hence a better separation of the sources.
Such low-quality separation is not in itself sufficient for musical
purposes, even though it is intelligible enough for speech understanding.
Pamornpol Jinachitra
Tue Jun 17 16:27:28 PDT 2003