
Why separate?

In speech applications, source separation is motivated by speech recognition in realistic environments, where interfering sounds are likely to be more than just noise-like. In the realm of machine intelligence, auditory scene analysis and music understanding are applications that will benefit from source separation. Though total separation is clearly not necessary for these applications  [38], it is sometimes desired for applications such as universal karaoke making and speech input to a recognition system. In particular, parametric modeling then becomes possible, or may even be a by-product of the separation, allowing flexible time-frequency modification of the signal. Estimation of control parameters for physical modeling synthesis also becomes simpler. However, it needs to be pointed out that the problem is somewhat circular. If we can directly estimate parameters from the music recording, a separation can be obtained by resynthesis or re-rendering. On the other hand, if we can separate the sound components cleanly from the beginning, parameter estimation will be simpler. Both approaches are, unfortunately, about equally difficult.

Separation of the singing voice is a particularly interesting problem given its application in karaoke making and as pre-processing for speech recognition and lyric transcription. Indexing and retrieval is another potential application, since it has been found experimentally that songs from a particular band are better recognized by the singer's voice than by the background music, which varies highly. Similarly, a tune is often more recognizable from the vocal notes. Singing voice separation can then act as a pre-processing step for music database indexing, for example for retrieval by humming. The obvious perceptual difference between the singing voice and other musical instruments is also an attractive attribute, which should give us some clues on how to detect and separate it. However, these differences, so apparent to human perception, are not mathematically obvious. Humans do well at segregating the human voice from other musical instruments because of some of its unique features that are hard to describe and probably also, to a minor extent, because it forms words we recognize.



Pamornpol Jinachitra
Tue Jun 17 16:27:28 PDT 2003