next up previous
Next: Non-stationary model Up: Computational Auditory Scene Analysis Previous: General Scene Analysis

Phase incoherence

A mixture of vowels uttered simultaneously is heard with no discernible identity. However, McAdams (McAdams 1984) demonstrated that when one of the vowels in the mixture is slightly frequency modulated, it jumps out perceptually. It has been argued that a non-periodic variation in this modulation is what matters for separation. Such a variation is called jitter, as opposed to sinusoidal frequency modulation, which does not give as natural a sound to a synthetic singing voice (Perry Cook 1994), and it is speculated in this work that jitter is what the brain uses to segregate one vowel sound from a mixture. Using formant bursts to detect the start of a vowel sound would be simple if all bursts occurred simultaneously across all frequencies. In reality, dispersion by the vocal tract, the room, and the ear response makes the time alignment unreliable, and the presence of interfering sources rules out a simple thresholding solution. The idea was pursued by Dan Ellis in  [9].
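The distinction between periodic vibrato and aperiodic jitter can be made concrete with a small synthesis sketch. The following is a minimal illustration, not McAdams's actual stimuli: all parameters (sample rate, fundamental, modulation depth, the 5 Hz vibrato rate, and the random-walk jitter model) are illustrative assumptions. Both tones deviate the fundamental by the same amount; only the modulator differs.

```python
import math
import random

SR = 16_000      # sample rate in Hz (illustrative choice)
F0 = 150.0       # vowel fundamental in Hz (illustrative choice)
DEPTH = 0.02     # +/- 2% frequency deviation

def fm_tone(n_samples, modulator):
    """Synthesize a frequency-modulated sinusoid by integrating the
    instantaneous frequency F0 * (1 + DEPTH * modulator(t)) to a phase."""
    phase, out = 0.0, []
    for n in range(n_samples):
        inst_f = F0 * (1.0 + DEPTH * modulator(n / SR))
        phase += 2.0 * math.pi * inst_f / SR
        out.append(math.sin(phase))
    return out

# Periodic (sinusoidal) vibrato at 5 Hz: regular, mechanical-sounding.
vibrato = lambda t: math.sin(2.0 * math.pi * 5.0 * t)

# Jitter modeled here as a slowly drifting, clipped random walk: the
# aperiodic variation that, per the text, sounds more natural.
random.seed(0)
_state = {"x": 0.0}
def jitter(t):
    _state["x"] = 0.999 * _state["x"] + 0.05 * random.gauss(0.0, 1.0)
    return max(-1.0, min(1.0, _state["x"]))

vib = fm_tone(SR // 4, vibrato)   # quarter second of vibrato tone
jit = fm_tone(SR // 4, jitter)    # quarter second of jittered tone
```

Played side by side, the vibrato tone repeats its pitch excursion exactly while the jittered tone wanders without ever repeating, which is the non-periodic character the separation argument relies on.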

A similar idea, jitter viewed as phase incoherence among the sources, is used in the work of Gert Cauwenberghs. In  [10], phase incoherence between independent sources is exploited for separation. Each signal is modeled as consisting of periodic wavelets, each with time-varying amplitude fluctuation and time jitter. While the amplitude variation affects the spectrum across all frequencies, the time jitter results in a non-uniform complex modulation in the frequency domain. By taking the autocorrelation of signal segments, the best-fit jitter and amplitude fluctuation are found, and the estimates of the original sources are updated smoothly. The wavelet coefficients can also be found. The algorithm works under the assumption that the source is sufficiently quasi-periodic (allowing for jitter and amplitude fluctuation), i.e. sufficiently coherent over a long period of time for the fluctuations to be observable. The results show good separation of simple waveforms such as a two-frequency chirp and an AM quasi-periodic signal. It has not been reported how well the method does with real mixtures.
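The core estimation step, fitting a per-period time jitter and amplitude fluctuation by correlation against a reference period, can be sketched as follows. This is a simplified toy version under assumed conditions (a known period length, a known reference waveform, circular shifts), not Cauwenberghs's actual algorithm; the names `best_lag`, `P`, `jitters`, and `amps` are all invented for the illustration.

```python
import math

def best_lag(segment, template, max_lag):
    """Return the lag (in samples) that best aligns `segment` with
    `template`; this lag is the estimate of the segment's time jitter."""
    best, best_score = 0, -float("inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlate over an interior window so i + lag stays in range.
        score = sum(template[i] * segment[i + lag]
                    for i in range(max_lag, len(template) - max_lag))
        if score > best_score:
            best, best_score = lag, score
    return best

# Build a quasi-periodic test signal: copies of one sine period, each
# circularly shifted by a known jitter and scaled by a known amplitude.
P = 100                                   # period length in samples
period = [math.sin(2 * math.pi * i / P) for i in range(P)]
jitters = [0, 3, -2, 1]                   # ground-truth time jitters
amps = [1.0, 0.8, 1.2, 0.9]               # ground-truth amplitudes

signal = []
for j, a in zip(jitters, amps):
    signal.extend(a * period[(i - j) % P] for i in range(P))

# Recover each period's jitter (by correlation) and amplitude (by the
# energy ratio against the reference period).
est = []
ref_energy = sum(x * x for x in period)
for k in range(len(jitters)):
    seg = signal[k * P:(k + 1) * P]
    lag = best_lag(seg, period, max_lag=5)
    amp = math.sqrt(sum(x * x for x in seg) / ref_energy)
    est.append((lag, round(amp, 2)))
```

On this toy signal the estimates `est` recover the planted jitters and amplitudes exactly; the real algorithm must additionally estimate the wavelet shape itself and cope with overlapping sources, which is where the quasi-periodicity assumption becomes essential.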



Pamornpol Jinachitra
Tue Jun 17 16:27:28 PDT 2003