In [27], the partial trajectories are used for frequency warping to remove the FM component, which then allows the AM component to be extracted by simple lowpass filtering. The extracted components are then used to reconstruct the signal of interest. The warping not only eliminates spectral smearing for better envelope extraction, but also prevents the cross-channel interference from other partials that unmatched demodulation would cause. However, the algorithm needs good initialization: at the very least, polyphonic pitch estimation is essential for the harmonic lock loop to do its job. How to group partials in the first place is not addressed in that work, but the results on vocal extraction were quite impressive. While this algorithm tracks the partials from sample to sample, other algorithms exist that iteratively estimate the parameters frame by frame using a non-stationary model of the partials, for example a quadratic chirp model fitted iteratively with the EM algorithm (Chazan 1993) [4].
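The idea of removing the FM component before lowpass filtering can be illustrated with a minimal sketch. It assumes the frequency trajectory of the partial is already known, and it realizes the warping by heterodyning the mixture with the integrated trajectory; this is one common way to implement matched demodulation and is not necessarily the exact procedure of [27]. The function names, the 30 Hz cutoff, the filter order, and the synthetic test signal are all hypothetical choices made for illustration.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def extract_partial(mix, freq_traj, fs, cutoff_hz=30.0):
    """Demodulate a real mixture along a known frequency trajectory (Hz),
    then lowpass the result to recover the partial's slowly varying envelope.
    Assumption: freq_traj is given; tracking itself is not shown here."""
    # Integrate the instantaneous frequency to obtain the partial's phase.
    phase = 2.0 * np.pi * np.cumsum(freq_traj) / fs
    # Heterodyne: the tracked partial moves to 0 Hz with its FM removed,
    # while other partials stay away from 0 Hz.
    baseband = mix * np.exp(-1j * phase)
    # Zero-phase Butterworth lowpass, applied to real and imaginary parts.
    sos = butter(4, cutoff_hz / (fs / 2.0), output="sos")
    envelope = sosfiltfilt(sos, baseband.real) + 1j * sosfiltfilt(sos, baseband.imag)
    # A real sinusoid a*cos(phase) contributes a/2 at baseband, hence the 2.
    amplitude = 2.0 * np.abs(envelope)
    # Re-modulate with the same phase to resynthesize the partial.
    partial = 2.0 * np.real(envelope * np.exp(1j * phase))
    return amplitude, partial


def make_demo_mixture(fs=8000, duration=1.0):
    """Hypothetical test signal: one partial with vibrato and tremolo,
    plus a steady interfering sinusoid."""
    t = np.arange(int(fs * duration)) / fs
    freq_traj = 440.0 + 20.0 * np.sin(2.0 * np.pi * 5.0 * t)       # FM (vibrato)
    amp_true = 0.5 * (1.0 + 0.5 * np.sin(2.0 * np.pi * 2.0 * t))   # AM (tremolo)
    phase = 2.0 * np.pi * np.cumsum(freq_traj) / fs
    target = amp_true * np.cos(phase)
    interferer = 0.5 * np.cos(2.0 * np.pi * 600.0 * t)
    return target + interferer, freq_traj, amp_true, target, fs


mix, freq_traj, amp_true, target, fs = make_demo_mixture()
amp_est, partial_est = extract_partial(mix, freq_traj, fs)

# Compare away from the signal edges, where the filter has start-up transients.
interior = slice(2000, -2000)
print("max envelope error:", np.max(np.abs(amp_est[interior] - amp_true[interior])))
print("max waveform error:", np.max(np.abs(partial_est[interior] - target[interior])))
```

Because the demodulating phase follows the partial exactly, the partial collapses to a narrow band around 0 Hz and a very low cutoff suffices. With a fixed-frequency demodulator, the vibrato would spread the partial over a wider band and force a wider filter, which admits more energy from neighbouring partials.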
The main weakness of CASA-based approaches is the difficulty of
determining the acoustic cues, even before grouping, when the mixture is
complex. Frequency tracking and parameter estimation become much harder
when the sources' spectra overlap heavily. In addition, mathematically
grouping partials into sources is computationally intensive and prone to
errors when the parameter estimates are not accurate in the first place.
The strength of these approaches lies in the flexibility of
time-frequency modification and reconstruction once parameter estimation
and grouping have been carried out correctly. Illusory reconstruction can also be
incorporated into the model as in . The principal
ideas of these approaches nevertheless overlap heavily with the auditory system modeling approach to be discussed next: the techniques adopted as part of the processing usually involve a model of the human auditory system, which gives them common ground with the artificial auditory neural network presented next.