In this presentation, I will describe earlier work of mine at the University of Tokyo, which was completed in the mid-1990s as part of my doctoral thesis research.
The first half of the presentation is on "Loudness/Pitch/Timbre(*) Decomposition Operators." In this research, we constructed a set of operators that serve as general-purpose audio signal processing tools in the time-frequency domain.
More concretely, we focus on the instantaneous change of an audio signal in the wavelet domain. The change is decomposed into three orthogonal components, and a method is given for projecting the change onto each of these components.
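As a rough illustration of what projecting a change onto orthogonal components means, here is a minimal Python sketch. It is not the operators from the thesis: the three basis vectors are arbitrary, mutually orthogonal stand-ins, the four-element `change` vector is made-up data, and the names `project` and `reconstruct` are hypothetical. In this toy setup, whatever lies outside the span of the three vectors is left as a residual.

```python
def dot(u, v):
    """Plain Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def project(change, basis):
    """Coefficient of `change` along each vector of an orthogonal basis."""
    return [dot(change, b) / dot(b, b) for b in basis]


def reconstruct(coeffs, basis):
    """Sum of the basis vectors weighted by their coefficients."""
    n = len(basis[0])
    return [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(n)]


# ASSUMPTION: three arbitrary, mutually orthogonal directions standing in
# for the three components; the real operators act in the wavelet domain.
basis = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, -1.0, -1.0],
    [1.0, -1.0, -1.0, 1.0],
]

# ASSUMPTION: a made-up instantaneous change over four coefficients.
change = [0.5, 0.1, -0.3, 0.2]

coeffs = project(change, basis)
residual = [c - r for c, r in zip(change, reconstruct(coeffs, basis))]
```

Because the basis vectors are mutually orthogonal, each coefficient can be computed independently of the others, and the residual is orthogonal to all three directions.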
The second half is on an application of the operators to the problem of computational auditory scene analysis(*). In a (monaural) multi-stream sound(*), frequency components that change together in amplitude and frequency are grouped together as one auditory stream. Since the above operators provide a method for quantifying the instantaneous changes in amplitude and frequency, we use them as the initial stage of the analysis. The estimated amplitude and frequency changes of the components are then used to construct a probabilistic space in which peaks correspond to streams. The probability distribution may be updated with new data to follow streams through time.
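The grouping idea can be sketched in a few lines of Python. This is only a toy under stated assumptions, not the method of the thesis: each frequency component is reduced to a pair (amplitude change, frequency change), the "probabilistic space" is approximated by a normalized two-dimensional histogram, and the update over time is a simple weighted blend. The function names, the bin width, the blend weight, and the data are all hypothetical.

```python
import math
from collections import Counter


def vote(components, width=0.01):
    """Normalized 2-D histogram over (amplitude change, frequency change)."""
    counts = Counter(
        (math.floor(a / width), math.floor(f / width)) for a, f in components
    )
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


def update(prior, new, weight=0.5):
    """ASSUMPTION: blend the old and new distributions; result stays normalized."""
    cells = set(prior) | set(new)
    return {
        c: (1 - weight) * prior.get(c, 0.0) + weight * new.get(c, 0.0)
        for c in cells
    }


def peaks(dist, k=2):
    """The k most probable cells, taken here as candidate streams."""
    return sorted(dist, key=dist.get, reverse=True)[:k]


# ASSUMPTION: made-up change estimates for five frequency components.
# Three of them move together, and so do the other two.
frame1 = [(0.021, 0.052), (0.023, 0.055), (0.025, 0.051),
          (-0.034, -0.012), (-0.036, -0.015)]
frame2 = [(0.022, 0.053), (0.024, 0.054),
          (-0.035, -0.013), (-0.033, -0.014)]

dist = vote(frame1)
tracked = update(dist, vote(frame2))
candidate_streams = peaks(tracked)
```

Components whose changes fall in the same cell reinforce one another, so each cluster of co-varying components shows up as a peak; blending in the next frame's histogram keeps those peaks alive as long as the components continue to move together.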
Notes: