
Application of Loudness/Pitch/Timbre decomposition operators to auditory scene analysis

Mototsugu Abe <moto at ccrma> (CCRMA Visiting Scholar; Sony Corporation, Japan)

In this talk, I will present my earlier work at the University of Tokyo, completed in the mid-1990s as part of my doctoral thesis research.

The first half of the presentation concerns "Loudness/Pitch/Timbre (*) Decomposition Operators." In this research, we constructed a set of operators as general-purpose audio signal processing tools in the time-frequency domain.

More concretely, we focus on the instantaneous change of an audio signal in the wavelet domain. This change is decomposed into three orthogonal components, and a method is given for projecting the change onto each of them.
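The abstract does not give the operators themselves, so the following is only a rough sketch of the underlying idea: measuring instantaneous amplitude change and instantaneous frequency per time-frequency bin. It uses a hand-rolled STFT as a stand-in for the wavelet transform, and every name and parameter here (`stft`, `instantaneous_changes`, window length, hop size) is illustrative, not from the original work.

```python
import numpy as np

def stft(x, win_len=256, hop=64):
    """Hand-rolled STFT with a Hann window (a simple stand-in for the
    wavelet transform used in the original work)."""
    win = np.hanning(win_len)
    frames = [x[i:i + win_len] * win
              for i in range(0, len(x) - win_len, hop)]
    return np.fft.rfft(np.array(frames), axis=1)   # shape: (time, freq)

def instantaneous_changes(X, win_len=256, hop=64, sr=8000):
    """Per-bin amplitude change (slope of log|X| per hop) and
    instantaneous frequency (phase-vocoder style), frame to frame."""
    mag = np.abs(X) + 1e-12
    amp_change = np.diff(np.log(mag), axis=0)       # log-amplitude slope per hop
    k = np.arange(X.shape[1])                       # FFT bin indices
    expected = 2 * np.pi * hop * k / win_len        # expected phase advance per hop
    dphi = np.diff(np.angle(X), axis=0) - expected
    dphi = (dphi + np.pi) % (2 * np.pi) - np.pi     # wrap deviation to [-pi, pi)
    inst_freq = (k / win_len + dphi / (2 * np.pi * hop)) * sr   # Hz
    return amp_change, inst_freq

# Example: a 440 Hz tone with an exponentially decaying envelope.
sr = 8000
t = np.arange(sr) / sr
x = np.exp(-3 * t) * np.sin(2 * np.pi * 440 * t)
X = stft(x)
amp_change, inst_freq = instantaneous_changes(X, sr=sr)

bin440 = round(440 * 256 / sr)   # FFT bin nearest 440 Hz
# In that bin, the log-amplitude slope is steady and negative
# (about -3 * hop/sr = -0.024 per hop), and the instantaneous
# frequency sits near 440 Hz; the frame-to-frame frequency change
# would then be np.diff(inst_freq, axis=0).
```

In this toy setting, amplitude change and frequency change come out per bin and per frame, which is the kind of local quantity the decomposition operators are described as providing.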

The second half concerns an application of the operators to the problem of computational auditory scene analysis (*). In a (monaural) multi-stream sound (*), frequency components that change together in amplitude and frequency are grouped as one auditory stream. Since the operators above quantify the instantaneous changes in amplitude and frequency, we use them in the initial stage of the analysis. The estimated amplitude and frequency changes of the components are then used to construct a probabilistic space in which peaks correspond to streams. The probability distribution may be updated with new data to follow streams through time.
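The abstract does not specify the probabilistic space, but one plausible minimal sketch is a recursively updated 2-D histogram over per-component (amplitude-change, frequency-change) observations, whose peaks mark candidate streams. Everything here (`update_distribution`, the `decay` forgetting factor, the bin edges, the synthetic clusters) is an assumption for illustration only.

```python
import numpy as np

def update_distribution(P, features, bins, decay=0.9):
    """Exponentially forget the old distribution and fold in new
    (amplitude-change, frequency-change) observations; peaks in P
    then correspond to candidate streams."""
    H, _, _ = np.histogram2d(features[:, 0], features[:, 1], bins=bins)
    P = decay * P + (1 - decay) * H
    return P / P.sum()

# Two synthetic "streams": components whose changes cluster near
# (+0.5, +10 Hz/frame) and (-0.5, -10 Hz/frame).
rng = np.random.default_rng(0)
edges = (np.linspace(-1, 1, 21), np.linspace(-20, 20, 21))
P = np.zeros((20, 20))
for _ in range(50):                       # 50 analysis frames
    a = rng.normal([0.5, 10], [0.05, 1], size=(30, 2))
    b = rng.normal([-0.5, -10], [0.05, 1], size=(30, 2))
    P = update_distribution(P, np.vstack([a, b]), bins=edges)

# The global peak of P should sit near one of the two clusters.
i, j = np.unravel_index(np.argmax(P), P.shape)
peak1 = (edges[0][i:i + 2].mean(), edges[1][j:j + 2].mean())
```

The exponential forgetting in `update_distribution` is one simple way to realize "updated with new data to follow streams through time": old evidence fades while components whose changes keep co-varying reinforce the same peak.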

Notes:

(*)
Although "loudness," "pitch," and "timbre" are terms of human perception, we use them in our context to mean "amplitude change," "frequency change," and "other types of change," respectively. (At the time, my adviser wanted to relate this work to human perception somehow, but we have not done that yet.)

(*)
The term "auditory scene analysis" is the title of a 1990 book by Albert S. Bregman, in which he summarizes the psychophysical characteristics of human perception in grouping and separating sounds that occur simultaneously or sequentially. In the 1990s, many researchers and engineers tried to simulate or implement these grouping principles as computational models.

(*)
The term "multi-stream sound" is almost the same as "multi-source sound". However, "stream" corresponds to human perception regardless of how many sources actually exist, whereas "source" corresponds to an actual physical sound source.



Download mus423h.pdf

"CCRMA DSP Seminar Prior Abstracts", by Julius O. Smith III, Aut-Spr Quarters, CCRMA Ballroom, The Knoll, Stanford University.
Copyright © 2005-12-28 by Julius O. Smith III
Center for Computer Research in Music and Acoustics (CCRMA),   Stanford University