
Implemented Model for Tonality

The tonality of a frequency bin $\mathcal{F}(f)$ is estimated by looking at the predictability of the phase and magnitude of that Fourier coefficient. The predictors are defined as follows:
\begin{displaymath}
\hat{\varphi}_t = 2\cdot \arg(\mathcal{F}_{t-1}(f)) - \arg(\mathcal{F}_{t-2}(f)),
\ {\rm and}
\end{displaymath} (11)


\begin{displaymath}
\hat{M}_t = \vert\mathcal{F}_{t-1}(f)\vert.
\end{displaymath} (12)

The phase is thus linearly extrapolated from the two previous time instances, and the magnitude is simply assumed to equal its previous value. For a single stationary sinusoid within the frequency band, both predictions are exact, so the prediction error is zero. The tonality is then estimated from the maximum prediction error of the last two phase values and the last magnitude:
\begin{displaymath}
t(f) = 1-\max(\varphi_{err_t}, \varphi_{err_{t-1}}, M_{err_t}),
\end{displaymath} (13)

where $\varphi_{err_t} = (\hat{\varphi}_t-\arg(\mathcal{F}_t(f)))/\pi$ and $M_{err_t} = (\hat{M}_t-\vert\mathcal{F}_t(f)\vert)/\max(\vert\mathcal{F}_t(f)\vert, \hat{M}_t)$. This model gives a weighted average $t$ of about 0.9 for highly tonal string music, and 0.3 for white noise. Of course, the parameters in the masking threshold estimation (section 3.2.5) are adapted to these (non-ideal) values.
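
The estimate in equations (11) to (13) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the coder's actual implementation: the function names, the array layout (frames along the first axis, frequency bins along the second), and the `eps` guard against division by zero are all hypothetical. The sketch also assumes that the phase difference is wrapped to $[-\pi, \pi)$ and that both errors are taken in absolute value, so that $t(f)$ stays between 0 and 1; the text above does not state this explicitly.

```python
import numpy as np

def wrap_phase(x):
    """Wrap angles to the interval [-pi, pi) (assumed convention)."""
    return (x + np.pi) % (2.0 * np.pi) - np.pi

def phase_error(F, t):
    """Normalized phase prediction error at frame t; uses frames t-2..t."""
    predicted = 2.0 * np.angle(F[t - 1]) - np.angle(F[t - 2])  # eq. (11)
    # Absolute, wrapped difference divided by pi, so the result is in [0, 1].
    return np.abs(wrap_phase(predicted - np.angle(F[t]))) / np.pi

def magnitude_error(F, t, eps=1e-12):
    """Normalized magnitude prediction error at frame t; uses frames t-1..t."""
    predicted = np.abs(F[t - 1])                               # eq. (12)
    actual = np.abs(F[t])
    # eps is a hypothetical guard for bins that are silent in both frames.
    denom = np.maximum(np.maximum(actual, predicted), eps)
    return np.abs(predicted - actual) / denom

def tonality(F, t):
    """Per-bin tonality at frame t, eq. (13); requires t >= 3."""
    worst = np.maximum.reduce([
        phase_error(F, t),
        phase_error(F, t - 1),
        magnitude_error(F, t),
    ])
    return 1.0 - worst

# One bin holding a stationary sinusoid: constant magnitude, linear phase.
frames = np.arange(4)[:, None]
sine = 0.5 * np.exp(1j * (0.3 + 1.7 * frames))

# 1000 bins of complex Gaussian noise, independent from frame to frame.
rng = np.random.default_rng(0)
noise = rng.standard_normal((4, 1000)) + 1j * rng.standard_normal((4, 1000))

print(tonality(sine, 3))
print(tonality(noise, 3).mean())
```

For the stationary sinusoid both predictors are exact, so the tonality is 1 up to rounding. The noise frames here are synthetic and unweighted, so their mean is not directly comparable to the weighted average quoted above for white noise.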




``An Experimental High Fidelity Perceptual Audio Coder'', by Bosse Lincoln<bosse@ccrma.stanford.edu>, (Final Project, Music 420, Winter '97-'98).
Copyright © 2006-01-03 by Bosse Lincoln<bosse@ccrma.stanford.edu>
Center for Computer Research in Music and Acoustics (CCRMA),   Stanford University