... (Hz).2.1
Long ago, the term for Hz was cycles per second (cps).
....2.2
More generally, an analytic signal is obtained from a real signal by filtering out its negative-frequency components. In other terms, the imaginary part of the analytic signal may be obtained as the Hilbert transform of the real part (see §B.5).
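
As a rough illustration of filtering out the negative-frequency components, the following sketch builds an analytic signal from a short real cosine using a naive DFT. The helper names (`dft`, `idft`, `analytic_signal`) and the even signal length are assumptions made for this example only.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT, adequate for short illustrative signals."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Naive inverse DFT matching dft() above."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def analytic_signal(x):
    """Zero the negative-frequency bins and double the positive ones.

    Assumes len(x) is even; dc and the Nyquist bin are left unchanged.
    """
    N = len(x)
    X = dft(x)
    for k in range(N):
        if 0 < k < N // 2:
            X[k] *= 2      # positive frequencies
        elif k > N // 2:
            X[k] = 0       # negative frequencies
    return idft(X)

N = 16
x = [math.cos(2 * math.pi * 3 * n / N) for n in range(N)]
z = analytic_signal(x)   # real part: the cosine; imaginary part: the sine
```

For this input, the imaginary part of `z` is the corresponding sine, which is what the Hilbert transform of a cosine should give.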
... property2.3
The sifting property of delta functions $ \delta(t)$ provides that

$\displaystyle \int_{t=-\infty}^{\infty}f(t)\delta(t)\,dt = f(0)
$

for every continuous function $ f(t)$. We think of a delta function as having zero width and unit area [243].
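
The zero-width, unit-area picture can be checked numerically by replacing the delta function with a narrow unit-area rectangular pulse and letting its width shrink. The function name `sift` and the choice of a rectangular pulse are assumptions for this sketch.

```python
import math

def sift(f, width, steps=1000):
    """Integrate f(t) * pulse(t) by the midpoint rule, where pulse is a
    unit-area rectangle of the given width centered on t = 0."""
    dt = width / steps
    height = 1.0 / width
    total = 0.0
    for i in range(steps):
        t = -width / 2 + (i + 0.5) * dt
        total += f(t) * height * dt
    return total

# As the pulse narrows, the integral approaches f(0) = cos(0) = 1
approx = [sift(math.cos, w) for w in (1.0, 0.1, 0.001)]
```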
... frequency.2.4
This is not as confusing as one might think at first. When the frequency range is $ -\pi$ to $ \pi$, normalized radian frequency is being used (radians per sample). When the range is $ -1/2$ to $ 1/2$, it is normalized frequency (cycles per sample). The unnormalized case (true physical radian frequency in radians per second) usually only arises in applications.
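
The three conventions differ only by scale factors, as the following small sketch shows. The function names and the 44.1 kHz sampling rate are assumptions chosen for illustration.

```python
import math

def hz_to_cycles_per_sample(f_hz, fs):
    """Normalized frequency in cycles per sample (range -1/2 to 1/2)."""
    return f_hz / fs

def hz_to_radians_per_sample(f_hz, fs):
    """Normalized radian frequency in radians per sample (range -pi to pi)."""
    return 2 * math.pi * f_hz / fs

fs = 44100.0          # assumed sampling rate in Hz
nyquist = fs / 2      # half the sampling rate maps to 1/2 and to pi
```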
... function:2.5
Note that writing $ \hbox{asinc}$ to denote the aliased sinc function is not standard practice in signal processing--consider it proposed notation.
....2.6
In more detail, start with the ``physical'' definition of $ \hbox{asinc}$:

$\displaystyle \hbox{asinc}_M(\omega T) = \frac{\sin(M \omega T / 2)}{M \sin( \omega T/2)}
$

Now replace $ MT$ by $ \tau$ in the numerator. Take the limit as $ T$ goes to zero, with $ \tau$ remaining fixed, so that $ M$ goes to infinity (maintaining the relation $ M T = \tau$). When $ T$ gets very small, the denominator becomes

$\displaystyle M \sin( \omega T/2) \to M \omega T/2 = \omega \tau /2,
$

which completes the proof.
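
The limit can also be checked numerically. The sketch below holds $ \tau = MT$ fixed while $ M$ grows and compares $ \hbox{asinc}_M(\omega T)$ with $ \sin(\omega\tau/2)/(\omega\tau/2)$. The function names and the test values of $ \omega$ and $ \tau$ are assumptions for illustration.

```python
import math

def asinc(M, w):
    """Aliased sinc: sin(M w / 2) / (M sin(w / 2)), taken as 1 where the
    denominator vanishes at w = 0."""
    d = M * math.sin(w / 2)
    return 1.0 if d == 0 else math.sin(M * w / 2) / d

def sinc_limit(omega, tau):
    """Limiting form: sin(omega tau / 2) / (omega tau / 2)."""
    a = omega * tau / 2
    return 1.0 if a == 0 else math.sin(a) / a

omega, tau = 3.0, 2.0
errors = []
for M in (8, 64, 512, 4096):
    T = tau / M                      # maintain M T = tau
    errors.append(abs(asinc(M, omega * T) - sinc_limit(omega, tau)))
```

The entries of `errors` shrink as $ M$ increases, consistent with the small-angle approximation used for the denominator.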
... crossings.2.7
$ \Omega_M$ is the radian-frequency sampling interval for a length $ M$ DFT. Using $ \Omega_M$ to denote the sampling interval along $ \omega$ is analogous to using $ T$ to denote the sampling interval along time $ t$ -- hence the choice of symbol $ \Omega$.
... moment.2.8
See §2.4.16 for an example of this regarding the uncertainty principle.
....2.9
A standard notation for fundamental frequency is ``$ F0$'' (or $ F_0$). This comes from the speech analysis community, where usually $ F_1$, $ F_2$, and so on, refer to the formant frequencies (resonance peak frequencies) of the vocal tract. When not working with formants, it is convenient to define the fundamental frequency as $ f_1$, so that the frequency of the $ k$th harmonic is $ f_k = kf_1$.
... periodic.2.10
Most plucked strings can be considered very nearly harmonic. Piano strings, however, are significantly stiff so that they exhibit audible inharmonicity--the partial overtone series is stretched [68,244].
... exactly.2.11
One situation in which minimum orthogonal spacing works well is when the signal is known to be exactly periodic, and the period is accurately measured using a fundamental-frequency estimator (§10.6). In this case, we can resample the periodic signal to obtain an exact integer number of samples per period, and a rectangular window can be set to exactly one period in length. In this situation, each DFT coefficient is proportional to a Fourier series coefficient (defined in Chapter 2), and the peak frequencies are known to be integer multiples of the fundamental frequency, so no peak interpolation is needed at all. In other words, the fundamental frequency estimator takes care of locating all the peaks in frequency, and the resampling leads to spectral samples at each main-lobe peak.
... variable,3.1
Most of this chapter uses normalized frequency, i.e., the sampling rate equals $ f_s=1$ sample per second.
...MDFT.3.2
http://ccrma.stanford.edu/~jos/mdft/Fourier_Theorems.html
... states3.3
Our notational convention is that the first subscripts of an operator such as $ \hbox{\sc Shift}$ are its parameters, as in $ \hbox{\sc Shift}_l(x)$, and the last subscript selects a particular sample of output, as in $ \hbox{\sc Shift}_{l,n}(x)$. If the last subscript is omitted, it ``returns'' an entire signal. Thus, $ \hbox{\sc Shift}_{l,n}(x)$ is a scalar while $ \hbox{\sc Shift}_l(x)$ is an entire signal defined over the integers. We also may use $ (\cdot)$ to denote all values of some index, e.g., $ \hbox{\sc Shift}_l(x)=\hbox{\sc Shift}_{l,(\cdot)}(x)$.
... cases.3.4
For definitions of the DFT, DTFT, FT, and FS, see Chapter 2.
... transforms:3.5
See §8.3.1 for the discrete-time case.
... principle.3.6
The Heisenberg uncertainty principle in quantum physics applies to any dual properties of a particle. For example, the position and velocity of an electron are oft-cited as such duals. An electron is described, in quantum mechanics, by a probability wave packet. Therefore, the position of an electron in space can be defined as the midpoint of the amplitude envelope of its wave function; its velocity, on the other hand, is determined by the frequency of the wave packet. To accurately measure the frequency, the packet must be very long in space, to provide many cycles of oscillation under the envelope. But this means the location in space is relatively uncertain. In more precise mathematical terms, the probability wave function for velocity is proportional to the spatial Fourier transform of the probability wave for position. I.e., they are exact Fourier duals. The Heisenberg Uncertainty Principle is therefore a Fourier property of fundamental particles described by waves [53].
...MDFT,3.7
http://ccrma.stanford.edu/~jos/mdft/Cauchy_Schwarz_Inequality.html
... filter.3.8
An allpass filter has unity gain and arbitrary delay at each frequency.
... Therefore,3.9
Technically, the Fourier transform of the unit step function $ u(t)$ does not exist, since $ \vert u(t)\vert^p$ is not integrable for any value of $ p$. However, its Laplace transform $ U(s) = 1/s$ does exist in the right-half $ s$ plane, and the limit as $ s\to j\omega$ is well behaved and can be taken as the definition of the Fourier transform. The same construction works for $ t\cdot u(t)$, $ t^2\cdot u(t)$, and so on.
... order3.10
We will say that a function $ W(\omega)$ is of order $ 1/\omega^{n+1}$ if there exist $ \omega _0$ and some positive constant $ M<\infty$ such that $ \left\vert W(\omega)\right\vert<M/\omega^{n+1}$ for all $ \omega > \omega_0$.
....3.11
Such a decomposition may be constructed by differentiating to obtain $ w^\prime (t)$ and defining

$\displaystyle w^\prime_{\scriptscriptstyle\uparrow}(t)\isdef \left\{\begin{array}{ll}
w^\prime , & w^\prime \geq 0 \\ [5pt]
0, & w^\prime <0 \\
\end{array}\right.
$

and similarly for $ w^\prime_{\scriptscriptstyle\downarrow}(t)$. (The derivatives may include impulses corresponding to discontinuities in $ w(t)$.)

The quantity $ [w_{\scriptscriptstyle\uparrow}(b)-w_{\scriptscriptstyle\uparrow}(a)] + [w_{\scriptscriptstyle\downarrow}(b)-w_{\scriptscriptstyle\downarrow}(a)]$ is called the total variation of $ w$ on $ (a,b)$; if this value is finite, then $ w(t)$ is said to be of bounded variation on $ (a,b)$.
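
For a sampled signal, the same idea can be sketched by splitting the first differences into their positive and negative parts. This is only an illustration under the assumption that the contribution of the decreasing part is counted in magnitude, so that the total equals the sum of absolute increments. The name `variation_parts` is hypothetical.

```python
def variation_parts(w):
    """Return (up, down): the summed rises and the summed magnitudes of the
    falls between successive samples of w."""
    up = 0.0
    down = 0.0
    for a, b in zip(w, w[1:]):
        d = b - a
        if d >= 0:
            up += d       # increment of the nondecreasing part
        else:
            down += -d    # magnitude of the decrease
    return up, down

w = [0.0, 1.0, 0.5, 2.0, 2.0, -1.0]
up, down = variation_parts(w)
total_variation = up + down     # finite, so w is of bounded variation here
net_change = up - down          # equals w[-1] - w[0]
```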

...Harris78.4.1
The Hamming window can also be derived as a special case of windows having a maximized main-lobe peak $ W(0)$ over all windows of the same energy and prescribed first zero-crossings about the main lobe [186, p. 239,403].
...Harris78.4.2
Note that the $ -42.76$ dB figure is for large window-length $ M$. For small window lengths, the side-lobe levels increase. This phenomenon can be understood in terms of aliasing of the side-lobes of the continuous Hamming window which must be sampled to obtain a discrete-time window.
... Hann.4.3
The precise side-lobe level is dependent on window length $ M$, but $ -41$ to $ -42$ dB is typical for the Hamming window.
....4.4
From a linear algebra point of view, consider the sinc kernel as corresponding to a Toeplitz Hermitian matrix. It is well known that Hermitian matrices have real eigenvalues and orthogonal eigenvectors. Also, multiplication by a Toeplitz matrix corresponds to convolution (in this case, a non-causal convolution).
... Box4.5
For Octave, the original version by Eric Breitenberger is still available on the Web, as of this writing, at
http://pangea.stanford.edu/Oceans/GES290/Breitenberger-SSAMatlab/mtm/.
Note, however, that the calling arguments after the first two are differently defined. A simple version written by the author appears in §F.1.2.
... kind:4.6
The Maclaurin series for $ I_0(x)$ can be obtained as the term-by-term square of that for $ \exp(x/2)$, since

$\displaystyle e^{\frac{x}{2}} = \sum_{k=0}^{\infty}\frac{\left(\frac{x}{2}\right)^k}{k!}.
$
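
A minimal numerical sketch of this observation follows: squaring each term of the series for $ \exp(x/2)$ and summing gives $ I_0(x)$, which is checked here against the integral representation $ I_0(x) = \frac{1}{\pi}\int_0^\pi e^{x\cos\theta}d\theta$. The function names and the truncation at 30 terms are assumptions for illustration.

```python
import math

def exp_half_terms(x, n_terms):
    """Terms (x/2)^k / k! of the Maclaurin series for exp(x/2)."""
    return [(x / 2) ** k / math.factorial(k) for k in range(n_terms)]

def bessel_i0(x, n_terms=30):
    """I0(x) as the sum of the squared terms of the exp(x/2) series."""
    return sum(t * t for t in exp_half_terms(x, n_terms))

def bessel_i0_integral(x, steps=2000):
    """I0(x) from (1/pi) * integral over [0, pi] of exp(x cos(theta)),
    evaluated by the midpoint rule."""
    h = math.pi / steps
    total = sum(math.exp(x * math.cos((i + 0.5) * h)) for i in range(steps))
    return total * h / math.pi
```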

... radians-per-second).4.7
In [88], $ \beta = \pi\alpha$ is described as half of the time-bandwidth product, which in turn is not defined. Factors of 2 often come and go because, e.g., the frequency band $ [-\omega_c,\omega_c]$ is often considered a bandwidth of $ \omega_c$ (neglecting negative frequencies).
....4.8
The causal version may be computed as the inverse DFT of $ (-1)^k W(\omega_k)$.
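
To see why this works, note that for an even DFT length $ N$, $ (-1)^k = e^{-j 2\pi k (N/2)/N}$, which by the shift theorem is a circular delay of $ N/2$ samples. The sketch below applies this to a small zero-phase window. The even length, the triangular test window, and the naive `dft`/`idft` helpers are assumptions for this example.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) DFT for short illustrative signals."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Naive inverse DFT matching dft() above."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

N = 8
# Zero-phase window: symmetric about n = 0 (indices taken modulo N)
zero_phase = [3.0, 2.0, 1.0, 0.0, 0.0, 0.0, 1.0, 2.0]
W = dft(zero_phase)

# Multiply bin k by (-1)^k, then invert: the window moves to center N/2
shifted = [v.real for v in idft([(-1) ** k * W[k] for k in range(N)])]
```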
... itself:4.9
The not-so-smooth function $ 1/\sqrt{\left\vert t\right\vert}$ also transforms to itself [133, p. 47]. Also, a periodic impulse train transforms to an impulse train with reciprocal spacing of the impulses.
... interpolation.5.1
Conversely, the DTFT of a time-limited signal can be sampled to convert it to a DFT with no loss of information. If the sampling density is not sufficiently high, there will be aliasing (wrap-around) in the time domain. This is an exact Fourier dual of sampling in the time domain [247].
... practice.5.2
Nevertheless, a perceptually exact band-limited interpolation can be implemented at a reasonable cost in the time domain, provided some amount of oversampling is used. Oversampling in the time domain provides a guard band in the frequency domain, which enables the interpolation kernel to meet perceptually ideal specifications at a much smaller length [247].
...spectsamps.5.3
In particular, zero-padding does not increase the resolution of an FFT. This is a surprisingly common point of misunderstanding (or sometimes just mislabeling).
... work.5.4
One could say the Blackman window is well matched to ``analog synthesizer quality'' levels, where a 60 dB signal-to-noise ratio is common.
... bins5.5
Here we mean fractional bins when $ p$ is not an integer.
... applications.5.6
A tuning error of $ 0.1$% is about ``two cents'', where a cent is defined as a hundredth of a semitone, or $ 2^{1/1200}-1 \approx 0.058\%$. Most people cannot detect tuning errors of only two cents, unless some kind of interference effect is involved, in which case the frequency error translates to a slowly modulated amplitude envelope.
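
The arithmetic can be verified directly. In this sketch, `ratio_to_cents` is a hypothetical helper that converts a frequency ratio to cents as 1200 times its base-2 logarithm.

```python
import math

def ratio_to_cents(ratio):
    """Interval size in cents for a given frequency ratio."""
    return 1200 * math.log2(ratio)

# Size of one cent expressed as a percentage frequency change
one_cent_percent = (2 ** (1 / 1200) - 1) * 100

# A 0.1% tuning error expressed in cents (a little under two)
error_cents = ratio_to_cents(1.001)
```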
... distributed5.7
The Gaussian distribution is also called the normal distribution, or ``bell curve.'' By the ``central limit theorem,'' a sum of many independent random variables (each of finite variance) approaches a Gaussian distribution in the limit. Therefore, filtered noise is usually well modeled as Gaussian, since the filtering typically adds many random variables together. See Appendix C for more about the Gaussian distribution.
... by5.8
The real and imaginary parts of $ v(n) = v_r + j v_i$ are independently distributed according to the more familiar Gaussian density function

$\displaystyle p_{v_r}(\nu)=p_{v_i}(\nu)
= \frac{1}{\sqrt{\pi 2\sigma_{v_r}^2}} e^{-\frac{\nu^2}{2\sigma_{v_r}^2}}
$

where $ \sigma_{v_r}^2 = \sigma_{v_i}^2 = \sigma_{v}^2/2$.
... autocorrelation,6.1
Note that there are many possible biased estimates of the true autocorrelation function. However, we will consider only one of them in this book.
... ``complicated''.6.2
An interesting discussion of the meaning of randomness is given in Knuth [115, vol. II].
... function.6.3
In Octave, it is necessary to install the add-on package octave-forge to obtain this and other signal processing functions.
....6.4
Note that we are assuming $ v$ is zero mean. Otherwise, the sample variance would be defined with the mean subtracted out, as discussed further in §D.1.10. When the mean is zero, a correlation may be called a covariance.
... data.6.5
A trend is typically estimated using linear regression. That is, a straight line is fit through the data in a least squares sense. (See the function polyfit in Matlab or Octave.)
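
For readers without Matlab or Octave at hand, the same straight-line fit can be sketched in a few lines of plain Python. The names `fit_line` and `detrend`, and the use of the sample index as the abscissa, are assumptions for this illustration. This is not a reimplementation of polyfit.

```python
def fit_line(y):
    """Least-squares slope and intercept for samples y[n], n = 0..N-1
    (requires at least two samples)."""
    N = len(y)
    n_mean = (N - 1) / 2
    y_mean = sum(y) / N
    sxx = sum((n - n_mean) ** 2 for n in range(N))
    sxy = sum((n - n_mean) * (y[n] - y_mean) for n in range(N))
    slope = sxy / sxx
    return slope, y_mean - slope * n_mean

def detrend(y):
    """Subtract the fitted straight line from the data."""
    slope, intercept = fit_line(y)
    return [y[n] - (slope * n + intercept) for n in range(len(y))]

# A linear trend plus an alternating component
data = [0.5 * n + 2.0 + (-1) ** n for n in range(10)]
residual = detrend(data)
```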
...Kay88:6.6
The division by $ M$ can be interpreted as normalizing the peak of the implicit Bartlett window on the autocorrelation function to 1, as discussed further below. Alternatively, it may be interpreted as a normalization of the Fourier transform itself, converting a power spectrum (squared-magnitude FFT) to a power spectral density. Such normalization is necessary for stationary random processes since they generally have infinite signal energy (but finite average power); i.e., $ \left\vert\hbox{\sc DTFT}(x_w)\right\vert^2$ grows without bound as $ M$ increases.
... correlated.6.7
An exception is when white noise is filtered using an allpass filter, in which case the output signal is still white noise.
... noise.6.8
For a more formal development, see the Wold decomposition theorem.
... noise6.9
The term ``pink noise'' indicates that the spectrum is more intense at low frequencies than at high frequencies. This makes sense since the color pink is heavier in the red end of the spectrum compared with white which balances all colors equally.
...VossAndClarke78,6.10
Physical phenomena exhibiting a $ 1/f$ power spectral density law include noise in vacuum tubes, carbon resistors, transistor junctions, metal films, ionic solutions, films at the superconducting transition, Josephson junctions, nerve membranes, cosmic background radiation distribution, sunspot activity, and the flood levels of the river Nile [270]. In addition, the short-time power fluctuations in music (below 20Hz) have been shown to follow the $ 1/f$ characteristic, especially classical music [270].
...Abel04.6.11
See, e.g., http://www.uaudio.com/.
... correctly.7.1
In the Matlab Signal Processing Tool Box, the argument 'periodic' should be included when creating the window.
... integer.7.2
Actually, non-integer $ R/k$ can be accommodated by rotating among a set of windows obtained by sampling the underlying continuous window at different phases.
... limited.7.3
This is of course the Fourier dual of saying that the uniform sampling of a time-domain signal is information-preserving provided the signal is properly bandlimited (in the frequency domain).
... magnitude.7.4
The spectrogram is often called a sonogram when applied to audio signals.
... STFT.7.5
Perfect reconstruction is also possible in principle using $ R$ as large as $ M$ with the Hamming window. However, this requires dividing out the amplitude modulation given by the sum of Hamming windows displaced by $ R$ (see Eq.$ \,$(6.2)). In practice, $ R=(M-1)/2$ (50% overlap) is the largest hop size used with the Hamming window because it is the largest value that preserves the constant-overlap-add (COLA) property. We will learn in Chapter 9 that $ R=(M-1)/4$ (75% overlap) is significantly more robust than 50% overlap, and is recommended when spectral modifications are to be carried out on the STFT data.
... cochlea.7.6
See http://www.blackwellscience.com/matthews/ear.html for an animated tutorial.
... SPL.7.7
A listening-level slider would be nice to have in the Graphical User Interface (GUI) for a loudness spectrogram.
... bank.7.8
Note that the FFTs are effectively downsampled by this operation, with the highest ``frequency-domain sampling rate'' occurring at the lowest frequency of the band. Therefore, the FFT length can be set by matching the adjacent auditory filter spacing to the low-frequency bin spacing of the FFT (at the lower edge of the frequency range covered by that FFT). In fact, one very large FFT could be used in which the low-frequency bin spacing is approximately equal to the spacing of the center-frequencies of the auditory filter-bank channels at the low-frequency extreme.
... level.7.9
Downloading http://ccrma.stanford.edu/~jos/sasp/hw/SteveJobsHi.wav and listening at a very low level (approximately 20 dB SPL) verifies that indeed this sound example sounds like ``Hi...ee-jah,'' in qualitative agreement with the sone loudness curve in Fig.6.8.
... constant.7.10
Envelope followers in sound processing classically behave this way as well [95]. The amplitude envelope is allowed to increase instantaneously, but it floats down with some time constant that can be adjusted.
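
One simple way to obtain this behavior is sketched below: the envelope jumps immediately to any larger input magnitude and otherwise decays exponentially. The function name, the release time expressed in samples, and the peak-detecting form are assumptions for illustration. They are not taken from [95].

```python
import math

def envelope_follower(x, release_samples):
    """Peak envelope with instantaneous attack and exponential release.

    release_samples is the release time constant in samples (assumed
    parameterization): the envelope falls by a factor of e over that span.
    """
    g = math.exp(-1.0 / release_samples)   # per-sample decay factor
    env = []
    e = 0.0
    for v in x:
        e = max(abs(v), g * e)   # rise at once, otherwise float down
        env.append(e)
    return env

x = [0.0, 1.0, 0.0, 0.0, 0.5, 0.0]
env = envelope_follower(x, release_samples=2.0)
```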
... hearing:8.1
Due to nonlinearities in hearing [163,278], it is not always valid to truncate the summation at the high-frequency hearing limit. For complete generality, $ \Omega$ should be extended to the highest frequency present in the signal $ s(t)$, since inaudible frequencies can give rise to audible components at the output of a nonlinearity.
... overtone.8.2
The term overtone or partial overtone is generally used to mean a sinusoidal component which is not harmonically related to the fundamental frequency.
... domain.8.3
Dolson and Laroche have extended this idea to the processing of nonparametric spectral peaks in the short-time spectrum [126,125,123].
... filter.9.1
As discussed in [240, Chapter 11], an FIR filter having impulse response $ h(n)$ is said to be linear phase when its impulse response is symmetric about some point in time, e.g., $ h(N-n-1)=h(n)$, for $ n=0:N-1$, where $ N$ is the length of the FIR filter.
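
The symmetry condition, and the linear phase it implies, can be checked numerically as in the following sketch. The helper names and the example filter are assumptions. The check removes the delay term $ e^{-j\omega(N-1)/2}$ and confirms that what remains is real.

```python
import cmath
import math

def is_symmetric(h, tol=1e-12):
    """True when h(N-n-1) = h(n) for n = 0..N-1."""
    N = len(h)
    return all(abs(h[N - n - 1] - h[n]) <= tol for n in range(N))

def freq_response(h, w):
    """Frequency response of the FIR filter h at radian frequency w."""
    return sum(h[n] * cmath.exp(-1j * w * n) for n in range(len(h)))

h = [1.0, 2.0, 3.0, 2.0, 1.0]
delay = (len(h) - 1) / 2      # delay in samples, here 2

# After compensating the delay, the response of a symmetric h is real
residual_imag = max(
    abs((freq_response(h, w) * cmath.exp(1j * w * delay)).imag)
    for w in [k * math.pi / 16 for k in range(17)]
)
```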
... (COLA)9.2
The acronym COLA is not standard in signal processing, although OLA might be recognized by many. When writing a paper, acronyms should always be spelled out on first use, even for surely recognized acronyms such as ``FFT''.
... filter.10.1
In ordinary sampling theory [247], each sample of a time-domain signal determines the scaling and location of a sinc function for all time in the underlying continuous-time signal represented by the samples. The dc sampling filter described here is the Fourier dual of the time-domain sinc function corresponding to a single sample in time.
... demodulation.10.2
We use the term ``demodulation'' when frequencies are translated from high to low ($ \omega_c$ to 0 in this case).
...).10.3
We also implicitly assumed that the DFT size $ N$ was not smaller than the window length $ M$.
... modifications.10.4
The term FBS modifications refers to changing the gain and/or phase of the time-domain signal coming out of a filter-bank channel. This is distinct from OLA modifications in which a spectrum is altered, inverse transformed, and overlap-added into an output buffer. Multiplicative OLA modifications are exact (no aliasing) when the zero-padding in the time domain is sufficient. FBS modifications are not provided zero-padding in the time domain, and for $ R>1$ there is aliasing in the channel signals.
... spectrum.10.5
For some background info, see
http://www.geofex.com/Article_Folders/wahpedl/wahped.htm and
http://www.geofex.com/Article_Folders/wahpedl/voicewah.htm.
... speech.10.6
Search for ``Fant vowel diagram'' on the Web, or see the vowel diagram at the second URL in the previous footnote.
...11.1
The original vocoder used a ``buzz source'' driving the filter bank during voiced speech, and a ``hiss source'' driving the filter bank during unvoiced speech. In speech modeling by linear prediction [147], the buzz source is classically an impulse train, and the hiss source is white noise. In additive synthesis (§7.1.4), each harmonic overtone, or overtone group, is synthesized using some form of wavetable oscillator. The additive synthesis approach generalizes readily to inharmonic spectra.
...PARSHL11.2
PARSHL was so named because it could follow partials (as opposed to merely harmonics). Being written for the PDP-10 computer running the SAIL operating system, the filename was restricted to 6 characters, so that ``partial'' became ``PARSHL''.
... phase11.3
The version written in 1985 did not support phase. Phase support was added much later by the second author of [248] in the context of his Ph.D. research, using the phase interpolation algorithm of McAulay and Quatieri [158].
... forever,11.4
We tried reusing turned-off oscillators but found them to be more trouble than they were worth in our environment.
... band11.5
See Appendix E for a definition of Bark bands (classical critical bands of hearing).
...KlapuriMohonk05,KlapuriSAP03,Klapuri01.11.6
Klapuri's publication home page: http://www.cs.tut.fi/~klap/iiro/
... transients.12.1
The Dolby AC-3 perceptual audio coding format, which is formulated more directly as a transform coder (quantized STFT), switches to a shorter FFT window when transients are detected in the signal being encoded. The original Dolby AC-2 format used length 512 FFT windows in a Princen-Bradley time-domain aliasing cancellation scheme (sampling rate typically 44.1 kHz). The shorter length for transients in AC-3 was chosen to be 256 samples, or half the steady-state length [132, §4.1.4]. A special hybrid window is needed for a smooth transition from steady-state to transient processing, or vice versa.
... level.12.2
One careful study found that 96-kbps AAC is roughly equivalent to 128-kbps MP3, which is a 25% lower bitrate at roughly the same quality level (equivalently, MP3 needs about 33% more bits) [132, §4.1.8].
... modifications12.3
We distinguish here between multiplicative spectral modification in overlap-add (frequency-domain convolution) and modifications introduced as gains applied to the filter-bank channel signals prior to remodulation and summing to reconstruct the signal. (The channel gains may be time-varying complex numbers.) All overlap-add systems with sufficient zero-padding will yield perfect reconstruction in the presence of multiplicative spectral modifications, as discussed in Chapter 8, even when their filter-bank interpretation obviously involves aliasing cancellation between channels in the frequency domain. On the other hand, filter-bank modifications, being in the time domain, do not support the overlap-add of multiple ``temporary time axes'' as appear in the OLA case, and perfect filter-bank reconstruction relies upon aliasing cancellation.
... frequencies.B.1
In this book, unless specified otherwise, all frequencies are normalized by the sampling rate. Thus, $ f_c\in(-1/2,1/2)$ is physically ``cycles per sample.''
... gains.B.2
While cubic splines are maximally smooth in a precise physical sense, they are not band-limited, so one can do better by using band-limited interpolation of the desired frequency-response points.
... design.B.3
In this context, non-parametric means a design given by the inverse DFT of the sampled, desired frequency response. An example of a parametric filter design method is linear predictive coding (LPC).
... aliasing.B.4
More generally, any non-linear function of an FIR frequency response can be expected to correspond to an infinitely long impulse response (IIR) in the time domain. This can be shown by expressing the nonlinear modification as an infinite power series, and noting that each term in the power series corresponds to an iterated convolution. See the topic of Volterra series expansions of nonlinear systems for more on this point.
...diffthm). C.1
This approach to the proof was discovered on the Web, for the real case, at
http://www.ph.tn.tudelft.nl/~lucas/education/tn254/2002/Fourier%20transform%20of%20a%20Gaussian.pdf
... distributionC.2
A PDF is one type of probability distribution in which probability is distributed continuously over a range of values, as in the probability of any given temperature on a given day. The probability of any particular temperature (specified with infinite precision) is zero, but ranges of temperatures have nonzero probability. In contrast to PDFs are discrete probability distributions, in which nonzero probability is assigned to specific numbers. An example of a discrete distribution is the probability ($ 1/2$) of heads or tails in a coin toss. The term ``distribution'' may refer to a discrete distribution, a PDF, or a mixture of the two. See Appendix D for an introduction to statistical signal processing.
... function.D.1
The impulse function $ \delta(x)$ may be defined as any (generalized) function for which

$\displaystyle \int_{-\infty}^\infty f(x)\delta(x)dx = f(0)
$

for every $ f(x)$ that is continuous at $ x=0$. A typical definition is

$\displaystyle \delta(x) \isdef \lim_{\Delta \to 0} \left\{\begin{array}{ll}
\frac{1}{\Delta}, & 0\leq x\leq \Delta \\ [5pt]
0, & \hbox{otherwise}. \\
\end{array}\right.
$

The impulse was introduced in Chapter 2 starting at §2.4.9. See also [243,31,133] for further development.
... length.D.2
Note that 20 ms contains only one period of a sinusoid at 50 Hz, which is above the lower limit of pitch perception (the low note of the piano, A0, is tuned to 27.5 Hz). It is therefore possible to encounter difficulty resolving tones in the deep bass region of the audio spectrum. A 20 ms frame length works quite well, however, for telephone speech processing, in which the nominal bandwidth is 200-3200 Hz; in this case, a 20 ms frame has at least four periods of the lowest frequency present, and harmonic resolution is assured under the Hamming window. In wideband audio work, a multiresolution analysis is often highly preferable.
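
The period counts quoted here are simply frequency times frame duration, as the following sketch shows. The helper name `periods_in_frame` is hypothetical.

```python
def periods_in_frame(freq_hz, frame_seconds):
    """Number of periods of a sinusoid at freq_hz within one frame."""
    return freq_hz * frame_seconds

frame = 0.020                                   # 20 ms frame
bass_periods = periods_in_frame(50.0, frame)    # one period at 50 Hz
phone_periods = periods_in_frame(200.0, frame)  # four periods at 200 Hz
```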
... numbers.D.3
Two random events $ A$ and $ B$ are said to be independent if the probability of event $ A$ and $ B$ occurring together equals the product of the probability of event $ A$ times the probability of event $ B$. Similarly, two random variables $ x$ and $ y$ are said to be independent if the probability that both $ x=a$ and $ y=b$ equals the probability that $ x=a$ times the probability that $ y=b$, where $ a$ and $ b$ are any values that the respective random variables can assume. For purposes of this book, it is sufficient to have only an intuitive understanding of terms such as these from probability theory. Only sample correlations will be needed for noise spectrum analysis.
... planeE.1
Note that the image of the conformal map corresponds to the domain variable $ \zeta $ of the allpass transformation, while the input of the map corresponds to the range variable $ z$.
... scaleE.2
The Bark scale is reviewed in §E.5 below.
... desired.E.3
In general, the unit circle is mapped once to itself by any allpass transformation for which the number of poles $ N_p$ minus the number of zeros $ N_z$ inside the unit circle is $ \vert N_p-N_z\vert=1$. Therefore, higher order allpass transfer functions can be used having $ N_p$ poles inside the unit circle, say, and $ N_z = N_p\pm 1$ poles outside the unit circle. However, such a transformation cannot be used for audio digital filter design, our principal application, because it results in an unstable final filter $ H^*[{\cal A}_{-\rho }(z)]$. It similarly cannot be used in any applications requiring time-domain implementation of the unstable allpass filter in place of a unit delay element.
... BarksE.4
The normalized warped-frequency interval $ \omega\in[0,\pi]$ was converted to Barks $ b$ by the affine transformation $ b = (\omega/\pi)*(N_b-1)+0.5$, where $ N_b$ is the number of Bark bands in use. For example, $ N_b=25$ for a $ 31$ kHz sampling rate.
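
The affine map is easy to state in code. In this sketch, the function name `warped_freq_to_bark` is an assumption, and the endpoints show that $ \omega=0$ and $ \omega=\pi$ land half a Bark inside the first and last bands.

```python
import math

def warped_freq_to_bark(omega, n_bands):
    """Map normalized warped frequency omega in [0, pi] to Barks using
    b = (omega / pi) * (n_bands - 1) + 0.5."""
    return (omega / math.pi) * (n_bands - 1) + 0.5

lowest = warped_freq_to_bark(0.0, 25)        # 0.5 Bark
highest = warped_freq_to_bark(math.pi, 25)   # 24.5 Bark
```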
...JOST:E.5
Matlab functions bark2lin.m and lin2bark.m for transforming between linear and bark-warped frequency representations are available on the internet at http://ccrma.stanford.edu/~jos/bbt/bbt.html.
... matrix.F.1
There are at least two (free) add-on packages for Octave implementing specgram.m, the ``Matcompat'' and ``Octave-Forge'' packages. However, inexplicably (and inexcusably), the Octave-Forge version returns one FFT per row in the output matrix, while the Matlab version and Matcompat version return one FFT per column. Octave routines should always be written to be compatible with Matlab syntax when possible. Developers: If you ``improve'' the API for a pre-existing function, please pick a new name!