- ...
email
^{1.1} -
`http://www.w3k.org/cgi-bin/register.cgi?email_jos`. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... FFT
^{2.1} - ``Fast Fourier
Transform'' -- fast algorithms for implementing the Discrete Fourier
Transform (DFT) [264].
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
(DTFT).
^{3.1} - In practical situations we can only deal with
finite-duration signals, so really we always use the Discrete Fourier
Transform (DFT) [264]. Moreover, the DFT is typically
implemented using the split-radix Cooley-Tukey Fast Fourier Transform
(FFT), which requires the DFT length to be a power of 2. Thus, in
practice, we use an FFT, while for theoretical studies, the DTFT is
preferred.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...MDFT.
^{3.2} `http://ccrma.stanford.edu/~jos/mdft/Fourier_Theorems_DFT.html`. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... variable,
^{3.3} - The symbol ``
'' means ``is
defined as'' or ``equals by definition.'' In conformity with typical
signal-processing literature, most of this chapter
uses
*normalized frequency*,*i.e.*, the sampling rate equals sample per second.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...MDFT.
^{3.4} `http://ccrma.stanford.edu/~jos/mdft/Fourier_Theorems.html`. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... states
^{3.5} - Our
notational convention is that the first subscripts of an operator such
as
are its parameters, as in
, and the last
subscript selects a particular sample of output, as in
. If the last subscript is omitted, it ``returns''
an entire signal. Thus,
is a scalar while
is an entire signal defined over the integers. We
also may use
to denote all values of some index,
*e.g.*, .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... cases.
^{3.6} - For
definitions of the DFT, DTFT, FT, and FS, see Table 2.1
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
^{3.7} - If the sampling density is not sufficiently high, there will
be
*aliasing*(wrap-around) in the time domain. DTFT sampling in the frequency domain is an exact*Fourier dual*of ordinary sampling in the time domain.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...MDFT:
^{3.8} -
`http://ccrma.stanford.edu/~jos/mdft/Zero_Padding.html`. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
practice.
^{3.9} - Nevertheless, as discussed in [266],
a
*perceptually exact*band-limited interpolation can be implemented at a reasonable cost in the time domain, provided some amount of*oversampling*is used. Oversampling in the time domain provides a*guard band*in the frequency domain, which enables the interpolation kernel to meet perceptually ideal specifications at a much smaller length [270].. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...spectsamps.
^{3.10} - In particular,
*zero-padding does not increase the resolution of an FFT.*This is a surprisingly common point of misunderstanding (or sometimes just mislabeling).. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... work.
^{3.11} - One could say the
Blackman window is well matched to ``analog synthesizer quality''
levels, where a 60 dB signal-to-noise ratio is common.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
function|textbf:
^{4.1} - Note that writing
to denote the aliased
sinc function is not standard practice in signal processing--consider
it proposed notation.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ....
^{4.2} -
In more detail, start with the ``physical'' definition of
:
(4.9)

Now replace by in the numerator. Take the limit as goes to zero, with remaining fixed, so that goes to infinity (maintaining the relation ). When gets very small, the denominator becomes(4.10)

which completes the proof.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... crossings.
^{4.3} -
is the
radian-frequency sampling interval for a length
DFT. Using
to denote the sampling interval along
is analogous
to using
to denote the sampling interval along time
-- hence
the choice of symbol
.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...Harris78.
^{4.4} - The Hamming window can also be derived as a
special case of windows having a maximized main-lobe peak
over
all windows of the same energy and prescribed first zero-crossings
about the main lobe
[202, p. 239,403].
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...Harris78.
^{4.5} - Note that the
dB figure is
for large window-length
. For small window lengths, the side-lobe
levels increase. This phenomenon can be understood in terms of
aliasing of the side-lobes of the continuous Hamming window which must
be sampled to obtain a discrete-time window.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... dB,
^{4.6} - For larger
window-length
, more than
dB side-lobe suppression can be
achieved, such as the
value cited in [101].
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... Hann
^{4.7} - The precise side-lobe level is dependent on window
length
, but
to
dB is typical for the Hamming
window.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...Papoulis.
^{4.8} - The proof that
(3.36) is maximized appears on p. 210 of [202].
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ....
^{4.9} - From a
*linear algebra*point of view, consider the sinc kernel as corresponding to a*Toeplitz Hermitian*matrix. It is well known that Hermitian matrices have real eigenvalues and orthogonal eigenvectors. Also, multiplication by a Toeplitz matrix corresponds to*convolution*(in this case, a non-causal convolution.). . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
Box
^{4.10} - For Octave, the original version by Eric Breitenberger is
still available on the Web, as of this writing, at
`http://pangea.stanford.edu/Oceans/GES290/Breitenberger-SSAMatlab/mtm/`.

Note, however, that the calling arguments after the first two are differently defined. A simple version written by the author appears in §F.1.2.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... by
^{4.11} - For small
and/or
, the closed-form transform diverges from the DTFT of the
discrete-time Kaiser window due to aliasing.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... kind:
^{4.12} - The Maclaurin series for
can be obtained as
the term-by-term square of that for
, since
(4.41)

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... radians-per-second).
^{4.13} - In
[101],
is described as
*half*of the time-bandwidth product, which in turn is not defined. Factors of 2 often come and go because,*e.g.*, the frequency band is often considered a bandwidth of (neglecting negative frequencies).. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ....
^{4.14} - The causal version may be
computed as the inverse DFT of
.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
itself.
^{4.15} - The not-so-smooth function
also
transforms to itself
[150, p. 47]. Also, a periodic impulse train transforms
to an impulse train with reciprocal spacing of the impulses
(see §B.14). Finally, as discussed at
`http://www.dsprelated.com/showarticle/45.php`, if is real and even, then is its own transform under the*normalized*Fourier transform. (The normalized Fourier transform is the usual Fourier transform divided by , which results in the inverse transform having the same scale factor.) That is, . We used the fact that if is real and even, then so is . (This is shown for the DTFT case in §2.3.3, and the proof is analogous in the continuous-time case.) Note that for to be smooth (differentiable of all orders), it cannot be bandlimited in either time or frequency.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... design.
^{4.16} - A spectrum-analysis window
may be regarded as an FIR lowpass filter having an extremely narrow
pass-band.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
CPUs
^{5.1} - In recent years, Graphics Processing Units (GPUs) have
been used increasingly for high-performance general-purpose digital
signal processing [242]. For example, the NVIDIA
CUDA development environment allows harnessing of many parallel
threads of execution in a GPU (typically on a graphics card or
chip-set in a personal computer) from a C program using GPU-related
library extensions. Due to the massive parallelism available in a GPU,
FFT convolution becomes competitive with time-domain convolution only
at much longer lengths (in the thousands) [242]. In a
CPU (one processor), FFT methods typically win out for convolution
lengths greater than 100 or so (see §8.1.4 for some
details).
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... frequencies.
^{5.2} - In this book, unless
specified otherwise, all frequencies are
*normalized*by the sampling rate. Thus, is physically ``cycles per sample.''. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...optfirlp.
^{5.3} - In fact,
optimal window design is a special case of optimal FIR lowpass design
in which the desired pass-band width is nearly zero, as we will see.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
audible.
^{5.4} - A 0.1 dB pass-band ripple results in a normally
inaudible amplitude error, while a -60 dB stop-band ripple can be
quite audible if the pass-band is relatively quiet and high signal
energy appears in the stop-band.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... frequency)
^{5.5} - It is a
common matlab convention to specify normalized frequencies between
0 and 1 such that 1 corresponds to
*half*the sampling rate, instead of the sampling rate. Thus, one-fourth the sampling rate ( ) is specified as in matlab. In this book, outside of Matlab function arguments, normalized frequency is always the sampling rate.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ....
^{5.6} - As a result of the implicit equal weighting, the
stop-band ripple at
dB means that the pass-band ripple
is below 0.007 dB, which is overkill for typical audio applications.
This pass-band ripple can be enlarged in exchange for improved
stop-band ripple or a narrower transition band or both. This is done
by specifying a larger weighting on the stop-band ripple--see the
help page for
`firpm`for details.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... design
^{5.7} - Recall that window
design can be formulated as FIR lowpass-filter design with a
zero-with (or nearly zero-width) pass-band.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... ``tight''.
^{5.8} - Our choice came from
as used in [228].
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...RabinerAndGold.
^{5.9} - Show this as an exercise. [Hint: Recall
the
*stretch theorem*(§2.3.11).]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... response.
^{5.10} - It is effective in
practice to try doubling the FFT size to see if it appreciably
changes the designed filter--it should not.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... consequences.
^{5.11} - However, consider an
FM signal in which a sinusoid is sweeping back and forth in the
pass-band. In that case, the pass-band-ripple generates AM
sidebands, so a spec more like that in the stop-band may be called
for. Here, we allow the pass-band ripple to be 10 times the
stop-band ripple, which is a reasonable compromise.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... gains.
^{5.12} - While cubic splines are maximally
smooth in a precise physical sense, they are not band-limited, so
one can do better by using band-limited interpolation of the
desired frequency-response points. (In this situation,
``band-limited'' equals ``time-limited,'' which is exactly what we
are going for in an FIR filter design.)
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
Octave).
^{5.13} -
At the time of this writing,
`remez`is a C-compiled extension (`.oct`file) in the`octave-signal`package for Linux (Red Hat Fedora 16), but not in the package of the same name under MacPorts for Mac OS X. To create a new C-compiled extension, the original Fortran listing from [66,224],*e.g.*,`http://ccrma.stanford.edu/~jos/sasp/remez.f4`can be converted to C using`f2c`from`http://www.netlib.org/f2c/`.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ....
^{5.14} - Equivalently, (4.38)
can be minimized by one step of
*Newton's method*.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... (Hz).
^{6.1} - Long ago, the term for Hz was
*cycles per second*(cps).. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ....
^{6.2} - More
generally, an analytic signal is obtained from a real signal by
filtering out its negative-frequency components. In other terms, the
imaginary part of the analytic signal may be obtained as the
*Hilbert transform*of the real part (see §4.6).. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
property
^{6.3} - The
*sifting property*of delta functions provides that(6.5)

for every continuous function . We think of a delta function as having zero width and unit area (§B.10).. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... frequency.
^{6.4} - This is
not as confusing as one might think at first. When the frequency
range is
to
, normalized radian frequency is being used
(radians per sample). When the range is
to
, it is
normalized frequency (cycles per sample). The unnormalized case (true
physical radian frequency in radians per second) usually only arises
in applications.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
moment.
^{6.5} - See §B.17 for an example of this regarding the
*uncertainty principle*.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ....
^{6.6} - A standard notation for
fundamental frequency is ``
'' (or
). This comes from the
speech analysis community, where usually
,
, and so on,
refer to the
*formant frequencies*(resonance peak frequencies) of the vocal tract. When not working with formants, it is convenient to define the fundamental frequency as , so that the frequency of the th harmonic is .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... periodic.
^{6.7} - Most plucked strings can be
considered very nearly harmonic. Piano strings, however, are
significantly
*stiff*so that they exhibit audible inharmonicity--the partial overtone series is*stretched*[77,267].. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
exactly.
^{6.8} - One situation in which minimum orthogonal spacing
works well is when the signal is known to be exactly periodic, and
the period is accurately measured using a fundamental-frequency
estimator (§10.1). In this case, we can
*resample*the periodic signal to obtain an exact integer number of samples per period, and a rectangular window can be set to exactly one period in length. In this situation, each DFT coefficient is proportional to a*Fourier series*coefficient (defined in Chapter 2), and the peak frequencies are known to be integer multiples of the fundamental frequency, so no peak interpolation is needed at all. In other words, the fundamental frequency estimator takes care of locating all the peaks in frequency: Resampling to an integer period in the time domain corresponds to resampling the spectrum at each main-lobe peak in the frequency domain.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... bins
^{6.9} - Here
we mean
*fractional*bins when is not an integer.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
applications.
^{6.10} - A tuning error of
% is about ``two
cents'', where a cent is defined as a hundredth of a semitone, or
. Most people cannot detect tuning
errors of only two cents, unless some kind of interference effect is
involved, in which case the frequency error translates to a slowly
modulated amplitude envelope.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
line.
^{6.11} - We encountered least-squares optimization for FIR
filter design in §4.10.3.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... distributed
^{6.12} - The
Gaussian distribution is also called the
*normal distribution*, or ``bell curve.'' By the ``central limit theorem'' (§D.9.1), any sum of independent random variables becomes Gaussian in the limit. Therefore, filtered noise is usually well modeled as Gaussian, since the filtering typically adds many random variables together. See Appendix D for more about the Gaussian distribution.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
by
^{6.13} - The real and imaginary parts of
are
independently distributed according to the more familiar Gaussian
density function
(6.45)

where .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... autocorrelation,
^{7.1} - Note that there are many
possible
*biased*estimates of the true autocorrelation function. However, we will consider only one of them in this book.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... ``complicated''.
^{7.2} - An interesting discussion
of the meaning of randomness is given in Knuth
[131, vol. II].
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... function.
^{7.3} - In Octave,
it is necessary to install the add-on package
`octave-forge`to obtain this and other signal processing functions.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ....
^{7.4} - Note that we
are assuming
is zero mean. Otherwise, the sample variance would be
defined with the mean subtracted out, as discussed further in
§C.1.10. When the mean is zero, a correlation may be called a
*covariance*.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... mean.
^{7.5} - Note that we are not
*averaging*the PSD to get the total variance, but instead*summing it*. This is why the factor in the IDFT above is associated with and why the IDFT is also written as an integral with respect to a Hertz frequency axis.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... data.
^{7.6} - A
*trend*is typically estimated using linear regression. That is, a straight line is fit through the data in a least squares sense. (See the function`polyfit`in Matlab or Octave.). . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...Kay88:
^{7.7} -
The division by
can be seen as part of the implicit
autocorrelation calculation. It normalizes the peak of the
*implicit Bartlett window*on the autocorrelation to 1, as discussed further below. Alternatively, it may be interpreted as a normalization of the Fourier transform itself, converting a power spectrum (squared-magnitude FFT) to a*normalized*power spectrum (NDFT). Normalization is needed for stationary random processes since they generally have infinite signal energy but finite average power;*i.e.*, grows without bound as increases.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... correlated.
^{7.8} - An exception is when white noise is
filtered using an allpass filter, in which case the output signal is
still white noise.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... noise|textbf.
^{7.9} - For a more
formal development, see the
*Wold decomposition theorem*.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... noise
^{7.10} - The term ``pink noise'' indicates that the
spectrum is more intense at low frequencies than at high frequencies.
This makes sense since the color pink is heavier in the red end of the
spectrum compared with white which balances all colors equally.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...VossAndClarke78,
^{7.11} - Physical phenomena exhibiting a
power spectral density law include noise in vacuum tubes, carbon resistors,
transistor junctions, metal films, ionic solutions, films at the
superconducting transition, Josephson junctions, nerve membranes,
cosmic background radiation distribution, sunspot activity, and the
flood levels of the river Nile
[294]. In addition, the sub-audio short-time power
fluctuations in music (
*i.e.*, below the audible frequencies starting at 20 Hz) have been shown to follow the characteristic, especially classical music [294].. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... correctly.
^{8.1} - In the Matlab
Signal Processing Toolbox, the argument
`'periodic'`should be included when creating the window.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
integer.
^{8.2} - Actually, non-integer
can be accommodated by
rotating among a set of windows obtained by sampling the underlying
continuous window at different phases.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... limited.
^{8.3} - This is of course the Fourier dual
of saying that the uniform sampling of a time-domain signal is
information-preserving provided the signal is properly bandlimited
(in the frequency domain).
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... magnitude.
^{8.4} - The spectrogram is often called a
*sonogram*when applied to audio signals.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... STFT.
^{8.5} - Perfect reconstruction
is also possible in principle using
as large as
with the
Hamming window. However, this requires dividing out the amplitude
modulation given by the sum of Hamming windows displaced by
(see
Eq.(7.2)). In practice,
(50% overlap) is the largest hop size
used with the Hamming window because it is the largest value that
preserves the
*constant-overlap-add*(COLA) property. We will learn in Chapter 9 that (75% overlap) is significantly more robust than 50% overlap, and is recommended when*spectral modifications*are to be carried out on the STFT data.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... cochlea.
^{8.6} - See
`http://www.blackwellscience.com/matthews/ear.html`for an animated tutorial.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... SPL.
^{8.7} - A listening-level slider would be nice to have in
the Graphical User Interface (GUI) for a loudness spectrogram.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
bank.
^{8.8} - Note that the FFTs are effectively downsampled by this
operation, with the highest ``frequency-domain sampling rate''
occurring at the lowest frequency of the band. Therefore, the FFT
length can be set by matching the adjacent auditory filter spacing to
the low-frequency bin spacing of the FFT at the lower edge of the
frequency range covered by that FFT). In fact, one very large FFT
could be used in which the low-frequency bin spacing is approximately
equal to the spacing of the center-frequencies of the auditory
filter-bank channels at the low-frequency extreme.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
^{8.9} - Since Patty's work, which is not available in source-code form, a free Matlab ``loudness toolbox''
`http://www.genesis-acoustics.com/index.php?page=32`has appeared that looks good.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... level.
^{8.10} - Downloading
`http://ccrma.stanford.edu/~jos/sasp/hw/SteveJobsHi.wav`and listening at a very low level (approximately 20 dB SPL) verifies that indeed this sound example sounds like ``Hi...ee-jah,'' in qualitative agreement with the sone loudness curve in Fig.7.9.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... constant.
^{8.11} *Envelope followers*in sound processing classically behave this way as well [109]. The amplitude envelope is allowed to increase instantaneously, but it floats down with some time constant that can be adjusted.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...MDFT,
^{9.1} `https://ccrma.stanford.edu/~jos/mdft/Convolution_Theorem.html`. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...).
^{9.2} - These results were obtained using the
`fft`function in`Matlab`v5.2 running on a Windows PC with an 800MHz Athlon CPU.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... 2011,
^{9.3} - Octave on a Fedora 15
64-bit Linux machine (built in 2009) with an Intel Core i7-860
quad-core CPU running at 2.8 GHz
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...MDFT
^{9.4} `https://ccrma.stanford.edu/~jos/mdft/Fast_Fourier_Transform_FFT.html`. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... filter.
^{9.5} - As discussed in [261, Chapter 11],
an FIR filter having impulse response
is said to be
*linear phase*when its impulse response is*symmetric*about some point in time,*e.g.*, , for , where is the length of the FIR filter.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... (COLA)
^{9.6} - The acronym
COLA is not standard in signal processing, although OLA might be
recognized by many. When writing a paper, acronyms should always be
spelled out on first use, even for surely recognized acronyms such as
``FFT''.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... filter|textbf.
^{10.1} - Low-power
applications such as RFID chips use
*cascaded integrated comb filters*(CIC filters) in their sampling-rate converters.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... filter|textbf.
^{10.2} - In ordinary sampling theory
[270],
each sample of a time-domain signal determines the scaling and
location of a
*sinc function*for all time in the underlying continuous-time signal represented by the samples. The dc sampling filter described here is the Fourier dual of the time-domain sinc function corresponding to a single sample in time.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
demodulation|textbf.
^{10.3} - We use the term ``demodulation'' when frequencies
are translated from high to low (
to 0 in this case).
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...).
^{10.4} - We
also implicitly assumed that the DFT size
was not smaller than the
window length
.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... modifications.
^{10.5} - The term
*FBS modifications*refers to changing the gain and/or phase of the time-domain signal coming out of a filter-bank channel. This is distinct from*OLA modifications*in which a spectrum is altered, inverse transformed, and overlap-added into an output buffer. Multiplicative OLA modifications are exact (no aliasing) when the zero-padding in the time domain is sufficient. FBS modifications are not provided zero-padding in the time domain, and for there is aliasing in the channel signals.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
algorithm
^{11.1} - This algorithm was developed by the author circa
1996 for vibrating-string fundamental frequency measurement. It has
been found to work quite well on the middle third (or so) of plucked
string recordings. An extension to
*stiff strings*(stretched partial overtone frequencies) is described in [266].. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...MuellerEtAl11,KlapuriMohonk05,KlapuriSAP03,Klapuri01.
^{11.2} -
Klapuri's publications home page as of this writing:
`http://www.elec.qmul.ac.uk/people/anssik/publications.htm`. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... retrieval|textbf
^{11.3} - For an introduction
to MIR, see,
*e.g.*, recent proceedings of the International Conference on Music Information Retrieval (ISMIR) or Music Information Retrieval Evaluation eXchange (MIREX) conference.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
construction,
^{11.4} - A matrix is said to be
*Toeplitz*if the ]th entry can be expressed as a function of . A Toeplitz matrix is constant along all diagonals.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
recursion.
^{11.5} `http://ccrma.stanford.edu/~jos/filters/Computing_Reflection_Coefficients.html`. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... hearing:
^{11.6} - Due to
*nonlinearities*in hearing [179,306], it is not always valid to truncate the summation at the high-frequency hearing limit. For complete generally, should be extended to the highest frequency present in the signal , since inaudible frequencies can give rise to audible components at the output of a nonlinearity.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... overtone.
^{11.7} - The
term
*overtone*or*partial overtone*is generally used to mean a sinusoidal component which is not harmonically related to the fundamental frequency.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... ways.
^{11.8} - Frequency
modulation is the time-derivative of phase modulation.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...McAulayQuatieri86.
^{11.9} - An interesting avenue for future
research is the pursuit of new spectral-modeling primitives and
operators which are generally useful and compact for modeling
important aspects of sound.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...SmithSerra.
^{11.10} - Higher-order interpolations of so-called
*envelope break-points*were also developed at CCRMA in the late 1970s (*e.g.*, using cubic splines), but for tonal sounds, linear interpolation is usually sufficient, and the higher-order envelopes did not see much use, presumably due to the greater complexity of dealing with them coupled with the lack of significant benefit.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... STFT|textbf,
^{11.11} - Extension to multiresolution FFT analysis was an
important step in obtaining artifact-free analysis and resynthesis
of
*polyphonic*audio sources. Previously, sinusoidal and S+N modeling had been confined to monophonic sources, such as voice, or a single instrument, etc.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... band
^{11.12} - See
Appendix E for a definition of Bark bands (classical critical bands of
hearing).
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...Ellis02-pvoc,HejnaAndMusicus91
^{11.13} - Synchronous
Overlap-Add, Fixed Synthesis [104]--a
time-domain method reminiscent of the Eventide ``Harmonizer,''
based on an overlap-add decomposition of the time waveform, with
input windows shifted and output windows regularly spaced for a
fixed output synthesis window-rate. SOLA-FS is said to be more
computationally efficient than SOLA
[240,280].
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... bank|textbf.
^{11.14} - In Beranek's
**Acoustics**[15, pp. 333-334], the ``one-third octave-band analyzer'' is defined as the 25-band filter bank having the spectral partition points (in Hz) [20, 45, 57, 71, 90, 114, 142, 180, 228, 284, 360, 456, 568, 720, 912, 1136, 1440, 1824, 2272,

2880, 3648, 4544, 5760, 7296, 9088, 11520].. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... bank|textbf,
^{11.15} - An
``octave-band analyzer'' is defined in [15, p. 333] as
the 8-band filter bank having the spectral partition points [37.5,
75, 150, 300, 600, 1200, 2400, 4800, 10000] Hz.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... channels.
^{11.16} - Thanks to Jeurgen Herre for
mentioning this reference.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...dtftalias):
^{11.17} - A more efficient implementation can use
`reshape`and`sum`.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... general.
^{11.18} - When the FFT window is
a length
`N`rectangular window, then`alias(Hk .* X, Nk) == BandK`, as defined above, and there is no aliasing after all. More precisely, the aliased spectral samples all happen to be zeros of the window transform (which is an aliased sinc function, as defined in §3.1). These zeros depend on the window-length being`N`(no zero-padding), and on the window-type being rectangular (``no window'').. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
bank.
^{11.19} - Do not confuse the Octave program--a free, open-source
implementation of the matlab language--with the
*musical*octave: a frequency interval spanning a factor of two.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... domain.
^{11.20} - The samples are
connected by straight lines to make them visible. The true responses
for the left two bands are aliased sinc functions (asinc). The next
octave up is a sum of two asincs, and the rightmost band (top
octave) is a sum of four asincs. A properly interpolated frequency
response for this filter bank is shown in
Fig.10.33.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...dcells
^{11.21} -
`http://ccrma.stanford.edu/~jos/sasp/dcells.m`. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...dcells2spec
^{11.22} -
`http://ccrma.stanford.edu/~jos/sasp/dcells2spec.m`. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... wide.
^{11.23} - We do not really have to restrict
consideration to powers of two, as there are many fast Fourier
transform algorithms for various composite and prime lengths
[80,289].
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... samples''
^{11.24} - Spectral samples are
defined here as ``bin numbers plus 1'', that is, spectral samples
are numbered from 1 as in matlab, rather than from 0, as in the
signal processing literature.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
website.
^{11.25} `http://ccrma.stanford.edu/~jos/pdf/SMS.pdf`. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ....
^{12.1} - We will relax the restriction
in the next
section. The only reason for the restriction now is to avoid a
different system definition for
relative to what we'll derive
in the next section.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ....
^{12.2} - The diagram-manipulation derivation in
§11.1.3 would produce
subphase branches,
each scaled by
, while here we have only
subphase
branches, each feeding a length
FIR filter.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
transients.
^{12.3} - The Dolby AC-3 perceptual audio coding format,
which is formulated more directly as a transform coder (quantized
STFT), switches to a shorter FFT window when transients are detected
in the signal being encoded. The original Dolby AC-2 format used
length 512 FFT windows in a Princen-Bradley time-domain aliasing
cancellation scheme (sampling rate typically 44.1 kHz). The shorter
length for transients in AC-3 was chosen to be 256 samples, or half
the steady-state length [149, §4.1.4]. A special hybrid
window is needed for a smooth transition from steady-state to
transient processing, or vice versa.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... level.
^{12.4} - One careful study found that 96-kbps AAC is
roughly equivalent to 128-kbps MP3, which is a 33% lower bitrate at
roughly the same quality level. [149, §4.1.8].
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... transform|textbf.
^{12.5} - The
of a
bandpass filter may be defined as its center-frequency divided by its
bandwidth [263].
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
bank|textbf,
^{12.6} - The term ``dyad'' simply means ``two'', as
in
*monad, dyad, triad, tetrad,*. Thus, successive bands in a dyadic filter bank are obtained using a frequency-scale factor of two.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... transforms:
^{B.1} - See
§8.3.1 for the discrete-time case.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...MDFT,
^{B.2} -
`http://ccrma.stanford.edu/~jos/mdft/Cauchy_Schwarz_Inequality.html`. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... filter.
^{B.3} - An allpass filter has unity gain and
arbitrary delay at each frequency.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
Therefore,
^{B.4} - Technically, the Fourier transform of the unit
step function
does not exist, since
is not
integrable for any value of
. However, its
*Laplace transform*does exist in the right-half plane, and the limit as is well behaved and can be taken as the definition of the Fourier transform. The same construction works for , , and so on.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
^{B.5} - We will say that a function
is of order
if
there exists
and some positive constant
such
that
for all
.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ....
^{B.6} - Such a decomposition may be
constructed by differentiating to obtain
and defining
(B.81)

and similarly for . (The derivatives may include impulses corresponding to discontinuities in .)The quantity is called the

*total variation*of on ; if this value is finite, then is said to be of*bounded variation*on .. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...impulse.
^{C.1} - The impulse
may be defined as any ``function''
for which
(C.10)

where is assumed continuous at . A typical definition is(C.11)

The impulse was introduced in Chapter 2 starting at §B.10. See also [264,36,150] for further development.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... process.
^{C.2} - The general class of stochastic processes for
which time averages equal ensemble averages is called
*ergodic*stochastic processes [201]. In this book, all stochastic processes are assumed ergodic (and hence stationary) because they can all be modeled as filtered white noise, where the filter is stable, linear, and (at least approximately over short durations) time-invariant. Furthermore, the driving noise can be chosen to be Gaussian; see more advanced texts on stochastic processes regarding distinctions that can arise in non-ergodic cases.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... length.
^{C.3} - Note
that 20 ms contains only one period of a sinusoid at 50 Hz, which is
above lower limit of pitch perception (the low note of the piano, A0,
is tuned to 22 Hz). It is therefore possible to encounter difficulty
resolving tones in the deep bass region of the audio spectrum. A 20
ms frame length works quite well, however, for telephone speech
processing, in which the nominal bandwidth is 200-3200 Hz; in this
case, a 20ms frame has at least four periods of the lowest frequency
present, and harmonic resolution is assured under the Hamming window.
In wideband audio work, a multiresolution analysis is often highly
preferable.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... numbers.
^{C.4} -
Two random events
and
are said to be
*independent*if the probability of event and occurring together equals the*product*of the probability of event times the probability of event . Similarly, two random variables and are said to be*independent*if the probability that both and equals the probability that times the probability that , where and are any values that the respective random variables can assume. For purposes of this book, it is sufficient to have only an intuitive understanding of terms such as these from probability theory. Only*sample correlations*will be needed for noise spectrum analysis.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...diffthm).
^{D.1} - This approach to the proof was discovered on the Web,
for the real case, at
`http:``//www.ph.tn.tudelft.nl/~lucas/education/tn254/2002/Fourier%20transform%20of%20a%20Gaussian.pdf`. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... theorem
^{D.2} `http://mathworld.wolfram.com/CentralLimitTheorem.html`. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
increases.
^{D.3} - This can be shown as a consequence of the central
limit theorem. Coincidentally, the Gaussian function is believed to
have first appeared for approximating large binomial coefficients in
the 2nd edition (1718) of
*Doctrine of Chances*by Abraham de Moivre (a book on probability).. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...,
^{D.4} -
`http://mathworld.wolfram.com/NormalDistribution.html`. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... Barks
^{E.1} - The
normalized warped-frequency interval
was converted to
Barks
by the affine transformation
,
where
is the number of Bark bands in use. For example,
for
a
kHz sampling rate.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...JOST:
^{E.2} - Matlab functions
*bark2lin.m*and*lin2bark.m*for transforming between linear and bark-warped frequency representations are available on the internet at*http://ccrma.stanford.edu/~jos/bbt/bbt.html*.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...Darrigol07.
^{G.1} - For an
online biography of Daniel Bernoulli, see also
`http://www-groups.dcs.st-and.ac.uk/~history/Biographies/Bernoulli_Daniel.html`. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
^{G.2} - This was before Fourier theory (1822), and before
differential equations as we now know them. Calculus using
trigonometric functions was still a new topic at the time.
D'Alembert derived his results from Newton's second law applied to a
differential string element, integrating the result, and imposing
rigid boundary constraints.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
loudspeakers).
^{G.3} - Of course, we know now that the sound of a
vibrating string is produced mainly by the bridge excitation, which
is driven by a superposition of the vibrating string modes, as
envisioned by Bernoulli, or, in an equally valid alternative
formulation, by the traveling waves of D'Alembert.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... masks.
^{G.4} - See also the
fascinating history of ``optical synthesis'' at
`http://www.umatic.nl/tonewheels_historical.html`, including the photos of Arseny Avraamov's hand-drawn motion picture soundtracks (1930) and Oskar Fischinger's ``sound scrolls'' (1932). Also, the Theremin Center has photos of Boris Yankovsky's ``vibroexponator'' soundtracks using syntones and spectral mutations (`http://www.theremin.ru/archive/vibroexponator.htm`).. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... Hz)
^{G.5} - According to the voder patent (US Patent
2,121,142), the ten bandpass edges were, in Hz,
0, 225, 450, 700, 1000, 1400, 2000, 2700, 3800, 5400, and 7500.
The voder patent refers to a vocoder patent application, serial number
47,393, filed October 30, 1935, but it appears that the vocoder patent
may have never issued. In any case, Figure 1 of the voder patent is
described as ``similar'' to that of the vocoder, and having ``been
used as the natural development from the analyzer-synthesizer circuit
referred to in my previous application, Serial No. 47,393'' and the
improved version (Figure 6 of the voder patent) specified the same ten
bandlimits.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... technicians.
^{G.6} - For a
photo, see
`http://davidszondy.com/future/robot/voder.htm`. See also the Smithsonian Speech Synthesis History Project:

`http://americanhistory.si.edu/archives/speechsynthesis/ss_btl1.htm`. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... domain.
^{G.7} - Dolson and Laroche have extended
this idea to the processing of
*nonparametric spectral peaks*in the short-time spectrum [143,142,139].. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
^{G.8} - In practice, one normally cross-fades among tables
designed for different pitch ranges, as described,
*e.g.*, at`http://swiki.hfbk-hamburg.de/MusicTechnology/878`. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
^{G.9} - In practical additive synthesis, each harmonic overtone, or
harmonic overtone
*group*, may be synthesized using a wavetable oscillator (§G.8.4) [165,27,193]. When the number of overtones is large, the inverse-FFT method is typically used [239].. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
signals.
^{G.10} - An early SAIL program
`reduce.sai`for adaptively allocating piecewise-linear breakpoints is available online at`http://ccrma.stanford.edu/~jos/sasp/reduce.sai`. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
convolution).
^{G.11} - A
*model*typically means*parametric model*. In contrast, a*nonparametric representation*, such as an FFT, is regarded as*data*or*processed data*--not a model. The number of data points (``degrees of freedom'') in a nonparametric model is typically on the same order as that of the original data. True*model parameters*, on the other hand, will have far fewer degrees of freedom than the original data.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
1977,
^{G.12} - This is a personal observation based on using the
software written by James A. Moorer running on the PDP KL-10 soon
after I arrived at CCRMA. (At that time, ``CCRMA'' was an unnamed
computer-music project at the Stanford Artificial Intelligence
Lab). From reading the code, it appeared to be an implementation along
the lines of Portnoff's 1976 paper. As far as I know, prior
additive-synthesis analysis was performed by
the
*heterodyne-comb*technique [185], which is related to Goetzel's algorithm for computing a single bin of the DFT recursively.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... transient.
^{G.13} - The noise component
cannot be used, in general, because no matter how much resolution is
provided in the amplitude envelope of the noise, there is usually no
guarantee that the noise, being random, will have the desired
amplitude at the critical time it is needed.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...PARSHL
^{H.1} - PARSHL was so named because it could follow
*partials*(as opposed to merely harmonics). Being written for the PDP-10 computer running the SAIL operating system, the filename was restricted to 6 characters, so that ``partial'' became ``PARSHL''.. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ... phase
^{H.2} - The
version written in 1985 did not support phase. Phase support was
added much later by the second author of [271] in the
context of his Ph.D. research, using the phase interpolation algorithm
of McAulay and Quatieri [174].
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

- ...
forever,
^{H.3} - We tried reusing turned-off oscillators but found
them to be more trouble than they were worth in our environment.
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .