- ... (Hz).2.1
- Long ago, the term for Hz was cycles per
second (cps).
- ....2.2
- More generally, an analytic signal is obtained from a real signal by
filtering out its negative-frequency components. Equivalently, the
imaginary part of the analytic signal may be obtained as the
Hilbert transform of the real part (see §B.5).
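
As a minimal sketch of the first description (not code from the text), the following pure-Python example forms an analytic signal by zeroing the negative-frequency DFT bins of a real signal and doubling the positive-frequency bins. The helper names `dft`, `idft`, and `analytic_signal` are illustrative assumptions, and a direct O(N^2) DFT is used only to keep the example self-contained.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) discrete Fourier transform."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Direct O(N^2) inverse discrete Fourier transform."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

def analytic_signal(x):
    """Zero the negative-frequency bins of real x and double the positive ones."""
    N = len(x)
    X = dft(x)
    for k in range(N):
        if 0 < k < N / 2:
            X[k] *= 2        # positive frequencies
        elif k > N / 2:
            X[k] = 0         # negative frequencies
        # dc (k = 0) and Nyquist (k = N/2, even N only) are left unchanged
    return idft(X)

N = 16
x = [math.cos(2 * math.pi * 3 * n / N) for n in range(N)]
z = analytic_signal(x)
# The real part reproduces x; the imaginary part is the Hilbert transform
# of x, which for this cosine is the corresponding sine.
```

For a test cosine that sits exactly on a DFT bin, the result is the complex sinusoid whose real part is the original signal.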
- ... property2.3
- The sifting property of delta functions
provides that
$\int_{-\infty}^{\infty} \delta(t)\,f(t)\,dt = f(0)$
for every continuous function $f(t)$. We think of a delta function
as having zero width and unit area [243].
- ... frequency.2.4
- This is
not as confusing as one might think at first. When the frequency
range is $-\pi$ to $\pi$, normalized radian frequency is being used
(radians per sample). When the range is $-1/2$ to $1/2$, it is
normalized frequency (cycles per sample). The unnormalized case (true
physical radian frequency in radians per second) usually only arises
in applications.
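
The three frequency conventions differ only by scale factors. The sketch below converts a physical frequency in Hz to both normalized forms; the function names and the 44.1 kHz sampling rate are assumptions chosen for illustration.

```python
import math

def hz_to_cycles_per_sample(f_hz, fs_hz):
    """Normalized frequency in cycles per sample."""
    return f_hz / fs_hz

def hz_to_radians_per_sample(f_hz, fs_hz):
    """Normalized radian frequency in radians per sample."""
    return 2 * math.pi * f_hz / fs_hz

fs = 44100.0          # assumed sampling rate in Hz
nyquist = fs / 2      # half the sampling rate

half_cycle = hz_to_cycles_per_sample(nyquist, fs)     # upper edge of the cycles-per-sample range
pi_radians = hz_to_radians_per_sample(nyquist, fs)    # upper edge of the radians-per-sample range
```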
- ... function:2.5
- Note that writing $\mathrm{asinc}_M(\omega)$ to denote the aliased
sinc function is not standard practice in signal processing; consider
it proposed notation.
- ....2.6
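- In more detail, start with the ``physical'' definition of
$\mathrm{asinc}_M$:
$$\mathrm{asinc}_M(\omega T) = \frac{\sin(M\omega T/2)}{M\,\sin(\omega T/2)}.$$
Now replace $MT$ by $\tau$ in the numerator. Take the limit as $T$
goes to zero, with $\tau$ remaining fixed, so that $M$ goes to infinity
(maintaining the relation $MT = \tau$). When $T$ gets very small, the
denominator becomes
$M\sin(\omega T/2) \approx M\omega T/2 = \omega\tau/2$,
which completes the proof.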
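
The limit can also be checked numerically. The sketch below assumes the definitions asinc_M(theta) = sin(M theta/2) / (M sin(theta/2)) and sinc(x) = sin(pi x) / (pi x); the duration `tau` and test frequency `f` are arbitrary illustrative values. It holds the window duration fixed while the sampling interval shrinks and records how far the aliased sinc is from the sinc function.

```python
import math

def asinc(M, theta):
    """Aliased sinc: sin(M*theta/2) / (M*sin(theta/2)), equal to 1 at theta = 0."""
    if theta == 0:
        return 1.0
    return math.sin(M * theta / 2) / (M * math.sin(theta / 2))

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), equal to 1 at x = 0."""
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

tau = 1.0                  # window duration in seconds (held fixed)
f = 1.5                    # analysis frequency in Hz (arbitrary test value)
omega = 2 * math.pi * f    # radian frequency in rad/s

errors = []
for M in (8, 64, 512, 4096):
    T = tau / M            # sampling interval shrinks as M grows, with M*T = tau
    errors.append(abs(asinc(M, omega * T) - sinc(f * tau)))
```

The entries of `errors` shrink as the number of samples grows, which is the behavior the limiting argument predicts.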
- ... crossings.2.7
- $\Omega_M = 2\pi/M$ is the
radian-frequency sampling interval for a length-$M$ DFT. Using
$\Omega_M$ to denote the sampling interval along $\omega$ is analogous
to using $T$ to denote the sampling interval along time $t$; hence
the choice of symbol $\Omega$.
- ... moment.2.8
- See §2.4.16 for an example of this regarding the
uncertainty principle.
- ....2.9
- A standard notation for
fundamental frequency is ``$F_0$'' (or $f_0$). This comes from the
speech analysis community, where usually $F_1$, $F_2$, and so on,
refer to the formant frequencies (resonance peak frequencies)
of the vocal tract. When not working with formants, it is convenient
to define the fundamental frequency as $f_1$, so that the frequency of
the $k$th harmonic is $f_k = k f_1$.
- ... periodic.2.10
- Most plucked strings can be
considered very nearly harmonic. Piano strings, however, are
stiff enough that they exhibit audible inharmonicity: the partial
overtone series is stretched [68,244].
- ... exactly.2.11
- One situation in which minimum orthogonal spacing
works well is when the signal is known to be exactly periodic, and the
period is accurately measured using a fundamental-frequency estimator
(§10.6). In this case, we can resample the periodic
signal to obtain an exact integer number of samples per period, and a
rectangular window can be set to exactly one period in length. In
this situation, each DFT coefficient is proportional to a
Fourier series coefficient (defined in Chapter 2), and
the peak frequencies are known to be integer multiples of the
fundamental frequency, so no peak interpolation is needed at all. In
other words, the fundamental frequency estimator takes care of
locating all the peaks in frequency, and the resampling leads to
spectral samples at each main-lobe peak.
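
A small sketch of this situation, assuming a synthetic test signal with exactly `P` samples per period and made-up harmonic amplitudes: with a rectangular window one period long, each DFT bin lands on a harmonic and its magnitude is proportional to the corresponding harmonic amplitude.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) discrete Fourier transform."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

P = 12                                   # samples per period (exact integer)
amps = {1: 1.0, 2: 0.5, 3: 0.25}         # illustrative harmonic amplitudes
x = [sum(a * math.cos(2 * math.pi * k * n / P) for k, a in amps.items())
     for n in range(P)]                  # exactly one period

X = dft(x)
# For harmonics k = 1 .. P/2 - 1, bin k holds amplitude * P / 2
harmonic_amps = [2 * abs(X[k]) / P for k in range(P // 2)]
```

No peak interpolation is involved: the amplitudes are read directly from bins 1, 2, and 3, and the remaining bins are zero to within rounding.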
- ... variable,3.1
- Most of this chapter uses
normalized frequency, i.e., the sampling rate equals
$f_s = 1$ sample per second.
- ...MDFT.3.2
- http://ccrma.stanford.edu/~jos/mdft/Fourier_Theorems.html
- ... states3.3
- Our
notational convention is that the first subscripts of an operator such
as $\mathrm{SHIFT}_{l,n}$ are its parameters, as in $\mathrm{SHIFT}_l$,
and the last subscript selects a particular sample of output, as in
$\mathrm{SHIFT}_{l,n}(x) = x(n-l)$. If the last subscript is omitted,
it ``returns'' an entire signal. Thus, $\mathrm{SHIFT}_{l,n}(x)$
is a scalar while $\mathrm{SHIFT}_l(x)$
is an entire signal defined over the integers. We
also may use $(\cdot)$
to denote all values of some index, e.g.,
$\mathrm{SHIFT}_{l,(\cdot)}(x) = \mathrm{SHIFT}_l(x)$.
- ... cases.3.4
- For
definitions of the DFT, DTFT, FT, and FS, see Chapter 2.
- ... transforms:3.5
- See
§8.3.1 for the discrete-time case.
- ... principle.3.6
- The Heisenberg uncertainty principle in quantum physics
applies to any dual properties of a particle. For example, the
position and velocity of an electron are oft-cited as such duals. An
electron is described, in quantum mechanics, by a probability wave
packet. Therefore, the position of an electron in space can be
defined as the midpoint of the amplitude envelope of its wave
function; its velocity, on the other hand, is determined by the
frequency of the wave packet. To accurately measure the
frequency, the packet must be very long in space, to provide many
cycles of oscillation under the envelope. But this means the location
in space is relatively uncertain. In more precise mathematical terms,
the probability wave function for velocity is proportional to the
spatial Fourier transform of the probability wave for position. I.e.,
they are exact Fourier duals. The Heisenberg Uncertainty Principle is
therefore a Fourier property of fundamental particles described by
waves [53].
- ...MDFT,3.7
- http://ccrma.stanford.edu/~jos/mdft/Cauchy_Schwarz_Inequality.html
- ... filter.3.8
- An allpass filter has unity gain and
arbitrary delay at each frequency.
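
To illustrate, the sketch below evaluates a first-order allpass filter on the unit circle. The particular transfer function H(z) = (a + z^-1) / (1 + a z^-1) and the coefficient value are assumptions chosen as a common textbook example, not a form given in this footnote.

```python
import cmath
import math

def allpass_response(a, w):
    """Frequency response of H(z) = (a + z^-1) / (1 + a z^-1) at radian frequency w."""
    zinv = cmath.exp(-1j * w)
    return (a + zinv) / (1 + a * zinv)

a = 0.6                                           # real coefficient, |a| < 1 for stability
freqs = [math.pi * k / 8 for k in range(9)]       # 0 to pi radians per sample
gains = [abs(allpass_response(a, w)) for w in freqs]
phases = [cmath.phase(allpass_response(a, w)) for w in freqs]
```

Every entry of `gains` is 1 to within rounding, while `phases` varies with frequency, so only the delay differs from one frequency to the next.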
- ... Therefore,3.9
- Technically, the Fourier transform of the unit
step function $u(t)$ does not exist, since $u(t)\,e^{-j\omega t}$ is not
integrable for any value of $\omega$. However, its Laplace
transform $U(s) = 1/s$ does exist in the right-half $s$ plane, and
the limit as $s \to j\omega$ is well behaved and can be taken as the
definition of the Fourier transform. The same construction works for
$t\,u(t)$, $t^2 u(t)$, and so on.
- ... order3.10
- We will say that a function $X(\omega)$ is of order $1/\omega^n$ if
there exist $\omega_0$ and some positive constant $M$ such
that $|X(\omega)| < M/\omega^n$ for all $\omega > \omega_0$.
- ....3.11
- Such a decomposition may be
constructed by differentiating to obtain
and defining
and similarly for
. (The derivatives may include impulses
corresponding to discontinuities in
.)
The quantity
is called the
total variation of
on
; if this value is finite,
then
is said to be of
bounded variation on
.
- ...Harris78.4.1
- The Hamming window can also be derived as a
special case of windows having a maximized main-lobe peak
over
all windows of the same energy and prescribed first zero-crossings
about the main lobe
[186, p. 239,403].
- ...Harris78.4.2
- Note that the quoted dB figure is for large window length $M$.
For small window lengths, the side-lobe
levels increase. This phenomenon can be understood in terms of
aliasing of the side-lobes of the continuous Hamming window which must
be sampled to obtain a discrete-time window.
- ... Hann.4.3
- The
precise side-lobe level is dependent on window length
, but
to
dB is typical for the Hamming window.
- ....4.4
- From a
linear algebra point of view, consider the sinc kernel as
corresponding to a Toeplitz Hermitian matrix. It is well known
that Hermitian matrices have real eigenvalues and orthogonal
eigenvectors. Also, multiplication by a Toeplitz matrix corresponds
to convolution (in this case, a non-causal convolution).
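
The last point can be seen in a few lines. The sketch below uses a short causal filter rather than the symmetric, non-causal sinc kernel, purely to keep the matrix small; the function names and test values are illustrative assumptions. It builds the Toeplitz matrix whose entry (i, j) depends only on i - j and checks that multiplying by it reproduces direct convolution.

```python
def convolve(h, x):
    """Direct linear convolution of two finite sequences."""
    y = [0.0] * (len(h) + len(x) - 1)
    for i, hi in enumerate(h):
        for j, xj in enumerate(x):
            y[i + j] += hi * xj
    return y

def toeplitz_convolution_matrix(h, n):
    """(len(h)+n-1) x n matrix with entry (i, j) equal to h[i-j], or 0 outside h."""
    rows = len(h) + n - 1
    return [[h[i - j] if 0 <= i - j < len(h) else 0.0 for j in range(n)]
            for i in range(rows)]

def matvec(T, x):
    """Matrix-vector product using plain lists."""
    return [sum(t * v for t, v in zip(row, x)) for row in T]

h = [1.0, 2.0, 3.0]
x = [4.0, 5.0, 6.0, 7.0]
T = toeplitz_convolution_matrix(h, len(x))
y_matrix = matvec(T, x)     # Toeplitz matrix times signal
y_conv = convolve(h, x)     # the same result by direct convolution
```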
- ... Box4.5
- For Octave, the original version by Eric Breitenberger is
still available on the Web, as of this writing, at
http://pangea.stanford.edu/Oceans/GES290/Breitenberger-SSAMatlab/mtm/.
Note, however, that the calling arguments after the first two are
defined differently. A simple version written by the author appears
in §F.1.2.
- ... kind:4.6
- The Maclaurin series for $I_0(x)$ can be obtained as
the term-by-term square of that for $e^{x/2}$, since
$e^{x/2} = \sum_{k=0}^{\infty} \frac{(x/2)^k}{k!}$.
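
As a sketch of how this observation can be used numerically, the function below sums the squares of the Maclaurin terms (x/2)^k / k! of e^(x/2) to approximate the zero-order modified Bessel function of the first kind, I_0(x). The function name and the fixed number of terms are assumptions made for illustration.

```python
import math

def bessel_i0(x, terms=30):
    """Approximate I_0(x) as the sum of squared Maclaurin terms of exp(x/2)."""
    total = 0.0
    term = 1.0                      # (x/2)^k / k! for k = 0
    for k in range(terms):
        total += term * term        # square each exponential-series term
        term *= (x / 2) / (k + 1)   # advance to the next term of exp(x/2)
    return total

i0_at_0 = bessel_i0(0.0)
i0_at_1 = bessel_i0(1.0)
```

Because the terms fall off factorially, a few dozen terms are more than enough for moderate arguments.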
- ... radians-per-second).4.7
- In
[88],
is described as half of
the time-bandwidth product, which in turn is not defined. Factors of
2 often come and go because, e.g., the frequency band
is often considered a bandwidth of
(neglecting negative frequencies).
- ....4.8
- The causal version may be
computed as the inverse DFT of
.
- ... itself:4.9
- The not-so-smooth function
also transforms to itself
[133, p. 47]. Also, a periodic impulse train transforms
to an impulse train with reciprocal spacing of the impulses.
- ... interpolation.5.1
- Conversely, the DTFT of a time-limited signal
can be sampled to convert it to a DFT with no loss of
information. If the sampling density is not sufficiently high, there
will be aliasing (wrap-around) in the time domain. This is an
exact Fourier dual of sampling in the time domain
[247].
- ... practice.5.2
- Nevertheless, a perceptually exact
band-limited interpolation can be implemented at a reasonable cost in
the time domain, provided some amount of
oversampling is used. Oversampling in the time domain provides
a guard band in the frequency domain, which enables the
interpolation kernel to meet perceptually ideal specifications at a
much smaller length [247].
- ...spectsamps.5.3
- In particular, zero-padding does not increase the resolution
of an FFT. This is a surprisingly common point of misunderstanding
(or sometimes just mislabeling).
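
One way to see this is that zero-padding only samples the same underlying spectrum more densely. The sketch below, using arbitrary test data and a direct DFT, zero-pads a signal by a factor of two and confirms that every other bin of the longer DFT coincides with a bin of the original DFT.

```python
import cmath
import math

def dft(x):
    """Direct O(N^2) discrete Fourier transform."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

x = [1.0, 2.0, 0.5, -1.0, 0.25, 3.0, -2.0, 1.5]    # arbitrary test data
X = dft(x)
X_padded = dft(x + [0.0] * len(x))                 # zero-padding factor of 2

# Even-indexed bins of the padded DFT are the original bins; the odd-indexed
# bins are interpolated samples of the same spectrum, not new information.
max_diff = max(abs(X_padded[2 * k] - X[k]) for k in range(len(x)))
```

The ability to separate closely spaced components is set by the window length and shape, which zero-padding leaves unchanged.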
- ... work.5.4
- One could say the
Blackman window is well matched to ``analog synthesizer quality''
levels, where a 60 dB signal-to-noise ratio is common.
- ... bins5.5
- Here
we mean fractional bins when
is not an integer.
- ... applications.5.6
- A tuning error of $0.1$% is about ``two
cents'', where a cent is defined as a hundredth of a semitone, or
a frequency ratio of $2^{1/1200}$. Most people cannot detect tuning
errors of only two cents, unless some kind of interference effect is
involved, in which case the frequency error translates to a slowly
modulated amplitude envelope.
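
The conversion between a frequency ratio and cents is a one-line formula. In the sketch below, the function names are illustrative, and the example ratio 1.001 corresponds to a frequency error of one part in a thousand.

```python
import math

def ratio_to_cents(ratio):
    """Interval size in cents for a given frequency ratio (1200 cents per octave)."""
    return 1200 * math.log2(ratio)

def cents_to_ratio(cents):
    """Frequency ratio corresponding to an interval in cents."""
    return 2 ** (cents / 1200)

error_cents = ratio_to_cents(1.001)     # a bit under two cents
semitone_ratio = cents_to_ratio(100)    # one equal-tempered semitone
octave_ratio = cents_to_ratio(1200)     # one octave
```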
- ... distributed5.7
- The
Gaussian distribution is also called the normal distribution,
or ``bell curve.'' By the ``central limit theorem,'' the suitably
normalized sum of many independent, finite-variance random variables
approaches a Gaussian distribution in the limit.
Therefore, filtered noise is usually well modeled as Gaussian, since
the filtering typically adds many random variables together. See
Appendix C for more about the Gaussian distribution.
- ... by5.8
- The real and
imaginary parts of
are independently distributed
according to the more familiar Gaussian density function
where
.
- ... autocorrelation,6.1
- Note that there are many
possible biased estimates of the true autocorrelation
function. However, we will consider only one of them in this book.
- ... ``complicated''.6.2
- An interesting discussion
of the meaning of randomness is given in Knuth
[115, vol. II].
- ... function.6.3
- In Octave,
it is necessary to install the add-on package octave-forge to
obtain this and other signal processing functions.
- ....6.4
- Note that we are assuming the signal is zero mean. Otherwise, the
sample variance would be
defined with the mean subtracted out, as discussed further in
§D.1.10. When the mean is zero, a correlation may be called a
covariance.
- ... data.6.5
- A trend is typically estimated using linear
regression. That is, a straight line is fit through the data in a
least squares sense. (See the function polyfit in Matlab or
Octave.)
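
For readers without Matlab or Octave at hand, the same idea can be sketched in plain Python. The functions below are a hand-written least-squares line fit, not a port of polyfit; the names and test data are illustrative assumptions.

```python
def fit_line(y):
    """Least-squares slope and intercept of y[n] versus n = 0 .. N-1."""
    N = len(y)
    n_mean = (N - 1) / 2
    y_mean = sum(y) / N
    sxy = sum((n - n_mean) * (v - y_mean) for n, v in enumerate(y))
    sxx = sum((n - n_mean) ** 2 for n in range(N))
    slope = sxy / sxx
    intercept = y_mean - slope * n_mean
    return slope, intercept

def detrend(y):
    """Subtract the least-squares straight line from the data."""
    slope, intercept = fit_line(y)
    return [v - (slope * n + intercept) for n, v in enumerate(y)]

# A linear trend plus a small alternating disturbance
data = [0.5 * n + 2.0 + (1.0 if n % 2 == 0 else -1.0) for n in range(10)]
residual = detrend(data)
```

After detrending, the residual has zero mean and no remaining linear component, which is the usual preparation before estimating a noise spectrum.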
- ...Kay88:6.6
- The division by $M$ can be interpreted as normalizing the peak of the
implicit Bartlett window on the autocorrelation function to 1,
as discussed further below. Alternatively, it may be interpreted as a
normalization of the Fourier transform itself, converting a power
spectrum (squared-magnitude FFT) to a power spectral density. Such
normalization is necessary for stationary random processes since they
generally have infinite signal energy (but finite average power); i.e.,
the energy in the first $M$ samples grows without bound as $M$
increases.
- ... correlated.6.7
- An exception is when white noise is
filtered using an allpass filter, in which case the output signal is
still white noise.
- ... noise.6.8
- For a more
formal development, see the Wold decomposition theorem.
- ... noise6.9
- The term ``pink noise'' indicates that the
spectrum is more intense at low frequencies than at high frequencies.
This makes sense since the color pink is heavier in the red end of the
spectrum compared with white which balances all colors equally.
- ...VossAndClarke78,6.10
- Physical phenomena exhibiting a $1/f$
power spectral density law include noise in vacuum tubes, carbon resistors,
transistor junctions, metal films, ionic solutions, films at the
superconducting transition, Josephson junctions, nerve membranes,
cosmic background radiation distribution, sunspot activity, and the
flood levels of the river Nile
[270]. In addition, the short-time power
fluctuations in music (below 20 Hz) have been shown to follow the
$1/f$ characteristic, especially classical music [270].
- ...Abel04.6.11
- See,
e.g.,
http://www.uaudio.com/.
- ... correctly.7.1
- In the Matlab
Signal Processing Tool Box, the argument 'periodic' should be
included when creating the window.
- ... integer.7.2
- Actually, non-integer
can be accommodated by
rotating among a set of windows obtained by sampling the underlying
continuous window at different phases.
- ... limited.7.3
- This is of course the Fourier dual
of saying that the uniform sampling of a time-domain signal is
information-preserving provided the signal is properly bandlimited
(in the frequency domain).
- ... magnitude.7.4
- The spectrogram is often called a
sonogram when applied to audio signals.
- ... STFT.7.5
- Perfect reconstruction
is also possible in principle using a hop size $R$ as large as the
window length $M$ with the
Hamming window. However, this requires dividing out the amplitude
modulation given by the sum of Hamming windows displaced by
multiples of $R$ (see
Eq. (6.2)). In practice, $R = M/2$ (50% overlap) is the largest hop size
used with the Hamming window because it is the largest value that
preserves the constant-overlap-add (COLA) property. We will
learn in Chapter 9 that $R = M/4$ (75% overlap) is significantly
more robust than 50% overlap, and is recommended when spectral
modifications are to be carried out on the STFT data.
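
The constant-overlap-add property is easy to test numerically. The sketch below assumes the periodic Hamming window w(n) = 0.54 - 0.46 cos(2 pi n / M) for n = 0 .. M-1, writes M for the window length and R for the hop size, and measures the ripple of the overlap-added windows away from the edges; the function names and the window length are illustrative.

```python
import math

def periodic_hamming(M):
    """Periodic Hamming window of length M."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / M) for n in range(M)]

def overlap_add_sum(w, R, num_frames=16):
    """Sum of copies of w displaced by multiples of R, steady-state part only."""
    M = len(w)
    total = [0.0] * (R * (num_frames - 1) + M)
    for m in range(num_frames):
        for n in range(M):
            total[m * R + n] += w[n]
    return total[M:len(total) - M]     # discard the start-up and ending transients

M = 64
w = periodic_hamming(M)
ripple = {}
for R in (M // 4, M // 2, M):
    s = overlap_add_sum(w, R)
    ripple[R] = max(s) - min(s)        # zero ripple means the window is COLA at hop R
```

With hops of M/4 and M/2 the overlap-add is constant to within rounding, while a hop of M leaves the full amplitude modulation of the window, which would have to be divided out.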
- ... cochlea.7.6
- See
http://www.blackwellscience.com/matthews/ear.html
for an animated
tutorial.
- ... SPL.7.7
- A listening-level slider would be nice to have in
the Graphical User Interface (GUI) for a loudness spectrogram.
- ... bank.7.8
- Note that the FFTs are effectively downsampled by this
operation, with the highest ``frequency-domain sampling rate''
occurring at the lowest frequency of the band. Therefore, the FFT
length can be set by matching the adjacent auditory filter spacing to
the low-frequency bin spacing of the FFT (at the lower edge of the
frequency range covered by that FFT). In fact, one very large FFT
could be used in which the low-frequency bin spacing is approximately
equal to the spacing of the center-frequencies of the auditory
filter-bank channels at the low-frequency extreme.
- ... level.7.9
- Downloading http://ccrma.stanford.edu/~jos/sasp/hw/SteveJobsHi.wav
and listening at a very low level (approximately 20 dB SPL) verifies that
indeed this sound example sounds like ``Hi...ee-jah,'' in
qualitative agreement with the sone loudness curve in Fig. 6.8.
- ... constant.7.10
- Envelope
followers in sound processing classically
behave this way as well [95]. The amplitude envelope is
allowed to increase instantaneously, but it floats down with some time
constant that can be adjusted.
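
A minimal sketch of such an envelope follower is given below. The exponential release and its parameterization as a time constant in samples are assumptions made for illustration; the text only specifies an instantaneous rise and an adjustable decay.

```python
import math

def envelope_follower(x, release_samples):
    """Peak follower with instantaneous attack and exponential release.

    release_samples is the release time constant in samples (an assumed
    parameterization).
    """
    decay = math.exp(-1.0 / release_samples)   # per-sample release factor
    env = []
    level = 0.0
    for v in x:
        level = max(abs(v), decay * level)     # jump up at once, float down slowly
        env.append(level)
    return env

x = [0.0, 1.0, 0.0, 0.0, 0.5, 0.0]
env = envelope_follower(x, release_samples=10.0)
```

In this example the envelope jumps to 1 at the second sample and then decays; the later sample of amplitude 0.5 does not raise it because the decaying envelope is still higher.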
- ... hearing:8.1
- Due to nonlinearities in hearing
[163,278], it is not always valid to truncate
the summation at the high-frequency hearing limit. For complete
generality, the summation should be extended to the highest frequency
present in the signal, since inaudible frequencies can give
rise to audible components at the output of a nonlinearity.
- ... overtone.8.2
- The
term
overtone or partial overtone is generally used to mean a
sinusoidal component which is not harmonically related to the
fundamental frequency.
- ... domain.8.3
- Dolson and Laroche have extended
this idea to the processing of nonparametric spectral peaks in
the short-time spectrum
[126,125,123].
- ... filter.9.1
- As discussed in [240, Chapter 11],
an FIR filter having impulse response $h(n)$ is said to be linear
phase when its impulse response is symmetric about some point in time, e.g.,
$h(n) = h(N-1-n)$, for $n = 0, 1, \ldots, N-1$, where $N$
is the length of the FIR filter.
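
The consequence of this symmetry can be checked directly. In the sketch below, the impulse response is an arbitrary symmetric example of odd length N; removing a pure delay of (N-1)/2 samples from the frequency response leaves a real-valued amplitude function, which is what linear phase means.

```python
import cmath
import math

def freq_response(h, w):
    """Frequency response of an FIR filter h at radian frequency w."""
    return sum(hn * cmath.exp(-1j * w * n) for n, hn in enumerate(h))

h = [1.0, 3.0, 5.0, 3.0, 1.0]       # symmetric about its midpoint (illustrative values)
N = len(h)
delay = (N - 1) / 2                 # delay in samples implied by the symmetry

# Compensate the delay; what remains should be purely real
amplitudes = [freq_response(h, math.pi * k / 8) * cmath.exp(1j * (math.pi * k / 8) * delay)
              for k in range(1, 8)]
max_imag = max(abs(a.imag) for a in amplitudes)
```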
- ... (COLA)9.2
- The acronym
COLA is not standard in signal processing, although OLA might be
recognized by many. When writing a paper, acronyms should always be
spelled out on first use, even for surely recognized acronyms such as
``FFT''.
- ... filter.10.1
- In ordinary sampling theory
[247],
each sample of a time-domain signal determines the scaling and
location of a sinc function for all time in the underlying
continuous-time signal represented by the samples. The dc sampling
filter described here is the Fourier dual of the time-domain sinc
function corresponding to a single sample in time.
- ... demodulation.10.2
- We use the term ``demodulation'' when frequencies
are translated from high to low ($\omega_k$ to 0 in this case).
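
As a small illustration, the sketch below demodulates a complex sinusoid by multiplying it with a complex exponential at the negative of its own frequency, which translates it down to dc. The names `omega_k` and `baseband` and the test values are assumptions chosen for this example.

```python
import cmath
import math

N = 32
k0 = 5
omega_k = 2 * math.pi * k0 / N       # channel center frequency in radians per sample

x = [cmath.exp(1j * omega_k * n) for n in range(N)]                    # component at omega_k
baseband = [x[n] * cmath.exp(-1j * omega_k * n) for n in range(N)]     # shifted to 0 frequency
```

Every sample of `baseband` equals 1 to within rounding, i.e., the component now sits at zero frequency.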
- ...).10.3
- We
also implicitly assumed that the DFT size $N$ was not smaller than the
window length $M$.
- ... modifications.10.4
- The term
FBS modifications refers to changing the gain and/or phase
of the time-domain signal coming out of a filter-bank channel. This
is distinct from OLA modifications in which a spectrum is
altered, inverse transformed, and overlap-added into an output buffer.
Multiplicative OLA modifications are exact (no aliasing) when the
zero-padding in the time domain is sufficient. FBS modifications are
not provided zero-padding in the time domain, and for
there is
aliasing in the channel signals.
- ... spectrum.10.5
- For some background info, see
http://www.geofex.com/Article_Folders/wahpedl/wahped.htm
and
http://www.geofex.com/Article_Folders/wahpedl/voicewah.htm.
- ... speech.10.6
- Search for ``Fant vowel diagram'' on the
Web, or see the vowel diagram at the second URL in the previous footnote.
- ...11.1
- The original vocoder used a ``buzz source'' driving the
filter bank during voiced speech, and a ``hiss source'' driving the
filter bank during unvoiced speech. In speech modeling by
linear-prediction [147], the buzz source is classically an impulse
train, and the hiss source is white noise. In additive synthesis
(§7.1.4), each harmonic overtone, or overtone group, is
synthesized using some form of wavetable oscillator. The additive synthesis
approach generalizes readily to inharmonic spectra.
- ...PARSHL11.2
- PARSHL was so named because it could follow
partials (as opposed to merely harmonics). Being written for
the PDP-10 computer running the SAIL operating system, the filename
was restricted to 6 characters, so that ``partial'' became
``PARSHL''.
- ... phase11.3
- The
version written in 1985 did not support phase. Phase support was
added much later by the second author of [248] in the
context of his Ph.D. research, using the phase interpolation algorithm
of McAulay and Quatieri [158].
- ... forever,11.4
- We tried reusing turned-off oscillators but found
them to be more trouble than they were worth in our environment.
- ... band11.5
- See Appendix E for a
definition of Bark bands (classical critical bands of hearing).
- ...KlapuriMohonk05,KlapuriSAP03,Klapuri01.11.6
- Klapuri's publication home page: http://www.cs.tut.fi/~klap/iiro/
- ... transients.12.1
- The Dolby AC-3 perceptual audio coding format,
which is formulated more directly as a transform coder (quantized
STFT), switches to a shorter FFT window when transients are detected
in the signal being encoded. The original Dolby AC-2 format used
length 512 FFT windows in a Princen-Bradley time-domain aliasing
cancellation scheme (sampling rate typically 44.1 kHz). The shorter
length for transients in AC-3 was chosen to be 256 samples, or half
the steady-state length [132, §4.1.4]. A special hybrid
window is needed for a smooth transition from steady-state to
transient processing, or vice versa.
- ... level.12.2
- One careful study found that 96-kbps AAC is
roughly equivalent to 128-kbps MP3, which is a 25% lower bitrate at
roughly the same quality level [132, §4.1.8].
- ... modifications12.3
- We distinguish
here between multiplicative spectral modification in overlap-add
(frequency-domain convolution) and modifications introduced as
gains applied to the filter-bank channel
signals prior to remodulation and summing to reconstruct the signal.
(The channel gains may be time-varying complex numbers.) All
overlap-add systems with sufficient zero-padding will yield perfect
reconstruction in the presence of multiplicative spectral
modifications, as discussed in Chapter 8, even when their
filter-bank interpretation obviously involves aliasing cancellation
between channels in the frequency domain. On the other hand,
filter-bank modifications, being in the time domain, do not support
the overlap-add of multiple ``temporary time axes'' as appear in the
OLA case, and perfect filter-bank reconstruction relies upon aliasing
cancellation.
- ... frequencies.B.1
- In this
book, unless specified otherwise, all frequencies are
normalized by the sampling rate. Thus, a frequency value is
physically in units of ``cycles per sample.''
- ... gains.B.2
- While cubic splines are
maximally smooth in a precise physical sense, they are not
band-limited, so one can do better by using band-limited interpolation
of the desired frequency-response points.
- ... design.B.3
- In this context,
non-parametric means a design given by the inverse DFT of the
sampled, desired frequency response. An example of a parametric
filter design method is linear predictive coding (LPC).
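
A minimal sketch of such a non-parametric design is shown below: an ideal lowpass response is sampled at N uniformly spaced frequencies and the impulse response is obtained by an inverse DFT. The length, cutoff bin, and variable names are illustrative assumptions.

```python
import cmath
import math

N = 16              # number of frequency samples and filter length (assumed)
cutoff_bin = 3      # highest passband bin (assumed)

# Desired zero-phase response: real and even in k so that h(n) comes out real
desired = [1.0 if min(k, N - k) <= cutoff_bin else 0.0 for k in range(N)]

# Inverse DFT of the sampled desired frequency response
h = [sum(desired[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N)).real / N
     for n in range(N)]

def dft_bin(h, k):
    """Frequency response of h at DFT bin k."""
    return sum(h[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))

achieved = [dft_bin(h, k) for k in range(N)]
```

The resulting filter matches the desired response exactly at the N sampled frequencies; its behavior between those samples is not controlled by this method.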
- ... aliasing.B.4
- More generally, any
non-linear function of an FIR frequency response can be expected to
correspond to an infinitely long impulse response (IIR) in the time
domain. This can be shown by expressing the nonlinear modification as
an infinite power series, and noting that each term in the power
series corresponds to an iterated convolution. See the topic of
Volterra series expansions of nonlinear systems for more on this
point.
- ...diffthm).C.1
- This approach to the proof was discovered on the Web,
for the real case, at
http://www.ph.tn.tudelft.nl/~lucas/education/tn254/2002/Fourier%20transform%20of%20a%20Gaussian.pdf
- ... distributionC.2
- A PDF is one type of probability distribution in which probability
is distributed continuously over a range of values, as in the probability of
any given temperature on a given day. The probability of any
particular temperature (specified with infinite precision) is zero,
but ranges of temperatures have nonzero probability. In
contrast to PDFs are discrete probability distributions, in which
nonzero probability is assigned to specific numbers. An example of a
discrete distribution is the probability ($1/2$) of heads or tails in
a coin toss. The term ``distribution'' may refer to a discrete
distribution, a PDF, or a mixture of the two. See Appendix D
for an introduction to statistical signal processing.
- ... function.D.1
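- The impulse function $\delta(t)$
may be defined as any function
for which
$\int_{-\infty}^{\infty} f(t)\,\delta(t)\,dt = f(0)$,
where $f(t)$ is assumed continuous at $t=0$. A typical definition is
$\delta(t) = \lim_{\Delta\to 0} \delta_\Delta(t)$, where
$\delta_\Delta(t) = 1/\Delta$ for $0 \le t \le \Delta$ and $0$ otherwise.
The impulse was introduced in Chapter 2 starting at §2.4.9.
See also [243,31,133] for further development.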
- ... length.D.2
- Note
that 20 ms contains only one period of a sinusoid at 50 Hz, which is
above the lower limit of pitch perception (the low note of the piano, A0,
is tuned to 27.5 Hz). It is therefore possible to encounter difficulty
resolving tones in the deep bass region of the audio spectrum. A 20
ms frame length works quite well, however, for telephone speech
processing, in which the nominal bandwidth is 200-3200 Hz; in this
case, a 20 ms frame has at least four periods of the lowest frequency
present, and harmonic resolution is assured under the Hamming window.
In wideband audio work, a multiresolution analysis is often highly
preferable.
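
The arithmetic behind these statements is simple enough to script. The sketch below counts how many periods of a given frequency fit in a frame and, taking the four-period requirement under the Hamming window from the discussion above, computes the shortest frame that would resolve harmonics of a given fundamental. The function names are illustrative.

```python
def periods_in_frame(frame_seconds, freq_hz):
    """Number of periods of a sinusoid at freq_hz contained in the frame."""
    return frame_seconds * freq_hz

def min_frame_seconds(freq_hz, periods=4):
    """Shortest frame holding the given number of periods at freq_hz."""
    return periods / freq_hz

frame = 0.020                                     # 20 ms frame
p50 = periods_in_frame(frame, 50.0)               # one period at 50 Hz
p200 = periods_in_frame(frame, 200.0)             # four periods at 200 Hz
frame_for_50hz = min_frame_seconds(50.0)          # 80 ms needed for four periods at 50 Hz
```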
- ... numbers.D.3
- Two random events $A$ and $B$ are said to be
independent if the probability of events $A$ and $B$ occurring together
equals the probability of event $A$ times the probability of event $B$.
Similarly, two random variables $x$ and $y$ are said to be
independent if the probability that both $x = a$ and $y = b$ equals
the probability that $x = a$ times the probability that $y = b$, where
$a$ and $b$ are any values that the respective random variables can
assume. For purposes
of this book, it is sufficient to have only an intuitive
understanding of terms such as these from probability theory. Only
sample correlations will be needed for noise spectrum analysis.
- ... planeE.1
- Note that the image of the conformal map
corresponds to the domain variable
of the allpass
transformation, while the input of the map corresponds to the
range variable
.
- ... scaleE.2
- The Bark scale is reviewed in §E.5 below.
- ... desired.E.3
- In
general, the unit circle is mapped once to itself by any allpass
transformation for which the number of poles
minus the number of
zeros
inside the unit circle is
. Therefore, higher
order allpass transfer functions can be used having
poles inside the
unit circle, say, and
poles outside the unit circle.
However, such a transformation cannot be used for audio digital filter
design, our principal application, because it results in an unstable final
filter
. It similarly cannot be used in any
applications requiring time-domain implementation of the unstable allpass
filter in place of a unit delay element.
- ... BarksE.4
- The
normalized warped-frequency interval
was converted to
Barks
by the affine transformation
,
where
is the number of Bark bands in use. For example,
for
a
kHz sampling rate.
- ...JOST:E.5
- Matlab functions bark2lin.m and lin2bark.m for transforming between linear and bark-warped frequency
representations are available on the internet at http://ccrma.stanford.edu/~jos/bbt/bbt.html.
- ... matrix.F.1
- There are at least two (free) add-on packages for
Octave implementing
specgram.m, the ``Matcompat'' and ``Octave-Forge'' packages.
However, inexplicably (and inexcusably), the Octave-Forge version
returns one FFT per row in the output matrix, while the Matlab
version and Matcompat version return one FFT per column.
Octave routines should always be written to be compatible with Matlab
syntax when possible. Developers: If you ``improve'' the API for a
pre-existing function, please pick a new name!