... email1.1
jos at ccrma
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... FFT2.1
``Fast Fourier Transform'' -- fast algorithms for implementing the Discrete Fourier Transform (DFT) [264].
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... (DTFT).3.1
In practical situations we can only deal with finite-duration signals, so really we always use the Discrete Fourier Transform (DFT) [264]. Moreover, the DFT is typically implemented using the split-radix Cooley-Tukey Fast Fourier Transform (FFT), which requires the DFT length to be a power of 2. Thus, in practice, we use an FFT, while for theoretical studies, the DTFT is preferred.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...MDFT.3.2
http://ccrma.stanford.edu/~jos/mdft/Fourier_Theorems_DFT.html
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... variable,3.3
The symbol `` $ \isdeftext $ '' means ``is defined as'' or ``equals by definition.'' In conformity with typical signal-processing literature, most of this chapter uses normalized frequency, i.e., the sampling rate equals $ f_s=1$ sample per second.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...MDFT.3.4
http://ccrma.stanford.edu/~jos/mdft/Fourier_Theorems.html
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... states3.5
Our notational convention is that the first subscripts of an operator such as $ \hbox{\sc Shift}$ are its parameters, as in $ \hbox{\sc Shift}_l(x)$ , and the last subscript selects a particular sample of output, as in $ \hbox{\sc Shift}_{l,n}(x)$ . If the last subscript is omitted, it ``returns'' an entire signal. Thus, $ \hbox{\sc Shift}_{l,n}(x)$ is a scalar while $ \hbox{\sc Shift}_l(x)$ is an entire signal defined over the integers. We also may use $ (\cdot)$ to denote all values of some index, e.g., $ \hbox{\sc Shift}_l(x)=\hbox{\sc Shift}_{l,(\cdot)}(x)$ .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... cases.3.6
For definitions of the DFT, DTFT, FT, and FS, see Table 2.1
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...3.7
If the sampling density is not sufficiently high, there will be aliasing (wrap-around) in the time domain. DTFT sampling in the frequency domain is an exact Fourier dual of ordinary sampling in the time domain.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...MDFT:3.8
http://ccrma.stanford.edu/~jos/mdft/Zero_Padding.html
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... practice.3.9
Nevertheless, as discussed in [266], a perceptually exact band-limited interpolation can be implemented at a reasonable cost in the time domain, provided some amount of oversampling is used. Oversampling in the time domain provides a guard band in the frequency domain, which enables the interpolation kernel to meet perceptually ideal specifications at a much smaller length [270].
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...spectsamps.3.10
In particular, zero-padding does not increase the resolution of an FFT. This is a surprisingly common point of misunderstanding (or sometimes just mislabeling).
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... work.3.11
One could say the Blackman window is well matched to ``analog synthesizer quality'' levels, where a 60 dB signal-to-noise ratio is common.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... function|textbf:4.1
Note that writing $ \hbox{asinc}$ to denote the aliased sinc function is not standard practice in signal processing--consider it proposed notation.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
....4.2
In more detail, start with the ``physical'' definition of $ \hbox{asinc}$ :

$\displaystyle \hbox{asinc}_M(\omega T) = \frac{\sin(M \omega T / 2)}{M \sin( \omega T/2)}$ (4.9)

Now replace $ MT$ by $ \tau$ in the numerator. Take the limit as $ T$ goes to zero, with $ \tau$ remaining fixed, so that $ M$ goes to infinity (maintaining the relation $ M T = \tau$ ). When $ T$ gets very small, the denominator becomes

$\displaystyle M \sin( \omega T/2) \to M \omega T/2 = \omega \tau /2,$ (4.10)

which completes the proof.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... crossings.4.3
$ \Omega_M$ is the radian-frequency sampling interval for a length $ M$ DFT. Using $ \Omega_M$ to denote the sampling interval along $ \omega$ is analogous to using $ T$ to denote the sampling interval along time $ t$ -- hence the choice of symbol $ \Omega$ .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...Harris78.4.4
The Hamming window can also be derived as a special case of windows having a maximized main-lobe peak $ W(0)$ over all windows of the same energy and prescribed first zero-crossings about the main lobe [202, p. 239,403].
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...Harris78.4.5
Note that the $ -42.76$ dB figure is for large window-length $ M$ . For small window lengths, the side-lobe levels increase. This phenomenon can be understood in terms of aliasing of the side-lobes of the continuous Hamming window which must be sampled to obtain a discrete-time window.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... dB,4.6
For larger window-length $ M$ , more than $ -41$ dB side-lobe suppression can be achieved, such as the $ -42.76$ value cited in [101].
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... Hann4.7
The precise side-lobe level is dependent on window length $ M$ , but $ -41$ to $ -42$ dB is typical for the Hamming window.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...Papoulis.4.8
The proof that (3.36) is maximized appears on p. 210 of [202].
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
....4.9
From a linear algebra point of view, consider the sinc kernel as corresponding to a Toeplitz Hermitian matrix. It is well known that Hermitian matrices have real eigenvalues and orthogonal eigenvectors. Also, multiplication by a Toeplitz matrix corresponds to convolution (in this case, a non-causal convolution.)
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... Box4.10
For Octave, the original version by Eric Breitenberger is still available on the Web, as of this writing, at
http://pangea.stanford.edu/Oceans/GES290/Breitenberger-SSAMatlab/mtm/.
Note, however, that the calling arguments after the first two are differently defined. A simple version written by the author appears in §F.1.2.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... by4.11
For small $ M$ and/or $ \beta $ , the closed-form transform diverges from the DTFT of the discrete-time Kaiser window due to aliasing.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... kind:4.12
The Maclaurin series for $ I_0(x)$ can be obtained as the term-by-term square of that for $ \exp(x/2)$ , since

$\displaystyle e^{\frac{x}{2}} = \sum_{k=0}^{\infty}\frac{\left(\frac{x}{2}\right)^k}{k!}.$ (4.41)

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... radians-per-second).4.13
In [101], $ \beta = \pi\alpha$ is described as half of the time-bandwidth product, which in turn is not defined. Factors of 2 often come and go because, e.g., the frequency band $ [-\omega_c,\omega_c]$ is often considered a bandwidth of $ \omega_c$ (neglecting negative frequencies).
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
....4.14
The causal version may be computed as the inverse DFT of $ (-1)^k W(\omega_k)$ .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... itself.4.15
The not-so-smooth function $ 1/\sqrt{\left\vert t\right\vert}$ also transforms to itself [150, p. 47]. Also, a periodic impulse train transforms to an impulse train with reciprocal spacing of the impulses (see §B.14). Finally, as discussed at http://www.dsprelated.com/showarticle/45.php, if $ f(t)$ is real and even, then $ g(t)=f(t)+F(t)$ is its own transform under the normalized Fourier transform. (The normalized Fourier transform is the usual Fourier transform divided by $ \sqrt{2\pi}$ , which results in the inverse transform having the same scale factor.) That is, $ g(t) = f(t)+F(t)\;\longleftrightarrow\;F(\omega)+f(-\omega) = g(\omega)$ . We used the fact that if $ f(t)$ is real and even, then so is $ F(\omega)$ . (This is shown for the DTFT case in §2.3.3, and the proof is analogous in the continuous-time case.) Note that for $ g(t)$ to be smooth (differentiable of all orders), it cannot be bandlimited in either time or frequency.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... design.4.16
A spectrum-analysis window may be regarded as an FIR lowpass filter having an extremely narrow pass-band.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... CPUs5.1
In recent years, Graphics Processing Units (GPUs) have been used increasingly for high-performance general-purpose digital signal processing [242]. For example, the NVIDIA CUDA development environment allows harnessing of many parallel threads of execution in a GPU (typically on a graphics card or chip-set in a personal computer) from a C program using GPU-related library extensions. Due to the massive parallelism available in a GPU, FFT convolution becomes competitive with time-domain convolution only at much longer lengths (in the thousands) [242]. In a CPU (one processor), FFT methods typically win out for convolution lengths greater than 100 or so (see §8.1.4 for some details).
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... frequencies.5.2
In this book, unless specified otherwise, all frequencies are normalized by the sampling rate. Thus, $ f_c\in(-1/2,1/2)$ is physically ``cycles per sample.''
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...optfirlp.5.3
In fact, optimal window design is a special case of optimal FIR lowpass design in which the desired pass-band width is nearly zero, as we will see.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... audible.5.4
A 0.1 dB pass-band ripple results in a normally inaudible amplitude error, while a -60 dB stop-band ripple can be quite audible if the pass-band is relatively quiet and high signal energy appears in the stop-band.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... frequency)5.5
It is a common matlab convention to specify normalized frequencies between 0 and 1 such that 1 corresponds to half the sampling rate, instead of the sampling rate. Thus, one-fourth the sampling rate ( $ \omega_p=\pi/2$ ) is specified as $ 0.5$ in matlab. In this book, outside of Matlab function arguments, normalized frequency $ 1$ is always the sampling rate.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
....5.6
As a result of the implicit equal weighting, the stop-band ripple at $ \approx -62$ dB means that the pass-band ripple is below 0.007 dB, which is overkill for typical audio applications. This pass-band ripple can be enlarged in exchange for improved stop-band ripple or a narrower transition band or both. This is done by specifying a larger weighting on the stop-band ripple--see the help page for firpm for details.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... design5.7
Recall that window design can be formulated as FIR lowpass-filter design with a zero-with (or nearly zero-width) pass-band.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... ``tight''.5.8
Our choice came from $ f_1/f_s = 0.024$ as used in [228].
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...RabinerAndGold.5.9
Show this as an exercise. [Hint: Recall the stretch theorem2.3.11).]
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... response.5.10
It is effective in practice to try doubling the FFT size to see if it appreciably changes the designed filter--it should not.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... consequences.5.11
However, consider an FM signal in which a sinusoid is sweeping back and forth in the pass-band. In that case, the pass-band-ripple generates AM sidebands, so a spec more like that in the stop-band may be called for. Here, we allow the pass-band ripple to be 10 times the stop-band ripple, which is a reasonable compromise.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... gains.5.12
While cubic splines are maximally smooth in a precise physical sense, they are not band-limited, so one can do better by using band-limited interpolation of the desired frequency-response points. (In this situation, ``band-limited'' equals ``time-limited,'' which is exactly what we are going for in an FIR filter design.)
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... Octave).5.13
At the time of this writing, remez is a C-compiled extension (.oct file) in the octave-signal package for Linux (Red Hat Fedora 16), but not in the package of the same name under MacPorts for Mac OS X. To create a new C-compiled extension, the original Fortran listing from [66,224], e.g., http://ccrma.stanford.edu/~jos/sasp/remez.f4 can be converted to C using f2c from http://www.netlib.org/f2c/.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
....5.14
Equivalently, (4.38) can be minimized by one step of Newton's method.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... (Hz).6.1
Long ago, the term for Hz was cycles per second (cps).
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
....6.2
More generally, an analytic signal is obtained from a real signal by filtering out its negative-frequency components. In other terms, the imaginary part of the analytic signal may be obtained as the Hilbert transform of the real part (see §4.6).
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... property6.3
The sifting property of delta functions $ \delta(t)$ provides that

$\displaystyle \int_{t=-\infty}^{\infty}f(t)\delta(t) = f(0)$ (6.5)

for every continuous function $ f(t)$ . We think of a delta function as having zero width and unit area (§B.10).
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... frequency.6.4
This is not as confusing as one might think at first. When the frequency range is $ -\pi$ to $ \pi$ , normalized radian frequency is being used (radians per sample). When the range is $ -1/2$ to $ 1/2$ , it is normalized frequency (cycles per sample). The unnormalized case (true physical radian frequency in radians per second) usually only arises in applications.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... moment.6.5
See §B.17 for an example of this regarding the uncertainty principle.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
....6.6
A standard notation for fundamental frequency is ``$ F0$ '' (or $ F_0$ ). This comes from the speech analysis community, where usually $ F_1$ , $ F_2$ , and so on, refer to the formant frequencies (resonance peak frequencies) of the vocal tract. When not working with formants, it is convenient to define the fundamental frequency as $ f_1$ , so that the frequency of the $ k$ th harmonic is $ f_k = kf_1$ .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... periodic.6.7
Most plucked strings can be considered very nearly harmonic. Piano strings, however, are significantly stiff so that they exhibit audible inharmonicity--the partial overtone series is stretched [77,267].
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... exactly.6.8
One situation in which minimum orthogonal spacing works well is when the signal is known to be exactly periodic, and the period is accurately measured using a fundamental-frequency estimator (§10.1). In this case, we can resample the periodic signal to obtain an exact integer number of samples per period, and a rectangular window can be set to exactly one period in length. In this situation, each DFT coefficient is proportional to a Fourier series coefficient (defined in Chapter 2), and the peak frequencies are known to be integer multiples of the fundamental frequency, so no peak interpolation is needed at all. In other words, the fundamental frequency estimator takes care of locating all the peaks in frequency: Resampling to an integer period in the time domain corresponds to resampling the spectrum at each main-lobe peak in the frequency domain.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... bins6.9
Here we mean fractional bins when $ p$ is not an integer.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... applications.6.10
A tuning error of $ 0.1$ % is about ``two cents'', where a cent is defined as a hundredth of a semitone, or $ 2^{1/1200}-1 \approx 0.058\%$ . Most people cannot detect tuning errors of only two cents, unless some kind of interference effect is involved, in which case the frequency error translates to a slowly modulated amplitude envelope.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... line.6.11
We encountered least-squares optimization for FIR filter design in §4.10.3.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... distributed6.12
The Gaussian distribution is also called the normal distribution, or ``bell curve.'' By the ``central limit theorem'' (§D.9.1), any sum of independent random variables becomes Gaussian in the limit. Therefore, filtered noise is usually well modeled as Gaussian, since the filtering typically adds many random variables together. See Appendix D for more about the Gaussian distribution.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... by6.13
The real and imaginary parts of $ v(n) = v_r + j v_i$ are independently distributed according to the more familiar Gaussian density function

$\displaystyle p_{v_r}(\nu)=p_{v_i}(\nu) = \frac{1}{\sqrt{\pi 2\sigma_{v_r}^2}} e^{-\frac{\nu^2}{2\sigma_{v_r}^2}}$ (6.45)

where $ \sigma_{v_r}^2 =
\sigma_{v_i}^2 = \sigma_{v}^2/2$ .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... autocorrelation,7.1
Note that there are many possible biased estimates of the true autocorrelation function. However, we will consider only one of them in this book.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... ``complicated''.7.2
An interesting discussion of the meaning of randomness is given in Knuth [131, vol. II].
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... function.7.3
In Octave, it is necessary to install the add-on package octave-forge to obtain this and other signal processing functions.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
....7.4
Note that we are assuming $ v$ is zero mean. Otherwise, the sample variance would be defined with the mean subtracted out, as discussed further in §C.1.10. When the mean is zero, a correlation may be called a covariance.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... mean.7.5
Note that we are not averaging the PSD to get the total variance, but instead summing it. This is why the $ 2\pi$ factor in the IDFT above is associated with $ d\omega$ and why the IDFT is also written as an integral with respect to a Hertz frequency axis.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... data.7.6
A trend is typically estimated using linear regression. That is, a straight line is fit through the data in a least squares sense. (See the function polyfit in Matlab or Octave.)
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...Kay88:7.7
The division by $ M$ can be seen as part of the implicit autocorrelation calculation. It normalizes the peak of the implicit Bartlett window on the autocorrelation to 1, as discussed further below. Alternatively, it may be interpreted as a normalization of the Fourier transform itself, converting a power spectrum (squared-magnitude FFT) to a normalized power spectrum (NDFT). Normalization is needed for stationary random processes since they generally have infinite signal energy but finite average power; i.e., $ \max(\left\vert\hbox{\sc DTFT}(x_w)\right\vert^2)$ grows without bound as $ M$ increases.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... correlated.7.8
An exception is when white noise is filtered using an allpass filter, in which case the output signal is still white noise.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... noise|textbf.7.9
For a more formal development, see the Wold decomposition theorem.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... noise7.10
The term ``pink noise'' indicates that the spectrum is more intense at low frequencies than at high frequencies. This makes sense since the color pink is heavier in the red end of the spectrum compared with white which balances all colors equally.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...VossAndClarke78,7.11
Physical phenomena exhibiting a $ 1/f$ power spectral density law include noise in vacuum tubes, carbon resistors, transistor junctions, metal films, ionic solutions, films at the superconducting transition, Josephson junctions, nerve membranes, cosmic background radiation distribution, sunspot activity, and the flood levels of the river Nile [294]. In addition, the sub-audio short-time power fluctuations in music (i.e., below the audible frequencies starting at 20 Hz) have been shown to follow the $ 1/f$ characteristic, especially classical music [294].
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... correctly.8.1
In the Matlab Signal Processing Toolbox, the argument 'periodic' should be included when creating the window.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... integer.8.2
Actually, non-integer $ R/k$ can be accommodated by rotating among a set of windows obtained by sampling the underlying continuous window at different phases.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... limited.8.3
This is of course the Fourier dual of saying that the uniform sampling of a time-domain signal is information-preserving provided the signal is properly bandlimited (in the frequency domain).
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... magnitude.8.4
The spectrogram is often called a sonogram when applied to audio signals.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... STFT.8.5
Perfect reconstruction is also possible in principle using $ R$ as large as $ M$ with the Hamming window. However, this requires dividing out the amplitude modulation given by the sum of Hamming windows displaced by $ R$ (see Eq.(7.2)). In practice, $ R=(M-1)/2$ (50% overlap) is the largest hop size used with the Hamming window because it is the largest value that preserves the constant-overlap-add (COLA) property. We will learn in Chapter 9 that $ R=(M-1)/4$ (75% overlap) is significantly more robust than 50% overlap, and is recommended when spectral modifications are to be carried out on the STFT data.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... SPL.8.6
A listening-level slider would be nice to have in the Graphical User Interface (GUI) for a loudness spectrogram.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... bank.8.7
Note that the FFTs are effectively downsampled by this operation, with the highest ``frequency-domain sampling rate'' occurring at the lowest frequency of the band. Therefore, the FFT length can be set by matching the adjacent auditory filter spacing to the low-frequency bin spacing of the FFT at the lower edge of the frequency range covered by that FFT). In fact, one very large FFT could be used in which the low-frequency bin spacing is approximately equal to the spacing of the center-frequencies of the auditory filter-bank channels at the low-frequency extreme.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... level.8.8
Downloading http://ccrma.stanford.edu/~jos/sasp/hw/SteveJobsHi.wav and listening at a very low level (approximately 20 dB SPL) verifies that indeed this sound example sounds like ``Hi...ee-jah,'' in qualitative agreement with the sone loudness curve in Fig.7.9.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... constant.8.9
Envelope followers in sound processing classically behave this way as well [109]. The amplitude envelope is allowed to increase instantaneously, but it floats down with some time constant that can be adjusted.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...MDFT,9.1
https://ccrma.stanford.edu/~jos/mdft/Convolution_Theorem.html
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...).9.2
These results were obtained using the fft function in Matlab v5.2 running on a Windows PC with an 800MHz Athlon CPU.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... 2011,9.3
Octave on a Fedora 15 64-bit Linux machine (built in 2009) with an Intel Core i7-860 quad-core CPU running at 2.8 GHz
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...MDFT9.4
https://ccrma.stanford.edu/~jos/mdft/Fast_Fourier_Transform_FFT.html
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... filter.9.5
As discussed in [261, Chapter 11], an FIR filter having impulse response $ h(n)$ is said to be linear phase when its impulse response is symmetric about some point in time, e.g., $ h(N-n-1)=h(n)$ , for $ n=0:N-1$ , where $ N$ is the length of the FIR filter.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... (COLA)9.6
The acronym COLA is not standard in signal processing, although OLA might be recognized by many. When writing a paper, acronyms should always be spelled out on first use, even for surely recognized acronyms such as ``FFT''.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... filter|textbf.10.1
Low-power applications such as RFID chips use cascaded integrated comb filters (CIC filters) in their sampling-rate converters.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... filter|textbf.10.2
In ordinary sampling theory [270], each sample of a time-domain signal determines the scaling and location of a sinc function for all time in the underlying continuous-time signal represented by the samples. The dc sampling filter described here is the Fourier dual of the time-domain sinc function corresponding to a single sample in time.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... demodulation|textbf.10.3
We use the term ``demodulation'' when frequencies are translated from high to low ($ \omega_c$ to 0 in this case).
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...).10.4
We also implicitly assumed that the DFT size $ N$ was not smaller than the window length $ M$ .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... modifications.10.5
The term FBS modifications refers to changing the gain and/or phase of the time-domain signal coming out of a filter-bank channel. This is distinct from OLA modifications in which a spectrum is altered, inverse transformed, and overlap-added into an output buffer. Multiplicative OLA modifications are exact (no aliasing) when the zero-padding in the time domain is sufficient. FBS modifications are not provided zero-padding in the time domain, and for $ R>1$ there is aliasing in the channel signals.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... algorithm11.1
This algorithm was developed by the author circa 1996 for vibrating-string fundamental frequency measurement. It has been found to work quite well on the middle third (or so) of plucked string recordings. An extension to stiff strings (stretched partial overtone frequencies) is described in [266].
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...MuellerEtAl11,KlapuriMohonk05,KlapuriSAP03,Klapuri01.11.2
Klapuri's publications home page as of this writing:
http://www.elec.qmul.ac.uk/people/anssik/publications.htm
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... retrieval|textbf11.3
For an introduction to MIR, see, e.g., recent proceedings of the International Conference on Music Information Retrieval (ISMIR) or Music Information Retrieval Evaluation eXchange (MIREX) conference.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... construction,11.4
A matrix is said to be Toeplitz if the $ [i,j$ ]th entry can be expressed as a function of $ i-j$ . A Toeplitz matrix is constant along all diagonals.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... recursion.11.5
http://ccrma.stanford.edu/~jos/filters/Computing_Reflection_Coefficients.html
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... hearing:11.6
Due to nonlinearities in hearing [179,306], it is not always valid to truncate the summation at the high-frequency hearing limit. For complete generally, $ \Omega$ should be extended to the highest frequency present in the signal $ s(t)$ , since inaudible frequencies can give rise to audible components at the output of a nonlinearity.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... overtone.11.7
The term overtone or partial overtone is generally used to mean a sinusoidal component which is not harmonically related to the fundamental frequency.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... ways.11.8
Frequency modulation is the time-derivative of phase modulation.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...McAulayQuatieri86.11.9
An interesting avenue for future research is the pursuit of new spectral-modeling primitives and operators which are generally useful and compact for modeling important aspects of sound.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...SmithSerra.11.10
Higher-order interpolations of so-called envelope break-points were also developed at CCRMA in the late 1970s (e.g., using cubic splines), but for tonal sounds, linear interpolation is usually sufficient, and the higher-order envelopes did not see much use, presumably due to the greater complexity of dealing with them coupled with the lack of significant benefit.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... STFT|textbf,11.11
Extension to multiresolution FFT analysis was an important step in obtaining artifact-free analysis and resynthesis of polyphonic audio sources. Previously, sinusoidal and S+N modeling had been confined to monophonic sources, such as voice, or a single instrument, etc.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... band11.12
See Appendix E for a definition of Bark bands (classical critical bands of hearing).
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...Ellis02-pvoc,HejnaAndMusicus9111.13
Synchronous Overlap-Add, Fixed Synthesis [104]--a time-domain method reminiscent of the Eventide ``Harmonizer,'' based on an overlap-add decomposition of the time waveform, with input windows shifted and output windows regularly spaced for a fixed output synthesis window-rate. SOLA-FS is said to be more computationally efficient than SOLA [240,280].
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... bank|textbf.11.14
In Beranek's Acoustics [15, pp. 333-334], the ``one-third octave-band analyzer'' is defined as the 25-band filter bank having the spectral partition points (in Hz) [20, 45, 57, 71, 90, 114, 142, 180, 228, 284, 360, 456, 568, 720, 912, 1136, 1440, 1824, 2272,
2880, 3648, 4544, 5760, 7296, 9088, 11520].
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... bank|textbf,11.15
An ``octave-band analyzer'' is defined in [15, p. 333] as the 8-band filter bank having the spectral partition points [37.5, 75, 150, 300, 600, 1200, 2400, 4800, 10000] Hz.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... channels.11.16
Thanks to Jeurgen Herre for mentioning this reference.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...dtftalias):11.17
A more efficient implementation can use reshape and sum.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... general.11.18
When the FFT window is a length N rectangular window, then alias(Hk .* X, Nk) == BandK, as defined above, and there is no aliasing after all. More precisely, the aliased spectral samples all happen to be zeros of the window transform (which is an aliased sinc function, as defined in §3.1). These zeros depend on the window-length being N (no zero-padding), and on the window-type being rectangular (``no window'').
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... bank.11.19
Do not confuse the Octave program--a free, open-source implementation of the matlab language--with the musical octave: a frequency interval spanning a factor of two.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... domain.11.20
The samples are connected by straight lines to make them visible. The true responses for the left two bands are aliased sinc functions (asinc). The next octave up is a sum of two asincs, and the rightmost band (top octave) is a sum of four asincs. A properly interpolated frequency response for this filter bank is shown in Fig.10.33.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...dcells11.21
http://ccrma.stanford.edu/~jos/sasp/dcells.m
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...dcells2spec11.22
http://ccrma.stanford.edu/~jos/sasp/dcells2spec.m
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... wide.11.23
We do not really have to restrict consideration to powers of two, as there are many fast Fourier transform algorithms for various composite and prime lengths [80,289].
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... samples''11.24
Spectral samples are defined here as ``bin numbers plus 1'', that is, spectral samples are numbered from 1 as in matlab, rather than from 0, as in the signal processing literature.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... website.11.25
http://ccrma.stanford.edu/~jos/pdf/SMS.pdf
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
....12.1
We will relax the restriction $ M\le N$ in the next section. The only reason for the restriction now is to avoid a different system definition for $ M>N$ relative to what we'll derive in the next section.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
....12.2
The diagram-manipulation derivation in §11.1.3 would produce $ M$ subphase branches, each scaled by $ h(m)$ , while here we have only $ N$ subphase branches, each feeding a length $ L$ FIR filter.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... transients.12.3
The Dolby AC-3 perceptual audio coding format, which is formulated more directly as a transform coder (quantized STFT), switches to a shorter FFT window when transients are detected in the signal being encoded. The original Dolby AC-2 format used length 512 FFT windows in a Princen-Bradley time-domain aliasing cancellation scheme (sampling rate typically 44.1 kHz). The shorter length for transients in AC-3 was chosen to be 256 samples, or half the steady-state length [149, §4.1.4]. A special hybrid window is needed for a smooth transition from steady-state to transient processing, or vice versa.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... level.12.4
One careful study found that 96-kbps AAC is roughly equivalent to 128-kbps MP3, which is a 33% lower bitrate at roughly the same quality level. [149, §4.1.8].
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... transform|textbf.12.5
The $ Q$ of a bandpass filter may be defined as its center-frequency divided by its bandwidth [263].
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... bank|textbf,12.6
The term ``dyad'' simply means ``two'', as in monad, dyad, triad, tetrad, $ \ldots$ . Thus, successive bands in a dyadic filter bank are obtained using a frequency-scale factor of two.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... transforms:B.1
See §8.3.1 for the discrete-time case.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...MDFT,B.2
http://ccrma.stanford.edu/~jos/mdft/Cauchy_Schwarz_Inequality.html
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... filter.B.3
An allpass filter has unity gain and arbitrary delay at each frequency.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... Therefore,B.4
Technically, the Fourier transform of the unit step function $ u(t)$ does not exist, since $ \vert u(t)\vert^p$ is not integrable for any value of $ p$ . However, its Laplace transform $ U(s) = 1/s$ does exist in the right-half $ s$ plane, and the limit as $ s\to j\omega$ is well behaved and can be taken as the definition of the Fourier transform. The same construction works for $ t\cdot u(t)$ , $ t^2\cdot u(t)$ , and so on.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... orderB.5
We will say that a function $ W(\omega)$ is of order $ 1/\omega^{n+1}$ if there exists $ \omega_0$ and some positive constant $ M<\infty$ such that $ \left\vert W(\omega)\right\vert<M/w^{n+1}$ for all $ \omega > \omega_0$ .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
....B.6
Such a decomposition may be constructed by differentiating to obtain $ w^\prime (t)$ and defining

$\displaystyle w^\prime_{\scriptscriptstyle\uparrow}(t)\isdef \left\{\begin{array}{ll} w^\prime , & w^\prime \geq 0 \\ [5pt] 0, & w^\prime <0 \\ \end{array} \right.$ (B.81)

and similarly for $ w^\prime_{\scriptscriptstyle\downarrow}(t)$ . (The derivatives may include impulses corresponding to discontinuities in $ w(t)$ .)

The quantity $ [w_{\scriptscriptstyle\uparrow}(b)-w_{\scriptscriptstyle\uparrow}(a)] + [w_{\scriptscriptstyle\downarrow}(b)-w_{\scriptscriptstyle\downarrow}(a)]$ is called the total variation of $ w$ on $ (a,b)$ ; if this value is finite, then $ w(t)$ is said to be of bounded variation on $ (a,b)$ .

.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...impulse.C.1
The impulse $ \delta(x)$ may be defined as any ``function'' $ f$ for which

$\displaystyle \int_{-\infty}^\infty f(x)\delta(x)dx = f(0)$ (C.10)

where $ f(x)$ is assumed continuous at $ x=0$ . A typical definition is

$\displaystyle \delta(x) \isdef \lim_{\Delta \to 0} \left\{\begin{array}{ll} \frac{1}{\Delta}, & 0\leq x\leq \Delta \\ [5pt] 0, & \hbox{otherwise}. \\ \end{array} \right.$ (C.11)

The impulse was introduced in Chapter 2 starting at §B.10. See also [264,36,150] for further development.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... process.C.2
The general class of stochastic processes for which time averages equal ensemble averages is called ergodic stochastic processes [201]. In this book, all stochastic processes are assumed ergodic (and hence stationary) because they can all be modeled as filtered white noise, where the filter is stable, linear, and (at least approximately over short durations) time-invariant. Furthermore, the driving noise can be chosen to be Gaussian; see more advanced texts on stochastic processes regarding distinctions that can arise in non-ergodic cases.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... length.C.3
Note that 20 ms contains only one period of a sinusoid at 50 Hz, which is above lower limit of pitch perception (the low note of the piano, A0, is tuned to 22 Hz). It is therefore possible to encounter difficulty resolving tones in the deep bass region of the audio spectrum. A 20 ms frame length works quite well, however, for telephone speech processing, in which the nominal bandwidth is 200-3200 Hz; in this case, a 20ms frame has at least four periods of the lowest frequency present, and harmonic resolution is assured under the Hamming window. In wideband audio work, a multiresolution analysis is often highly preferable.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... numbers.C.4
Two random events $ A$ and $ B$ are said to be independent if the probability of event $ A$ and $ B$ occurring together equals the product of the probability of event $ A$ times the probability of event $ B$ . Similarly, two random variables $ x$ and $ y$ are said to be independent if the probability that both $ x=a$ and $ y=b$ equals the probability that $ x=a$ times the probability that $ y=b$ , where $ a$ and $ b$ are any values that the respective random variables can assume. For purposes of this book, it is sufficient to have only an intuitive understanding of terms such as these from probability theory. Only sample correlations will be needed for noise spectrum analysis.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...diffthm).D.1
This approach to the proof was discovered on the Web, for the real case, at
http://www.ph.tn.tudelft.nl/~lucas/education/tn254/2002/Fourier%20transform%20of%20a%20Gaussian.pdf
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... theoremD.2
http://mathworld.wolfram.com/CentralLimitTheorem.html
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... increases.D.3
This can be shown as a consequence of the central limit theorem. Coincidentally, the Gaussian function is believed to have first appeared for approximating large binomial coefficients in the 2nd edition (1718) of Doctrine of Chances by Abraham de Moivre (a book on probability).
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...,D.4
http://mathworld.wolfram.com/NormalDistribution.html
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... BarksE.1
The normalized warped-frequency interval $ \omega\in[0,\pi]$ was converted to Barks $ b$ by the affine transformation $ b = (\omega/\pi)*(N_b-1)+0.5$ , where $ N_b$ is the number of Bark bands in use. For example, $ N_b=25$ for a $ 31$ kHz sampling rate.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...JOST:E.2
Matlab functions bark2lin.m and lin2bark.m for transforming between linear and bark-warped frequency representations are available on the internet at http://ccrma.stanford.edu/~jos/bbt/bbt.html.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...Darrigol07.G.1
For an online biography of Daniel Bernoulli, see also
http://www-groups.dcs.st-and.ac.uk/~history/Biographies/Bernoulli_Daniel.html
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...G.2
This was before Fourier theory (1822), and before differential equations as we now know them. Calculus using trigonometric functions was still a new topic at the time. D'Alembert derived his results from Newton's second law applied to a differential string element, integrating the result, and imposing rigid boundary constraints.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... loudspeakers).G.3
Of course, we know now that the sound of a vibrating string is produced mainly by the bridge excitation, which is driven by a superposition of the vibrating string modes, as envisioned by Bernoulli, or, in an equally valid alternative formulation, by the traveling waves of D'Alembert.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... masks.G.4
See also the fascinating history of ``optical synthesis'' at http://www.umatic.nl/tonewheels_historical.html, including the photos of Arseny Avraamov's hand-drawn motion picture soundtracks (1930) and Oskar Fischinger's ``sound scrolls'' (1932). Also, the Theremin Center has photos of Boris Yankovsky's ``vibroexponator'' soundtracks using syntones and spectral mutations (http://www.theremin.ru/archive/vibroexponator.htm).
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... Hz)G.5
According to the voder patent (US Patent 2,121,142), the ten bandpass edges were, in Hz, 0, 225, 450, 700, 1000, 1400, 2000, 2700, 3800, 5400, and 7500. The voder patent refers to a vocoder patent application, serial number 47,393, filed October 30, 1935, but it appears that the vocoder patent may have never issued. In any case, Figure 1 of the voder patent is described as ``similar'' to that of the vocoder, and having ``been used as the natural development from the analyzer-synthesizer circuit referred to in my previous application, Serial No. 47,393'' and the improved version (Figure 6 of the voder patent) specified the same ten bandlimits.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... technicians.G.6
For a photo, see http://davidszondy.com/future/robot/voder.htm. See also the Smithsonian Speech Synthesis History Project:
https://amhistory.si.edu/archives/speechsynthesis/ss_home.htm
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... domain.G.7
Dolson and Laroche have extended this idea to the processing of nonparametric spectral peaks in the short-time spectrum [143,142,139].
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...G.8
In practice, one normally cross-fades among tables designed for different pitch ranges, as described, e.g., at http://swiki.hfbk-hamburg.de/MusicTechnology/878
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...G.9
In practical additive synthesis, each harmonic overtone, or harmonic overtone group, may be synthesized using a wavetable oscillatorG.8.4) [165,27,193]. When the number of overtones is large, the inverse-FFT method is typically used [239].
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... signals.G.10
An early SAIL program reduce.sai for adaptively allocating piecewise-linear breakpoints is available online at http://ccrma.stanford.edu/~jos/sasp/reduce.sai
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... convolution).G.11
A model typically means parametric model. In contrast, a nonparametric representation, such as an FFT, is regarded as data or processed data--not a model. The number of data points (``degrees of freedom'') in a nonparametric model is typically on the same order as that of the original data. True model parameters, on the other hand, will have far fewer degrees of freedom than the original data.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... 1977,G.12
This is a personal observation based on using the software written by James A. Moorer running on the PDP KL-10 soon after I arrived at CCRMA. (At that time, ``CCRMA'' was an unnamed computer-music project at the Stanford Artificial Intelligence Lab). From reading the code, it appeared to be an implementation along the lines of Portnoff's 1976 paper. As far as I know, prior additive-synthesis analysis was performed by the heterodyne-comb technique [185], which is related to Goetzel's algorithm for computing a single bin of the DFT recursively.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... transient.G.13
The noise component cannot be used, in general, because no matter how much resolution is provided in the amplitude envelope of the noise, there is usually no guarantee that the noise, being random, will have the desired amplitude at the critical time it is needed.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
...PARSHLH.1
PARSHL was so named because it could follow partials (as opposed to merely harmonics). Being written for the PDP-10 computer running the SAIL operating system, the filename was restricted to 6 characters, so that ``partial'' became ``PARSHL''.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... phaseH.2
The version written in 1985 did not support phase. Phase support was added much later by the second author of [271] in the context of his Ph.D. research, using the phase interpolation algorithm of McAulay and Quatieri [174].
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
... forever,H.3
We tried reusing turned-off oscillators but found them to be more trouble than they were worth in our environment.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.