Python code for
MATHEMATICS OF THE DISCRETE FOURIER TRANSFORM (DFT) WITH AUDIO APPLICATIONS

SECOND EDITION

JULIUS O. SMITH III
Center for Computer Research in Music and Acoustics (CCRMA)

Python Code by¶

Marina Bosi & Rich Goldberg
Center for Computer Research in Music and Acoustics (CCRMA)

Example Applications of the DFT ¶

Spectrum Analysis of a Sinusoid: Windowing, Zero-Padding, and FFT ¶

FFT of a Simple Sinusoid ¶

Our first example is an FFT of the simple sinusoid

$\displaystyle x(n) = \cos(\omega_x n T) $

where we choose $ \omega_x=2\pi(f_s/4)$ (frequency $ f_s/4$ Hz) and $ T=1$ (sampling rate $ f_s$ set to 1). Since we're using a Cooley-Tukey FFT, the signal length $ N$ should be a power of $ 2$ for fastest results. Here is the Matlab code:

% Example 1: FFT of a DFT-sinusoid

% Parameters:
N = 64;              % Must be a power of two
T = 1;               % Set sampling rate to 1
A = 1;               % Sinusoidal amplitude
phi = 0;             % Sinusoidal phase
f = 0.25;            % Frequency (cycles/sample)
n = [0:N-1];         % Discrete time axis
x = A*cos(2*pi*n*f*T+phi); % Sampled sinusoid
X = fft(x);          % Spectrum

% Plot time data:
figure(1);
subplot(3,1,1);
plot(n,x,'*k');        
ni = [0:.1:N-1];     % Interpolated time axis
hold on; 
plot(ni,A*cos(2*pi*ni*f*T+phi),'-k'); grid off;
title('Sinusoid at 1/4 the Sampling Rate');
xlabel('Time (samples)'); 
ylabel('Amplitude');
text(-8,1,'a)');
hold off; 

% Plot spectral magnitude:
magX = abs(X);
fn = [0:1/N:1-1/N];  % Normalized frequency axis
subplot(3,1,2);
stem(fn,magX,'ok'); grid on;
xlabel('Normalized Frequency (cycles per sample))'); 
ylabel('Magnitude (Linear)');
text(-.11,40,'b)');

% Same thing on a dB scale:
spec = 20*log10(magX); % Spectral magnitude in dB
subplot(3,1,3);
plot(fn,spec,'--ok'); grid on;
axis([0 1 -350 50]);
xlabel('Normalized Frequency (cycles per sample))'); 
ylabel('Magnitude (dB)');
text(-.11,50,'c)');
cmd = ['print -deps ', '../eps/example1.eps'];
disp(cmd); eval(cmd);

Figure 8.1: Sampled sinusoid at frequency $ f=f_s/4$ . a) Time waveform. b) Magnitude spectrum. c) DB magnitude spectrum. \includegraphics[width=\twidth]{eps/example1}

The results are shown in Fig.8.1. The time-domain signal is shown in the upper plot (Fig.8.1a), both in pseudo-continuous and sampled form. In the middle plot (Fig.8.1b), we see two peaks in the magnitude spectrum, each at magnitude $ 32$ on a linear scale, located at normalized frequencies $ f= 0.25$ and $ f= 0.75 = -0.25$ . A spectral peak amplitude of $ 32 = (1/2) 64$ is what we expect, since

$$\hbox{DFT}_k(\cos(\omega_x n)) \doteq \sum_{n=0}^{N-1} \frac{e^{j\omega_x n} + e^{-j\omega_x n}}{2} e^{-j\omega_k n}, $$

and when $ \omega_k=\pm\omega_x$ , this reduces to

$$\sum_{n=0}^{N-1}\frac{e^{j 0 n}}{2} = \frac{N}{2}. $$

For $ N=64$ and $ \omega_x=2\pi f_s/4$ , this happens at bin numbers $ k = 0.25 N = 16$ and $ k = 0.75N = 48$ . However, recall that array indexes in matlab start at $ 1$ , so that these peaks will really show up at indexes $ 17$ and $ 49$ in the magX array.

The spectrum should be exactly zero at the other bin numbers. How accurately this happens can be seen by looking on a dB scale, as shown in Fig.8.1c. We see that the spectral magnitude in the other bins is on the order of $ 300$ dB lower, which is close enough to zero for audio work $ (\stackrel{\mbox{.\,.}}{\smile})$ .

# Example 1: FFT of a DFT-sinusoid

import numpy as np
from numpy import pi, cos, log10
from numpy.fft import fft
import matplotlib.pyplot as plt

# Parameters:
N = 64              # Must be a power of two
T = 1               # Set sampling rate to 1
A = 1               # Sinusoidal amplitude
phi = 0             # Sinusoidal phase
f = 0.25            # Frequency (cycles/sample)
n = np.arange(N)    # Discrete time axis
x = A*cos(2*pi*n*f*T+phi) # Sampled sinusoid
X = fft(x)          # Spectrum

plt.figure(figsize=(10,10))

# Plot time data:
plt.subplot(3,1,1)
plt.plot(n,x,'*k')        
ni = np.arange(0,N,0.1)   # Interpolated time axis
plt.plot(ni,A*cos(2*pi*ni*f*T+phi),'-k') 
plt.xlim(0,N)
plt.title('Sinusoid at 1/4 the Sampling Rate')
plt.xlabel('Time (samples)') 
plt.ylabel('Amplitude')
plt.text(-.11*64,1,'a)')


# Plot spectral magnitude:
magX = abs(X)
fn = np.arange(0, 1, 1/N)  # Normalized frequency axis
plt.subplot(3,1,2)
plt.stem(fn,magX,'-ok', use_line_collection=True)
plt.grid()
plt.xlim(0,1)
plt.xlabel('Normalized Frequency (cycles per sample))') 
plt.ylabel('Magnitude (Linear)')
plt.text(-.11,30,'b)')

# Same thing on a dB scale:
spec = 20*log10(magX) # Spectral magnitude in dB
plt.subplot(3,1,3)
plt.plot(fn,spec,'--ok')
plt.grid()
plt.xlim(0,1)
plt.ylim(-350, 50)
#plt.axis([0 1 -350 50])
plt.xlabel('Normalized Frequency (cycles per sample))') 
plt.ylabel('Magnitude (dB)')
plt.text(-.11,0,'c)')

plt.show()

FFT of a Not-So-Simple Sinusoid¶

Now let's increase the frequency in the above example by one-half of a bin:

% Example 2 = Example 1 with frequency between bins

f = 0.25 + 0.5/N;   % Move frequency up 1/2 bin

x = cos(2*pi*n*f*T); % Signal to analyze
X = fft(x);          % Spectrum
...                  % See Example 1 for plots and such

Figure 8.2: Sinusoid at Frequency $ f=0.25+0.5/N$ . a) Time waveform. b) Magnitude spectrum. c) DB magnitude spectrum.

The resulting magnitude spectrum is shown in Fig.8.2b and c. At this frequency, we get extensive "spectral leakage" into all the bins. To get an idea of where this is coming from, let's look at the periodic extension (§7.1.2) of the time waveform:

% Plot the periodic extension of the time-domain signal
plot([x,x],'--ok');
title('Time Waveform Repeated Once');
xlabel('Time (samples)'); ylabel('Amplitude');

Note the "glitch" in the middle where the signal begins its forced repetition. Figure 8.3: Time waveform repeated to show discontinuity introduced by periodic extension (see midpoint).

# Example 2 = Example 1 with frequency between bins

import numpy as np
from numpy import pi, cos, log10
from numpy.fft import fft
import matplotlib.pyplot as plt

# Parameters:
N = 64              # Must be a power of two
T = 1               # Set sampling rate to 1
A = 1               # Sinusoidal amplitude
phi = 0             # Sinusoidal phase
#f = 0.25            # Frequency (cycles/sample)
n = np.arange(N)    # Discrete time axis
#x = A*cos(2*pi*n*f*T+phi) # Sampled sinusoid
#X = fft(x)          # Spectrum
f = 0.25 + 0.5/N;    # Move frequency up 1/2 bin
x = cos(2*pi*n*f*T); # Signal to analyze
X = fft(x);          # Spectrum

plt.figure(figsize=(10,10))

# Plot time data:
plt.subplot(3,1,1)
plt.plot(n,x,'*k')        
ni = np.arange(0,N,0.1)   # Interpolated time axis
plt.plot(ni,A*cos(2*pi*ni*f*T+phi),'-k') 
plt.xlim(0,64)
plt.title('Sinusoid tuned NEAR 1/4 the Sampling Rate')
plt.xlabel('Time (samples)') 
plt.ylabel('Amplitude')
plt.text(-.11*64,1,'a)')


# Plot spectral magnitude:
magX = abs(X)
fn = np.arange(0, 1, 1/N)  # Normalized frequency axis
plt.subplot(3,1,2)
plt.stem(fn,magX,'-ok', use_line_collection=True)
plt.grid()
plt.xlim(0,1)
plt.xlabel('Normalized Frequency (cycles per sample))') 
plt.ylabel('Magnitude (Linear)')
plt.text(-.11,20,'b)')

# Same thing on a dB scale:
spec = 20*log10(magX) # Spectral magnitude in dB
plt.subplot(3,1,3)
plt.plot(fn,spec,'--ok')
plt.grid()
plt.xlim(0,1)
plt.ylim(-10, 30)
plt.xlabel('Normalized Frequency (cycles per sample))') 
plt.ylabel('Magnitude (dB)')
plt.text(-.11,30,'c)')

plt.show()

# Plot the periodic extension of the time-domain signal
plt.figure(figsize=(10,5))
plt.plot(np.concatenate([x,x]),'--ok')
plt.title('Time Waveform Repeated Once')
plt.xlabel('Time (samples)'); plt.ylabel('Amplitude')
plt.show()

FFT of a Zero-Padded Sinusoid¶

Looking back at Fig.8.2c, we see there are no negative dB values. Could this be right? Could the spectral magnitude at all frequencies be 1 or greater? The answer is no. To better see the true spectrum, let's use zero padding in the time domain (§7.2.7) to give ideal interpolation (§7.4.12) in the frequency domain:

zpf = 8;            % zero-padding factor
x = [cos(2*pi*n*f*T),zeros(1,(zpf-1)*N)]; % zero-padded 
X = fft(x);         % interpolated spectrum
magX = abs(X);      % magnitude spectrum
...                 % waveform plot as before
nfft = zpf*N;       % FFT size = new frequency grid size
fni = [0:1.0/nfft:1-1.0/nfft]; % normalized freq axis
subplot(3,1,2);
% with interpolation, we can use solid lines '-':
plot(fni,magX,'-k'); grid on; 
...
spec = 20*log10(magX); % spectral magnitude in dB
% clip below at -40 dB:
spec = max(spec,-40*ones(1,length(spec))); 
...                 % plot as before

Figure 8.4: Zero-padded sinusoid at frequency $ f=0.25+0.5/N$ cycles/sample. a) Time waveform. b) Magnitude spectrum. c) DB magnitude spectrum.

Figure 8.4 shows the zero-padded data (top) and corresponding interpolated spectrum on linear and dB scales (middle and bottom, respectively). We now see that the spectrum has a regular sidelobe structure. On the dB scale in Fig.8.4c, negative values are now visible. In fact, it was desirable to clip them at $ -40$ dB to prevent deep nulls from dominating the display by pushing the negative vertical axis limit to $ -300$ dB or more, as in Fig.8.1c. This example shows the importance of using zero padding to interpolate spectral displays so that the untrained eye will "fill in'" properly between the spectral samples.

zpf = 8            # zero-padding factor
n = np.arange(zpf*N)
x = np.concatenate([cos(2*pi*n[:N]*f*T), np.zeros((zpf-1)*N)]) # zero-padded 
X = fft(x)         # interpolated spectrum
magX = abs(X)      # magnitude spectrum

plt.figure(figsize=(10,10))

# Plot time data:
plt.subplot(3,1,1)
plt.plot(n,x,'-k')        
plt.xlim(0,N*zpf)
plt.title('Zero-paddes Sampled Sinnusoid')
plt.xlabel('Time (samples)') 
plt.ylabel('Amplitude')
plt.text(-.11*N*zpf,1,'a)')

# Plot spectral magnitude:
magX = abs(X)
nfft = zpf*N       # FFT size = new frequency grid size
fni = np.arange(0, 1, 1.0/nfft) # normalized freq axis
plt.subplot(3,1,2)
plt.plot(fni,magX,'-k')
plt.grid()
plt.xlim(0,1)
plt.ylim(0, 40)
plt.xlabel('Normalized Frequency (cycles per sample))') 
plt.ylabel('Magnitude (Linear)')
plt.text(-.11,40,'b)')

# Same thing on a dB scale:
spec = 20*log10(magX) # Spectral magnitude in dB
plt.subplot(3,1,3)
plt.plot(fni,spec,'-k')
plt.grid()
plt.xlim(0,1)
plt.ylim(-40, 40)
plt.xlabel('Normalized Frequency (cycles per sample))') 
plt.ylabel('Magnitude (dB)')
plt.text(-.11,40,'c)')

plt.show()

Use of a Blackman Window ¶

As Fig.8.4a suggests, the previous example can be interpreted as using a rectangular window to select a finite segment (of length $ N$ ) from a sampled sinusoid that continues for all time. In practical spectrum analysis, such excerpts are normally analyzed using a window that is tapered more gracefully to zero on the left and right. In this section, we will look at using a Blackman window on our example sinusoid. The Blackman window has good (though suboptimal) characteristics for audio work.

In Octave or the Matlab Signal Processing Toolbox, a Blackman window of length $ M=64$ can be designed very easily:

M = 64;
w = blackman(M);

Many other standard windows are defined as well, including hamming, hanning, and bartlett windows.

In Matlab without the Signal Processing Toolbox, the Blackman window is readily computed from its mathematical definition:

w = .42 - .5*cos(2*pi*(0:M-1)/(M-1)) ...
       + .08*cos(4*pi*(0:M-1)/(M-1));

Figure 8.5 shows the Blackman window and its magnitude spectrum on a dB scale. Fig.8.5c uses the more ``physical'' frequency axis in which the upper half of the FFT bin numbers are interpreted as negative frequencies. Here is the complete Matlab script for Fig.8.5:

M = 64;  
w = blackman(M);
figure(1); 
subplot(3,1,1); plot(w,'*'); title('Blackman Window');
xlabel('Time (samples)'); ylabel('Amplitude'); text(-8,1,'a)'); 

% Also show the window transform:
zpf = 8;                      % zero-padding factor
xw = [w',zeros(1,(zpf-1)*M)]; % zero-padded window
Xw = fft(xw);                 % Blackman window transform
spec = 20*log10(abs(Xw));     % Spectral magnitude in dB
spec = spec - max(spec);      % Normalize to 0 db max
nfft = zpf*M;
spec = max(spec,-100*ones(1,nfft)); % clip to -100 dB
fni = [0:1.0/nfft:1-1.0/nfft];   % Normalized frequency axis
subplot(3,1,2); plot(fni,spec,'-'); axis([0,1,-100,10]); 
xlabel('Normalized Frequency (cycles per sample))'); 
ylabel('Magnitude (dB)'); grid; text(-.12,20,'b)'); 

% Replot interpreting upper bin numbers as frequencies<0:
nh = nfft/2;
specnf = [spec(nh+1:nfft),spec(1:nh)];  % see fftshift()
fninf = fni - 0.5;
subplot(3,1,3);
plot(fninf,specnf,'-'); axis([-0.5,0.5,-100,10]); grid;
xlabel('Normalized Frequency (cycles per sample))'); 
ylabel('Magnitude (dB)');
text(-.62,20,'c)'); 
cmd = ['print -deps ', '../eps/blackman.eps'];
disp(cmd); eval(cmd);
disp 'pausing for RETURN (check the plot). . .'; pause

Figure 8.5: The Blackman window: a) window itself in the time domain, b) dB magnitude spectrum plotted over normalized frequencies $ [0,1)$ , and c) same thing plotted over $ [-0.5,0.5)$ .

M = 64  
w = np.blackman(M)

plt.figure(figsize=(10,10))

plt.subplot(3,1,1) 
plt.plot(w,'-k')
plt.plot(w,'o') 
plt.title('Blackman Window')
plt.xlabel('Time (samples)') 
plt.ylabel('Amplitude') 
plt.text(-8,1,'a)') 

# Also show the window transform:
zpf = 8                      # zero-padding factor
xw = np.concatenate([w,np.zeros((zpf-1)*M)] )# zero-padded window
Xw = fft(xw)                 # Blackman window transform
spec = 20*log10(abs(Xw))     # Spectral magnitude in dB
spec = spec - np.max(spec)      # Normalize to 0 db max
nfft = zpf*M
spec = np.maximum(spec,-100*np.ones(nfft)) # clip to -100 dB
fni = np.arange(0, 1, 1/nfft)   # Normalized frequency axis
plt.subplot(3,1,2) 
plt.plot(fni,spec,'-') 
plt.axis([0,1,-100,10]) 
plt.xlabel('Normalized Frequency (cycles per sample))') 
plt.ylabel('Magnitude (dB)') 
plt.grid() 
plt.text(-.12,20,'b)') 

# Replot interpreting upper bin numbers as frequencies<0:
nh = nfft//2
specnf = np.concatenate([spec[nh:nfft],spec[:nh]])  # see fftshift()
fninf = fni - 0.5
plt.subplot(3,1,3)
plt.plot(fninf,specnf,'-') 
plt.axis([-0.5,0.5,-100,10]) 
plt.grid()
plt.xlabel('Normalized Frequency (cycles per sample))') 
plt.ylabel('Magnitude (dB)')
plt.text(-.62,20,'c)')

plt.show()

Applying the Blackman Window¶

Now let's apply the Blackman window to the sampled sinusoid and look at the effect on the spectrum analysis:

% Windowed, zero-padded data:
n = [0:M-1];          % discrete time axis
f = 0.25 + 0.5/M;     % frequency
xw = [w .* cos(2*pi*n*f),zeros(1,(zpf-1)*M)]; 

% Smoothed, interpolated spectrum:
X = fft(xw);                                    

% Plot time data:
subplot(2,1,1); 
plot(xw);
title('Windowed, Zero-Padded, Sampled Sinusoid');
xlabel('Time (samples)'); 
ylabel('Amplitude');
text(-50,1,'a)'); 

% Plot spectral magnitude:
spec = 10*log10(conj(X).*X);  % Spectral magnitude in dB
spec = max(spec,-60*ones(1,nfft)); % clip to -60 dB
subplot(2,1,2);
plot(fninf,fftshift(spec),'-'); 
axis([-0.5,0.5,-60,40]); 
title('Smoothed, Interpolated, Spectral Magnitude (dB)'); 
xlabel('Normalized Frequency (cycles per sample))'); 
ylabel('Magnitude (dB)'); grid;
text(-.6,40,'b)');

Figure 8.6 plots the zero-padded, Blackman-windowed sinusoid, along with its magnitude spectrum on a dB scale. Note that the first sidelobe (near $ -40$ dB) is nearly 60 dB below the spectral peak (near $ +20$ dB). This is why the Blackman window is considered adequate for many audio applications. From the dual of the convolution theorem discussed in §7.4.6, we know that windowing in the time domain corresponds to smoothing in the frequency domain. Specifically, the complex spectrum with magnitude displayed in Fig.8.4b has been convolved with the Blackman window transform (dB magnitude shown in Fig.8.5c). Thus, the Blackman window Fourier transform has been applied as a smoothing kernel to the Fourier transform of the rectangularly windowed sinusoid to produce the smoothed result in Fig.8.6b. This topic is pursued in detail at the outset of Book IV in the music signal processing series. Figure 8.6: Effect of the Blackman window on the sinusoidal data.

# Windowed, zero-padded data:
n = np.arange(M)          # discrete time axis
f = 0.25 + 0.5/M     # frequency
xw = np.concatenate([w * cos(2*pi*n*f), np.zeros((zpf-1)*M)] )

# Smoothed, interpolated spectrum:
X = fft(xw)                                    

plt.figure(figsize=(10,10))

# Plot time data:
plt.subplot(2,1,1) 
plt.plot(xw)
plt.title('Windowed, Zero-Padded, Sampled Sinusoid')
plt.xlabel('Time (samples)') 
plt.ylabel('Amplitude')
plt.text(-80,1,'a)') 

# Plot spectral magnitude:
spec = 10*np.log10(X.conjugate() * X).real  # Spectral magnitude in dB
spec = np.maximum(spec,-60 * np.ones(nfft)) # clip to -60 dB
plt.subplot(2,1,2)
plt.plot(fninf,np.fft.fftshift(spec),'-') 
plt.axis([-0.5,0.5,-60,40]) 
plt.title('Smoothed, Interpolated, Spectral Magnitude (dB)') 
plt.xlabel('Normalized Frequency (cycles per sample))') 
plt.ylabel('Magnitude (dB)') 
plt.grid()
plt.text(-.6,40,'b)')

plt.show()

Hann-Windowed Complex Sinusoid ¶

In this example, we'll perform spectrum analysis on a complex sinusoid having only a single positive frequency. We'll use the Hann window (also known as the Hanning window) which does not have as much sidelobe suppression as the Blackman window, but its main lobe is narrower. Its sidelobes "roll off" very quickly versus frequency. Compare with the Blackman window results to see if you can see these differences.

The Matlab script for synthesizing and plotting the Hann-windowed sinusoid is given below:

% Analysis parameters:
M = 31;         % Window length 
N = 64;         % FFT length (zero padding factor near 2)

% Signal parameters:
wxT = 2*pi/4;   % Sinusoid frequency (rad/sample)
A = 1;          % Sinusoid amplitude
phix = 0;       % Sinusoid phase

% Compute the signal x:
n = [0:N-1];    % time indices for sinusoid and FFT
x = A * exp(j*wxT*n+phix); % complex sine [1,j,-1,-j...]

% Compute Hann window:
nm = [0:M-1];   % time indices for window computation
% Hann window = "raised cosine", normalization (1/M)
% chosen to give spectral peak magnitude at 1/2:
w = (1/M) * (cos((pi/M)*(nm-(M-1)/2))).^2;  

wzp = [w,zeros(1,N-M)]; % zero-pad out to the length of x
xw = x .* wzp;          % apply the window w to signal x

figure(1);
subplot(1,1,1);

% Display real part of windowed signal and Hann window
plot(n,wzp,'-k'); hold on; plot(n,real(xw),'*k'); hold off;
title(['Hann Window and Windowed, Zero-Padded, ',...
       'Sinusoid (Real Part)']); 
xlabel('Time (samples)'); ylabel('Amplitude');

The resulting plot of the Hann window and its use on sinusoidal data are shown in Fig.8.7.

Figure 8.7: A length 31 Hann window ("raised cosine") overlaid with the real part of the Hann-windowed complex sinusoid. Zero-padding is also shown. The sampled sinusoid is plotted using '*' with no connecting interpolation lines. You must now imagine the continuous real sinusoid (windowed) threading through the asterisks.

# Analysis parameters:
M = 31         # Window length 
N = 64         # FFT length (zero padding factor near 2)

# Signal parameters:
wxT = 2*pi/4   # Sinusoid frequency (rad/sample)
A = 1          # Sinusoid amplitude
phix = 0       # Sinusoid phase

# Compute the signal x:
n = np.arange(N)    # time indices for sinusoid and FFT
x = A * np.exp(1j*wxT*n+phix) # complex sine [1,j,-1,-j...]

# Compute Hann window:
nm = np.arange(M)   # time indices for window computation
# Hann window = "raised cosine", normalization (1/M)
# chosen to give spectral peak magnitude at 1/2:
w = (1/M) * (cos((pi/M)*(nm-(M-1)/2)))**2  

wzp = np.concatenate([w,np.zeros(N-M)]) # zero-pad out to the length of x
xw = x * wzp          # apply the window w to signal x

plt.figure(figsize=(10,10))
plt.subplot(1,1,1)

# Display real part of windowed signal and Hann window
plt.plot(n,wzp,'-k') 
plt.plot(n,xw.real,'ob') 
plt.title('Hann Window and Windowed, Zero-Padded, Sinusoid (Real Part)') 
plt.xlabel('Time (samples)') 
plt.ylabel('Amplitude')

plt.show()

Hann Window Spectrum Analysis Results¶

Finally, the Matlab for computing the DFT of the Hann-windowed complex sinusoid and plotting the results is listed below. To help see the full spectrum, we also compute a heavily interpolated spectrum (via zero padding as before) which we'll draw using solid lines.

% Compute the spectrum and its alternative forms:
Xw = fft(xw);              % FFT of windowed data
fn = [0:1.0/N:1-1.0/N];    % Normalized frequency axis
spec = 20*log10(abs(Xw));  % Spectral magnitude in dB
% Since the nulls can go to minus infinity, clip at -100 dB:
spec = max(spec,-100*ones(1,length(spec)));
phs = angle(Xw);           % Spectral phase in radians
phsu = unwrap(phs);        % Unwrapped spectral phase

% Compute heavily interpolated versions for comparison:
Nzp = 16;                   % Zero-padding factor
Nfft = N*Nzp;               % Increased FFT size
xwi = [xw,zeros(1,Nfft-N)]; % New zero-padded FFT buffer
Xwi = fft(xwi);             % Compute interpolated spectrum
fni = [0:1.0/Nfft:1.0-1.0/Nfft]; % Normalized freq axis
speci = 20*log10(abs(Xwi)); % Interpolated spec mag (dB)
speci = max(speci,-100*ones(1,length(speci))); % clip
phsi = angle(Xwi);          % Interpolated phase
phsiu = unwrap(phsi);       % Unwrapped interpolated phase

figure(1);
subplot(2,1,1);

plot(fn,abs(Xw),'*k'); hold on; 
plot(fni,abs(Xwi),'-k'); hold off;
title('Spectral Magnitude'); 
xlabel('Normalized Frequency (cycles per sample))'); 
ylabel('Amplitude (linear)');

subplot(2,1,2);

% Same thing on a dB scale
plot(fn,spec,'*k'); hold on; plot(fni,speci,'-k'); hold off;
title('Spectral Magnitude (dB)'); 
xlabel('Normalized Frequency (cycles per sample))'); 
ylabel('Magnitude (dB)');

cmd = ['print -deps ', 'specmag.eps']; disp(cmd); eval(cmd);
disp 'pausing for RETURN (check the plot). . .'; pause

figure(1);
subplot(2,1,1);
plot(fn,phs,'*k'); hold on; plot(fni,phsi,'-k'); hold off;
title('Spectral Phase'); 
xlabel('Normalized Frequency (cycles per sample))'); 
ylabel('Phase (rad)'); grid;
subplot(2,1,2);
plot(fn,phsu,'*k'); hold on; plot(fni,phsiu,'-k'); hold off;
title('Unwrapped Spectral Phase'); 
xlabel('Normalized Frequency (cycles per sample))'); 
ylabel('Phase (rad)'); grid;
cmd = ['print -deps ', 'specphs.eps']; disp(cmd); eval(cmd);

Figure 8.8 shows the spectral magnitude and Fig.8.9 the spectral phase.

Figure 8.8: Spectral magnitude on linear (top) and dB (bottom) scales. There are no negative-frequency components in Fig.8.8 because we are analyzing a complex sinusoid $ x=[1,j,-1,-j,1,j,\ldots\,]$ , which has frequency $ f_s/4$ only, with no component at $ -f_s/4$ .

Notice how difficult it would be to correctly interpret the shape of the "sidelobes" without zero padding. The asterisks correspond to a zero-padding factor of 2, already twice as much as needed to preserve all spectral information faithfully, but not enough to clearly outline the sidelobes in a spectral magnitude plot.

# Compute the spectrum and its alternative forms:
Xw = fft(xw)              # FFT of windowed data
fn = np.arange(0,1, 1/N)   # Normalized frequency axis
spec = 20*np.log10(abs(Xw))  # Spectral magnitude in dB
# Since the nulls can go to minus infinity, clip at -100 dB:
spec = np.maximum(spec,-100*np.ones(len(spec)))
phs = np.angle(Xw)           # Spectral phase in radians
phsu = np.unwrap(phs)        # Unwrapped spectral phase

# Compute heavily interpolated versions for comparison:
Nzp = 16                   # Zero-padding factor
Nfft = N*Nzp               # Increased FFT size
xwi = np.concatenate([xw,np.zeros(Nfft-N)]) # New zero-padded FFT buffer
Xwi = fft(xwi)             # Compute interpolated spectrum
fni = np.arange(0, 1, 1/Nfft) # Normalized freq axis
speci = 20*np.log10(abs(Xwi)) # Interpolated spec mag (dB)
speci = np.maximum(speci,-100*np.ones(len(speci))) # clip
phsi = np.angle(Xwi)          # Interpolated phase
phsiu = np.unwrap(phsi)       # Unwrapped interpolated phase

# plot spectral magnitude
plt.figure(figsize=(10,10))

plt.subplot(2,1,1)
plt.plot(fn,abs(Xw),'ob')  
plt.plot(fni,abs(Xwi),'-k') 
plt.title('Spectral Magnitude') 
plt.xlabel('Normalized Frequency (cycles per sample))') 
plt.ylabel('Amplitude (linear)')

plt.subplot(2,1,2)
# Same thing on a dB scale
plt.plot(fn,spec,'ob') 
plt.plot(fni,speci,'-k') 
plt.title('Spectral Magnitude (dB)') 
plt.xlabel('Normalized Frequency (cycles per sample))') 
plt.ylabel('Magnitude (dB)')
plt.show()

Spectral Phase¶

As for the phase of the spectrum, what do we expect? We have chosen the sinusoid phase offset to be zero. The window is causal and symmetric about its middle. Therefore, we expect a linear phase term with slope $ -(M-1)/2=-15$ samples (as discussed in connection with the shift theorem in §7.4.4). Also, the window transform has sidelobes which cause a phase of $ \pi $ radians to switch in and out. Thus, we expect to see samples of a straight line (with slope $ -15$ samples) across the main lobe of the window transform, together with a switching offset by $ \pi $ in every other sidelobe away from the main lobe, starting with the immediately adjacent sidelobes.

In Fig.8.9(a), we can see the negatively sloped line across the main lobe of the window transform, but the sidelobes are hard to follow. Even the unwrapped phase in Fig.8.9(b) is not as clear as it could be. This is because a phase jump of $ \pi $ radians and $ -\pi$ radians are equally valid, as is any odd multiple of $ \pi $ radians. In the case of the unwrapped phase, all phase jumps are by $ +\pi$ starting near frequency $ 0.3$ . Figure 8.9(c) shows what could be considered the "canonical" unwrapped phase for this example: We see a linear phase segment across the main lobe as before, and outside the main lobe, we have a continuation of that linear phase across all of the positive sidelobes, and only a $ \pi $ -radian deviation from that linear phase across the negative sidelobes. In other words, we see a straight linear phase at the desired slope interrupted by temporary jumps of $ \pi $ radians. To obtain unwrapped phase of this type, the unwrap function needs to alternate the sign of successive phase-jumps by $ \pi $ radians; this could be implemented, for example, by detecting jumps-by-$ \pi $ to within some numerical tolerance and using a bit of state to enforce alternation of $ +\pi$ with $ -\pi$ .

To convert the expected phase slope from $ -15$ "radians per (rad/sec)" to "radians per cycle-per-sample," we need to multiply by "radians per cycle," or $ 2\pi $ . Thus, in Fig.8.9(c), we expect a slope of $ -94.2$ radians per unit normalized frequency, or $ -9.42$ radians per $ 0.1$ cycles-per-sample, and this looks about right, judging from the plot.

Figure 8.9: Spectral phase and two different phase unwrappings.

# plot spectral phase
plt.figure(figsize=(10,10))

plt.subplot(2,1,1)
plt.plot(fn,phs,'*b') 
plt.plot(fni,phsi,'-k') 
plt.title('Spectral Phase') 
plt.xlabel('Normalized Frequency (cycles per sample))') 
plt.ylabel('Phase (rad)') 
plt.grid()

plt.subplot(2,1,2)
plt.plot(fn,phsu,'*b')
plt.plot(fni,phsiu,'-k') 
plt.title('Unwrapped Spectral Phase') 
plt.xlabel('Normalized Frequency (cycles per sample))') 
plt.ylabel('Phase (rad)') 
plt.grid()

plt.show()

# attempt at the "canonical" unwrapped phase (no code given in the book)
phsiuu = phsi - np.pi*np.cumsum(np.diff(np.concatenate([[0],phsi])) > 2)
plt.figure(figsize=(10,5))
#plt.plot(fn,phsuu,'*k')
plt.plot(fni,phsiuu,'-k') 
plt.title('"Canonical" Unwrapped Spectral Phase') 
plt.xlabel('Normalized Frequency (cycles per sample))') 
plt.ylabel('Phase (rad)') 
plt.grid()
plt.show()

Correlation Analysis¶

FIR System Identification ¶

Estimating an impulse response from input-output measurements is called system identification, and a large literature exists on this topic.

Cross-correlation can be used to compute the impulse response $ h(n)$ of a filter from the cross-correlation of its input and output signals $ x(n)$ and $ y = h\ast x$ , respectively. To see this, note that, by the correlation theorem,

$$x\star y \;\longleftrightarrow\;\overline{X}\cdot Y = \overline{X}\cdot (H\cdot X) = H\cdot\left\vert X\right\vert^2. $$

Therefore, the frequency response equals the input-output cross-spectrum divided by the input power spectrum:

$$ H = \frac{\overline{X}\cdot Y}{\left\vert X\right\vert^2} = \frac{{\hat R}_{xy}}{{\hat R}_{xx}} $$

where multiplication and division of spectra are defined pointwise, i.e., $ H(\omega_k) = \overline{X(\omega_k)}\cdot Y(\omega_k)/\vert X(\omega_k)\vert^2$ . A Matlab program illustrating these relationships is listed in Fig.8.13.

% sidex.m - Demonstration of the use of FFT cross-
% correlation to compute the impulse response 
% of a filter given its input and output.
% This is called "FIR system identification".

Nx = 32; % input signal length 
Nh = 10; % filter length 
Ny = Nx+Nh-1; % max output signal length
% FFT size to accommodate cross-correlation:
Nfft = 2^nextpow2(Ny); % FFT wants power of 2

x = rand(1,Nx); % input signal = noise
%x = 1:Nx;  % input signal = ramp
h = [1:Nh];     % the filter
xzp = [x,zeros(1,Nfft-Nx)]; % zero-padded input
yzp = filter(h,1,xzp); % apply the filter
X = fft(xzp);   % input spectrum
Y = fft(yzp);   % output spectrum
Rxx = conj(X) .* X; % energy spectrum of x
Rxy = conj(X) .* Y; % cross-energy spectrum
Hxy = Rxy ./ Rxx;   % should be the freq. response
hxy = ifft(Hxy);    % should be the imp. response

hxy(1:Nh)       % print estimated impulse response
freqz(hxy,1,Nfft);  % plot estimated freq response

err = norm(hxy - [h,zeros(1,Nfft-Nh)])/norm(h);
disp(sprintf(['Impulse Response Error = ',...
    '%0.14f%%'],100*err));

err = norm(Hxy-fft([h,zeros(1,Nfft-Nh)]))/norm(h);
disp(sprintf(['Frequency Response Error = ',...
    '%0.14f%%'],100*err));

Figure 8.13: FIR system identification example in matlab.

import numpy as np
import matplotlib.pyplot as plt
from numpy.fft    import fft, ifft
from numpy.linalg import norm
from scipy.signal import lfilter as filter
from scipy.signal import freqz


def nextpow2(x):
    """ Return N, the first power of 2 such that 2**N >= |x|"""
    return int(np.ceil(np.log2(abs(x))))

# sidex.m - Demonstration of the use of FFT cross-
# correlation to compute the impulse response 
# of a filter given its input and output.
# This is called "FIR system identification".

Nx = 32 # input signal length 
Nh = 10 # filter length 
Ny = Nx+Nh-1 # max output signal length
# FFT size to accommodate cross-correlation:
Nfft = 2 ** nextpow2(Ny) # FFT wants power of 2

x = np.random.rand(Nx) # input signal = noise
#x = 1 + np.arange(Nx)   # input signal = ramp
h = 1 + np.arange(Nh)    # the filter
xzp = np.concatenate([x,np.zeros(Nfft-Nx)]) # zero-padded input
yzp = filter(h,1,xzp) # apply the filter
X = fft(xzp)   # input spectrum
Y = fft(yzp)   # output spectrum
Rxx = X.conjugate() * X # energy spectrum of x
Rxy = X.conjugate() * Y # cross-energy spectrum
Hxy = Rxy / Rxx   # should be the freq. response
hxy = ifft(Hxy).real    # should be the imp. response

# show the estimated impulse response
print("Impulse Response:")
print("\tActual Filter:",h[:Nh])         # print estimated impulse response
print("\tEstimated Filter:",hxy[:Nh])         # print estimated impulse response

# plot the estimated frequency response
print("\nFrequency Response:")
plt.figure(figsize=(10,5))
f,hz = freqz(h,1,Nfft)  # get the actual freq response
plt.plot(f/np.pi,abs(hz), linewidth = 3, label = "Actual Filter")
f,hz = freqz(hxy,1,Nfft)  # get the estimated freq response
plt.plot(f/np.pi,abs(hz), linestyle = "dashed",  linewidth = 3, label = "Estimated Filter")
plt.legend()
plt.xlim(0,1)
plt.ylim(0, 60)
plt.title("Example of FIR System Identification")
plt.xlabel("Normalized Frequency in units of $\pi$")
plt.ylabel("Frequency Response of the FIR Filter")
plt.grid()
plt.show()

# get the errors in time and frequency domains:
err = norm(hxy - np.concatenate([h,np.zeros(Nfft-Nh)]))/norm(h)
print('Impulse Response Error = '+ str(100*err))
err = norm(Hxy-fft(np.concatenate([h,np.zeros(Nfft-Nh)])))/norm(h)
print('Frequency Response Error = ' + str(100*err))

Impulse Response:
	Actual Filter: [ 1  2  3  4  5  6  7  8  9 10]
	Estimated Filter: [ 1.  2.  3.  4.  5.  6.  7.  8.  9. 10.]

Frequency Response:

Impulse Response Error = 1.0156663341709242e-13
Frequency Response Error = 8.17896624295034e-13

Coherence Function ¶

A function related to cross-correlation is the coherence function, defined in terms of power spectral densities and the cross-spectral density by

$$ C_{xy}(\omega) \doteq \frac{\vert R_{xy}(\omega)\vert^2}{R_x(\omega)R_y(\omega)}. $$

In practice, these quantities can be estimated by time-averaging $ \overline{X(\omega_k)}Y(\omega_k)$ , $ \left\vert X(\omega_k)\right\vert^2$ , and $ \left\vert Y(\omega_k)\right\vert^2$ over successive signal blocks. Let $ \{\cdot\}_m$ denote time averaging across frames as in Eq.(8.3) above. Then an estimate of the coherence, the sample coherence function $ {\hat C}_{xy}(\omega_k)$ , may be defined by

$$ {\hat C}_{xy}(\omega_k) \doteq \frac{\left\vert\left\{\overline{X_m(\omega_k)}Y_m(\omega_k)\right\}_m\right\vert^2}{\left\{\left\vert X_m(\omega_k)\right\vert^2\right\}_m\cdot\left\{\left\vert Y_m(\omega_k)\right\vert^2\right\}_m}. $$

Note that the averaging in the numerator occurs before the absolute value is taken.

The coherence $ C_{xy}(\omega)$ is a real function between zero and one which gives a measure of correlation between $ x$ and $ y$ at each frequency $ \omega$ . For example, imagine that $ y$ is produced from $ x$ via an LTI filtering operation:

$$y = h\ast x \;\implies\; Y(\omega_k) = H(\omega_k)X(\omega_k) $$

Then the magnitude-normalized cross-spectrum in each frame is

\begin{eqnarray*} {\hat A}_{x_m y_m}(\omega_k) &\doteq & \frac{\overline{X_m(\omega_k)}Y_m(\omega_k)}{\left\vert X_m(\omega_k)\right\vert\cdot\left\vert Y_m(\omega_k)\right\vert} = \frac{\overline{X_m(\omega_k)}H(\omega_k)X_m(\omega_k)}{\left\vert X_m(\omega_k)\right\vert\cdot\left\vert H(\omega_k)X_m(\omega_k)\right\vert}\\ [5pt] &=& \frac{\left\vert X_m(\omega_k)\right\vert^2 H(\omega_k)}{\left\vert X_m(\omega_k)\right\vert^2\left\vert H(\omega_k)\right\vert} = \frac{H(\omega_k)}{\left\vert H(\omega_k)\right\vert} \end{eqnarray*}

so that the coherence function becomes

$$\left\vert{\hat C}_{xy}(\omega_k)\right\vert^2 = \left\vert\frac{H(\omega_k)}{\left\vert H(\omega_k)\right\vert}\right\vert^2 = 1. $$

On the other hand, when $ x$ and $ y$ are uncorrelated (e.g., $ y$ is a noise process not derived from $ x$ ), the sample coherence converges to zero at all frequencies, as the number of blocks in the average goes to infinity.

A common use for the coherence function is in the validation of input/output data collected in an acoustics experiment for purposes of system identification. For example, $ x(n)$ might be a known signal which is input to an unknown system, such as a reverberant room, say, and $ y(n)$ is the recorded response of the room. Ideally, the coherence should be $ 1$ at all frequencies. However, if the microphone is situated at a null in the room response for some frequency, it may record mostly noise at that frequency. This is indicated in the measured coherence by a significant dip below 1. An example is shown in Book III for the case of a measured guitar-bridge admittance. A more elementary example is given in the next section.

Coherence Function in Matlab¶

In Matlab and Octave, cohere(x,y,M) computes the coherence function $ C_{xy}$ using successive DFTs of length $ M$ with a Hanning window and 50% overlap. (The window and overlap can be controlled via additional optional arguments.) The matlab listing in Fig.8.14 illustrates cohere on a simple example. Figure 8.15 shows a plot of cxyM for this example. We see a coherence peak at frequency $ 0.25$ cycles/sample, as expected, but there are also two rather large coherence samples on either side of the main peak. These are expected as well, since the true cross-spectrum for this case is a critically sampled Hanning window transform. (A window transform is critically sampled whenever the window length equals the DFT length.)

Figure 8.14: Coherence measurement example in matlab.

% Illustrate estimation of coherence function 'cohere' 
% in the Matlab Signal Processing Toolbox
% or Octave with Octave Forge:
N = 1024;           % number of samples
x=randn(1,N);       % Gaussian noise
y=randn(1,N);       % Uncorrelated noise
f0 = 1/4;           % Frequency of high coherence
nT = [0:N-1];       % Time axis
w0 = 2*pi*f0;
x = x + cos(w0*nT); % Let something be correlated
p = 2*pi*rand(1,1); % Phase is irrelevant
y = y + cos(w0*nT+p); 
M = round(sqrt(N)); % Typical window length
[cxyM,w] = cohere(x,y,M); % Do the work
figure(1); clf; 
stem(w/2,cxyM,'*'); % w goes from 0 to 1 (odd convention)
legend('');         % needed in Octave
grid on;
ylabel('Coherence');
xlabel('Normalized Frequency (cycles/sample)');
axis([0 1/2 0 1]);
replot;  % Needed in Octave
saveplot('../eps/coherex.eps'); % compatibility utility

Figure 8.15: Sample coherence function.

Note that more than one frame must be averaged to obtain a coherence less than one. For example, changing the cohere call in the above example to cxyN = cohere(x,y,N); produces all ones in cxyN, because no averaging is performed.

# Illustrate estimation of coherence function 'cohere' 
# in the Matlab Signal Processing Toolbox
# or Octave with Octave Forge:
from numpy import pi, cos, sqrt
from numpy.random import randn, rand
from scipy.signal import coherence

N = 1024           # number of samples
x=randn(N)       # Gaussian noise
y=randn(N)       # Uncorrelated noise
f0 = 1/4           # Frequency of high coherence
nT = np.arange(N)  # Time axis
w0 = 2*pi*f0
x = x + cos(w0*nT) # Let something be correlated
p = 2*pi*rand(1) # Phase is irrelevant
y = y + cos(w0*nT+p) 
M = round(sqrt(N)) # Typical window length
w, cxyM = coherence(x,y,nperseg = M) # Do the work

# plot
plt.figure(figsize=(10,5))
plt.plot(w, cxyM, 'ob')
plt.stem(w, cxyM,'-k', use_line_collection = True) # w goes from 0 to 1 (odd convention)
plt.grid()
plt.ylabel('Coherence')
plt.xlabel('Normalized Frequency (cycles/sample)')
plt.axis([0, 1/2, 0, 1])
plt.show()

Python code for MATHEMATICS OF THE DISCRETE FOURIER TRANSFORM (DFT) WITH AUDIO APPLICATIONS

Python Code by¶

Example Applications of the DFT¶

Spectrum Analysis of a Sinusoid: Windowing, Zero-Padding, and FFT¶

FFT of a Simple Sinusoid¶