Normalized STFT Basis

The Short-Time Fourier Transform (STFT) is defined as a time-ordered sequence of DTFTs, and is implemented in practice as a sequence of FFTs (see §7.1). The signal basis functions are therefore naturally defined as the DFT sinusoids multiplied by time-shifted windows, normalized to unit $ L_2$ norm:

$\displaystyle \varphi_{mk}(n) \isdef \frac{w(n-mR)e^{j\omega_k n}}{\left\Vert\,\hbox{\sc Shift}_{mR}(w) e^{j\omega_k (\cdot)}\,\right\Vert} = \frac{w(n-mR) e^{j\omega_k n}}{\sqrt{\sum_n{w^2(n)}}},$ (12.115)

$\displaystyle \omega_k = \frac{2\pi k}{N}, \quad k \in [0,N-1], \quad n\in (-\infty,\infty),\quad w(n)\in{\cal R},$ (12.116)

and $ N$ is the DFT length.
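As a rough numerical illustration of this definition, the following NumPy sketch evaluates one basis function on a finite time grid and checks its norm. The helper name `stft_basis`, the Hann-type window, the convention that the window is zero outside $ 0 \leq n \leq M-1$, and all parameter values are assumptions made for the example, not part of the text.

```python
import numpy as np

def stft_basis(m, k, w, R, N, n):
    """Evaluate the normalized basis function phi_mk at integer times n.

    Hypothetical helper: w is a real window of length M, assumed to be
    zero outside 0..M-1.
    """
    M = len(w)
    idx = n - m * R                          # window argument n - mR
    inside = (idx >= 0) & (idx < M)
    w_shift = np.where(inside, w[np.clip(idx, 0, M - 1)], 0.0)
    omega_k = 2 * np.pi * k / N
    return w_shift * np.exp(1j * omega_k * n) / np.sqrt(np.sum(w ** 2))

M, N, R = 8, 16, 4                           # window length, DFT length, hop size
w = np.hanning(M + 2)[1:-1]                  # assumed window: Hann without zero endpoints
n = np.arange(-20, 60)                       # finite stand-in for the infinite time axis
phi = stft_basis(m=3, k=5, w=w, R=R, N=N, n=n)
norm = np.sqrt(np.sum(np.abs(phi) ** 2))     # L2 norm, expected to be 1
```

Because the complex sinusoid has unit magnitude, the norm depends only on the window energy, which is why the denominator reduces to $ \sqrt{\sum_n w^2(n)}$ for every $ m$ and $ k$.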

When successive windows overlap (i.e., the hop size $ R$ is less than the window length $ M$ ), the basis functions are not orthogonal. In this case, we may say that the basis set is overcomplete.
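The loss of orthogonality can be seen numerically. The sketch below, under assumed values $ M=N=8$, $ R=4$ and an assumed Hann-type window, computes the inner product between two basis functions in adjacent frames at the same frequency bin; the local helper `frame` is hypothetical.

```python
import numpy as np

M, N, R = 8, 8, 4                  # hop R < window length M, so frames overlap
w = np.hanning(M + 2)[1:-1]        # assumed window: Hann without zero endpoints
n = np.arange(0, M + R)            # covers the support of frames m = 0 and m = 1
k = 2
omega_k = 2 * np.pi * k / N

def frame(m):
    """Normalized basis function phi_mk on the finite grid n (hypothetical helper)."""
    w_shift = np.zeros(len(n))
    w_shift[m * R : m * R + M] = w
    return w_shift * np.exp(1j * omega_k * n) / np.sqrt(np.sum(w ** 2))

# <phi_0k, phi_1k>; np.vdot conjugates its first argument
inner = np.vdot(frame(0), frame(1))
```

For a nonnegative window the sinusoids cancel in the product and the result is the normalized window autocorrelation at lag $ R$, which is strictly positive whenever the windows overlap.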

The basis signals are orthonormal when $ R=M=N$ and the rectangular window is used ($ w=w_R$ ). That is, two rectangularly windowed DFT sinusoids are orthogonal when either the frequency bin-numbers or the time frame-numbers differ, provided that the window length $ M$ equals the number of DFT frequencies $ N$ (no zero padding). In other words, we obtain an orthogonal basis set in the STFT when the hop size, window length, and DFT length are all equal (in which case the rectangular window must be used to retain the perfect-reconstruction property). In this case, we can write

$\displaystyle \varphi_{mk}= \hbox{\sc Shift}_{mN}\left[\hbox{\sc ZeroPad}_\infty\left(\varphi_k ^{\hbox{\tiny DFT}}\right)\right],$ (12.117)


$\displaystyle \varphi_{mk}(n) = \left\{\begin{array}{ll} \frac{e^{j\omega_k n}}{\sqrt{N}}, & mN \leq n \leq (m+1)N-1 \\ [5pt] 0, & \mbox{otherwise.} \\ \end{array} \right.$ (12.118)
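A quick way to check the orthonormality claim is to form the Gram matrix of the basis functions in (12.118). The sketch below does this for a few frames; the values $ N=8$ and three frames are assumptions chosen to keep the matrices small, and the finite grid stands in for the infinite time axis.

```python
import numpy as np

N = 8                         # R = M = N with the rectangular window
frames = 3                    # frames m = 0, 1, 2 (finite stand-in for all m)
L = frames * N
n = np.arange(L)

# One basis function per row, ordered by (m, k)
Phi = np.zeros((frames * N, L), dtype=complex)
for m in range(frames):
    for k in range(N):
        support = (n >= m * N) & (n <= (m + 1) * N - 1)
        Phi[m * N + k] = support * np.exp(2j * np.pi * k * n / N) / np.sqrt(N)

gram = Phi.conj() @ Phi.T     # gram[i, j] = <phi_i, phi_j>
is_orthonormal = np.allclose(gram, np.eye(L))
```

Entries with different frame numbers vanish because the supports are disjoint, and entries within a frame vanish for different bin numbers by the orthogonality of the DFT sinusoids.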

The coefficient of projection can be written
$\displaystyle \begin{array}{rcl} \left<\varphi_{mk},x\right> &=& \displaystyle \frac{1}{\sqrt{N}} \sum_{n=-\infty}^{\infty} x(n) w_R(n-mN) e^{-j\omega_k n} \\ [5pt] &\isdef& \displaystyle \frac{\hbox{STFT}_{N,m,k}(x)}{\sqrt{N}} \isdef \frac{X_m(\omega_k )}{\sqrt{N}} \end{array}$

so that the signal expansion can be interpreted as
$\displaystyle \begin{array}{rcl} x(n) &=& \displaystyle \sum_{m=-\infty}^{\infty}\sum_{k=0}^{N-1} \left<\varphi_{mk},x\right> \varphi_{mk}(n) \\ [5pt] &=& \displaystyle \sum_{m=-\infty}^{\infty} w_R(n-mN)\frac{1}{N}\sum_{k=0}^{N-1} X_m(\omega_k )e^{j\omega_k n} \\ [5pt] &=& \displaystyle \sum_{m=-\infty}^{\infty} \hbox{\sc Shift}_{mN,n}\left\{\hbox{\sc ZeroPad}_\infty\left[\hbox{DFT}_N^{-1}(X_m)\right]\right\} \\ [5pt] &\isdef& \hbox{STFT}_{N,n}^{-1}(X) \end{array}$
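The projection and expansion above can be sketched with NumPy's FFT for the orthonormal case $ R=M=N$. The random test signal, its finite length, and the chosen indices $ m=2$, $ k=3$ are assumptions for illustration. Since $ \omega_k mN = 2\pi km$, the phase of $ e^{-j\omega_k n}$ measured from absolute time agrees with the phase measured from the start of each frame, so an ordinary length-$ N$ FFT of each frame gives $ X_m(\omega_k)$ directly.

```python
import numpy as np

rng = np.random.default_rng(0)
N, frames = 8, 4                       # R = M = N, rectangular window
x = rng.standard_normal(N * frames)    # finite test signal (assumption)
n = np.arange(len(x))

# Analysis: X[m, k] = X_m(omega_k), the length-N DFT of frame m
X = np.fft.fft(x.reshape(frames, N), axis=1)

# Check one coefficient against the direct inner product <phi_mk, x>
m, k = 2, 3
support = (n >= m * N) & (n < (m + 1) * N)
phi = support * np.exp(2j * np.pi * k * n / N) / np.sqrt(N)
coeff_direct = np.vdot(phi, x)         # vdot conjugates its first argument
coeff_fft = X[m, k] / np.sqrt(N)

# Synthesis: inverse DFT of each frame, placed at n = mN and summed
y = np.zeros(len(x), dtype=complex)
for mm in range(frames):
    y[mm * N:(mm + 1) * N] += np.fft.ifft(X[mm])

max_err = np.max(np.abs(y - x))        # reconstruction error
```

Note that `np.fft.ifft` already includes the $ 1/N$ factor, which corresponds to the product of the two $ 1/\sqrt{N}$ normalizations in the expansion.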

In the overcomplete case, we get a special case of weighted overlap-add (§8.6):

$\displaystyle \begin{array}{rcl} x(n) &=& \displaystyle \sum_{m=-\infty}^{\infty}\sum_{k=0}^{N-1} \left<\varphi_{mk},x\right> \varphi_{mk}(n) \\ [5pt] &=& \displaystyle \sum_{m=-\infty}^{\infty} \frac{1}{N}\sum_{k=0}^{N-1} X_m(\omega_k ) w(n-mR)e^{j\omega_k n} \end{array}$
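The final sum above has the weighted overlap-add structure: each frame is windowed, transformed, inverse transformed, windowed again by the same $ w$, and overlap-added. A minimal sketch follows, under assumptions that are not from the text: a sine window (square root of a Hann window) with hop size $ R=M/2$, chosen because its square overlap-adds to one, $ N=M$, and a zero-padded random test signal. Frame-local time indexing is used in the FFTs; the resulting phase factors cancel between analysis and synthesis.

```python
import numpy as np

M = N = 16
R = M // 2                                        # 50% overlap: R < M (overcomplete)
w = np.sin(np.pi * (np.arange(M) + 0.5) / M)      # assumed window; w^2 overlap-adds to 1

rng = np.random.default_rng(2)
x = rng.standard_normal(10 * R)
xp = np.concatenate([np.zeros(R), x, np.zeros(R)])  # pad so each sample lies in two frames

y = np.zeros(len(xp))
for start in range(0, len(xp) - M + 1, R):
    X_m = np.fft.fft(w * xp[start:start + M], N)  # analysis: windowed frame -> DFT
    frame = np.fft.ifft(X_m).real[:M]             # (1/N) sum_k X_m(omega_k) e^{j omega_k n}
    y[start:start + M] += w * frame               # synthesis window, then overlap-add

max_err = np.max(np.abs(y[R:-R] - x))             # compare on the unpadded region
```

With this window, each output sample is $ x(n)\sum_m w^2(n-mR) = x(n)$. For windows whose squares do not overlap-add to a constant at the chosen hop size, the same loop returns an amplitude-modulated version of the input.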

``Spectral Audio Signal Processing'', by Julius O. Smith III, W3K Publishing, 2011, ISBN 978-0-9745607-3-1.
Copyright © 2022-02-28 by Julius O. Smith III
Center for Computer Research in Music and Acoustics (CCRMA),   Stanford University