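- Read $M$ samples of the input signal $x$ into a local buffer of length $N \ge M$, which is initially zeroed:

  $$\tilde{x}_m(n) \triangleq x(n+mR), \qquad n = -M_h, \ldots, M_h$$

  We call $x_m$ (the $M$ samples of $x$ centered at time $mR$) the $m$th *frame* of the input signal, and $\tilde{x}_m$ the $m$th *time-normalized* input frame (time-normalized by translating it to time zero). The frame length is $M = 2M_h + 1$, which we assume to be *odd* for reasons to be discussed later. The time advance $R$ (in samples) from one frame to the next is called the *hop size* or *step size*.
- Multiply the data frame pointwise by a length $M$ spectrum analysis window $w(n)$ to obtain the $m$th *windowed* data frame (time normalized):

  $$\tilde{x}^w_m(n) \triangleq \tilde{x}_m(n)\, w(n), \qquad n = -\frac{M-1}{2}, \ldots, \frac{M-1}{2}$$
- Extend $\tilde{x}^w_m$ with zeros on both sides to obtain a *zero-padded* frame:

  $$\tilde{x}^{w'}_m(n) \triangleq \begin{cases} \tilde{x}^w_m(n), & |n| \le \frac{M-1}{2} \\ 0, & \frac{M-1}{2} < n \le \frac{N}{2} - 1 \\ 0, & -\frac{N}{2} \le n < -\frac{M-1}{2} \end{cases} \tag{8.5}$$

  where $N$ is chosen to be a power of two larger than $M$. The number $N/M$ is the *zero-padding factor*. As discussed in §2.5.3, the zero-padding factor is the *interpolation factor* for the spectrum, *i.e.*, each FFT bin is replaced by $N/M$ bins, interpolating the spectrum using ideal bandlimited interpolation [264], where the "band" in this case is the $M$-sample nonzero duration of $\tilde{x}^w_m$ in the time domain.
- Take a length $N$ FFT of $\tilde{x}^{w'}_m$ to obtain the time-normalized, *frequency-sampled* STFT at time $m$:

  $$\tilde{X}^{w'}_m(\omega_k) = \sum_{n=-N/2}^{N/2-1} \tilde{x}^{w'}_m(n)\, e^{-j\omega_k nT} \tag{8.6}$$

  where $\omega_k = 2\pi k f_s / N$, and $f_s = 1/T$ is the sampling rate in Hz. As in any FFT, we call $k$ the *bin number*.
- If needed, time normalization may be removed using a linear phase term to yield the sampled STFT:

  $$X^{w'}_m(\omega_k) = e^{-j\omega_k mRT}\, \tilde{X}^{w'}_m(\omega_k) \tag{8.7}$$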
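
The five steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `stft_frame`, its argument names (`m` for the frame index, `R` for the hop size, `w` for the odd-length window, `N` for the FFT size), and the choice to treat samples outside the signal as zero are all hypothetical. Because a standard FFT indexes its input from 0 to N-1, the sketch stores the negative-time samples at the end of the buffer; by the periodicity of the DFT kernel this evaluates the same sum as one running over negative and positive time indices centered on zero.

```python
import numpy as np

def stft_frame(x, m, R, w, N, remove_time_normalization=False):
    """Compute one frequency-sampled STFT frame (illustrative sketch).

    x: 1-D signal, m: frame index, R: hop size in samples,
    w: odd-length analysis window, N: FFT size (N >= len(w)).
    """
    x = np.asarray(x)
    w = np.asarray(w)
    M = len(w)
    if M % 2 == 0:
        raise ValueError("window length is assumed to be odd")
    if N < M:
        raise ValueError("FFT size must be >= window length")
    Mh = (M - 1) // 2

    # Step 1: read the frame centered at sample m*R, time indices -Mh..Mh.
    # Assumption: samples outside the signal are treated as zero.
    n = np.arange(-Mh, Mh + 1)
    idx = n + m * R
    valid = (idx >= 0) & (idx < len(x))
    frame = np.zeros(M, dtype=complex)
    frame[valid] = x[idx[valid]]

    # Step 2: multiply pointwise by the analysis window.
    xw = frame * w

    # Step 3: zero-pad to length N. Time index n is stored at position
    # n mod N, so negative times wrap to the end of the buffer.
    buf = np.zeros(N, dtype=complex)
    buf[n % N] = xw

    # Step 4: length-N FFT gives the time-normalized sampled STFT.
    X = np.fft.fft(buf)

    # Step 5 (optional): linear phase term removes the time normalization.
    if remove_time_normalization:
        k = np.arange(N)
        X = X * np.exp(-2j * np.pi * k * m * R / N)
    return X

# Usage: a length-31 Hann window with a power-of-two FFT size larger than 31.
x = np.cos(2 * np.pi * 0.1 * np.arange(256))
w = np.hanning(31)
X = stft_frame(x, m=3, R=16, w=w, N=64)
print(X.shape)  # (64,)
```

Storing time index `n` at buffer position `n % N` is one common way to keep the window centered on time zero; an equivalent alternative would be to fill a centered buffer and apply `np.fft.ifftshift` before the FFT.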

The (continuous-frequency) STFT may be approached arbitrarily closely by using more zero padding and/or other interpolation methods.

Note that there is no irreversible time-aliasing when the STFT frequency axis $\omega$ is sampled to the points $\omega_k = 2\pi k f_s / N$, provided the FFT size $N$ is greater than or equal to the window length $M$.
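
The time-aliasing condition can be checked numerically. The sketch below is only an illustration: the helper name `roundtrip` and the toy frame values are made up for this example. It samples the spectrum of a centered, odd-length frame at `N` frequencies and inverts the result. When `N` is at least the frame length, every sample comes back intact; when `N` is smaller, samples that lie `N` apart in time are summed into the same slot and cannot be separated again.

```python
import numpy as np

def roundtrip(xw, N):
    """Sample the spectrum of a centered frame at N points, then invert.

    xw holds an odd number of samples at time indices -Mh..Mh.
    Returns the length-N inverse FFT of the sampled spectrum.
    """
    xw = np.asarray(xw, dtype=complex)
    Mh = (len(xw) - 1) // 2
    n = np.arange(-Mh, Mh + 1)
    k = np.arange(N)
    # Evaluate the frame's spectrum at the N frequencies 2*pi*k/N.
    X = np.exp(-2j * np.pi * np.outer(k, n) / N) @ xw
    return np.fft.ifft(X)

xw = np.arange(1.0, 8.0)   # 7 samples at time indices -3..3
n = np.arange(-3, 4)

ok = roundtrip(xw, 8)      # FFT size 8 >= frame length 7
print(np.allclose(ok[n % 8], xw))  # True

aliased = roundtrip(xw, 4) # FFT size 4 < frame length 7
# Samples 4 apart are summed, e.g. slot 1 holds xw(-3) + xw(1) = 1 + 5.
print(aliased.real)
```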


Copyright ©

Center for Computer Research in Music and Acoustics (CCRMA), Stanford University