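Read $M$ samples of the input signal $x$ into a local buffer of length $N$, which is initially zeroed:
$$\tilde{x}_m(n) \triangleq x(n+mR), \qquad n = -M_h, -M_h+1, \ldots, -1, 0, 1, \ldots, M_h-1, M_h.$$
We call the $M$ samples $x(mR-M_h), \ldots, x(mR+M_h)$ the $m$th frame of the input signal, and $\tilde{x}_m$ the $m$th time-normalized input frame (time-normalized by translating it to time zero). The frame length is $M$, which we assume to be odd for reasons to be discussed later, so that $M = 2M_h+1$ for some integer $M_h$. The time advance $R$ (in samples) from one frame to the next is called the hop size or step size.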
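As an illustration of the framing step, here is a minimal sketch that extracts $\tilde{x}_m(n) = x(n+mR)$ for $n = -M_h, \ldots, M_h$. The function name `time_normalized_frame` is hypothetical, and so is the storage convention: list element `i` holds $\tilde{x}_m(i - M_h)$. Samples that fall before the start or past the end of $x$ are assumed to be zero, which mimics a buffer that starts out zeroed.

```python
def time_normalized_frame(x, m, R, M_h):
    """Return the m-th time-normalized frame x~_m(n) = x(n + m*R), n = -M_h..M_h.

    Hypothetical helper. Element i of the result holds x~_m(i - M_h).
    Samples outside the signal are treated as zero (assumption).
    """
    frame = []
    for n in range(-M_h, M_h + 1):
        idx = n + m * R  # absolute time index of this sample
        frame.append(x[idx] if 0 <= idx < len(x) else 0.0)
    return frame
```

For example, with `x = [0.0, 1.0, ..., 19.0]`, hop size `R = 4`, and `M_h = 2`, frame `m = 2` is centered on sample 8 and contains samples 6 through 10.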
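Multiply the data frame pointwise by a length-$M$ spectrum analysis window $w(n)$, $n = -M_h, \ldots, M_h$, to obtain the $m$th windowed data frame (time normalized):
$$\tilde{y}_m(n) \triangleq w(n)\,\tilde{x}_m(n) = w(n)\,x(n+mR), \qquad n = -M_h, \ldots, M_h.$$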
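A minimal sketch of the windowing step follows. The text does not fix a particular window; the symmetric Hann window used here, $w(n) = \tfrac{1}{2}\left[1 + \cos\!\left(\pi n/(M_h+1)\right)\right]$ for $|n| \le M_h$, is only an example, and the helper names are hypothetical. As in the framing sketch, list element `i` corresponds to time index $n = i - M_h$.

```python
import math


def hann_zero_phase(M_h):
    """Example window: symmetric Hann w(n), n = -M_h..M_h (length M = 2*M_h + 1).

    Element i holds w(i - M_h). The zero-valued endpoints at n = +/-(M_h + 1)
    are excluded, so all M samples are nonzero and w(0) = 1.
    """
    return [0.5 * (1.0 + math.cos(math.pi * n / (M_h + 1))) for n in range(-M_h, M_h + 1)]


def window_frame(frame, w):
    """Pointwise product y~_m(n) = w(n) * x~_m(n) of two length-M lists."""
    if len(frame) != len(w):
        raise ValueError("frame and window must both have length M")
    return [wn * xn for wn, xn in zip(w, frame)]
```

With `M_h = 2` the window samples are approximately `[0.25, 0.75, 1.0, 0.75, 0.25]`.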
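Extend $\tilde{y}_m$ with zeros on both sides to obtain a zero-padded frame (still denoted $\tilde{y}_m$) on $n = -N/2, \ldots, N/2-1$:
$$\tilde{y}_m(n) \triangleq \begin{cases} w(n)\,x(n+mR), & -M_h \le n \le M_h \\ 0, & M_h < n \le N/2-1 \\ 0, & -N/2 \le n < -M_h \end{cases} \qquad (8.5)$$
where $N$ is chosen to be a power of two larger than $M$. The number $N/M$ is the zero-padding factor.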
As discussed in §2.5.3, the zero-padding factor is the interpolation factor for the spectrum, i.e., each FFT bin is replaced by $N/M$ bins, interpolating the spectrum using ideal bandlimited interpolation [264], where the "band" in this case is the $M$-sample nonzero duration of $\tilde{y}_m$ in the time domain.
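The zero-padding step and its interpolation property can be sketched as follows. A standard FFT indexes its input from 0 to $N-1$, so this sketch assumes a common buffer layout in which $\tilde{y}_m(n)$ is stored at index $n \bmod N$: time zero lands at index 0 and the negative-time half of the frame wraps to the end of the buffer. Because the DFT is periodic in $n$, this layout gives the same transform as indexing from $-N/2$ to $N/2-1$. The helper names are hypothetical, and `dft` is a direct $O(N^2)$ sum standing in for an FFT.

```python
import cmath


def next_power_of_two_above(M):
    """Smallest power of two strictly larger than M."""
    N = 1
    while N <= M:
        N *= 2
    return N


def zero_pad_zero_phase(y, N):
    """Zero-pad an odd-length frame y~_m(n), n = -M_h..M_h, to N samples.

    Element n % N of the result holds y~_m(n); all other entries are zero.
    """
    M = len(y)
    if M % 2 == 0:
        raise ValueError("frame length M must be odd")
    if N < M:
        raise ValueError("FFT size N must be at least M")
    M_h = (M - 1) // 2
    buf = [0.0] * N
    for i, value in enumerate(y):
        n = i - M_h          # time index of this sample
        buf[n % N] = value   # Python's % maps negative n to N + n
    return buf


def dft(buf):
    """Direct DFT of a buffer in FFT order (same values an FFT would return)."""
    N = len(buf)
    return [sum(buf[i] * cmath.exp(-2j * cmath.pi * k * i / N) for i in range(N))
            for k in range(N)]
```

To see the interpolation, pad a length-5 frame to $N = 5$ and to $N = 15$ (not powers of two, chosen only so that $N/M = 3$ is an integer): every third bin of the 15-point spectrum coincides with a bin of the 5-point spectrum, and the two bins in between are interpolated values.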
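Take a length-$N$ FFT of $\tilde{y}_m$ to obtain the time-normalized, frequency-sampled STFT of frame $m$ (centered at time $mR$):
$$\tilde{X}_m(\omega_k) \triangleq \sum_{n=-N/2}^{N/2-1} \tilde{y}_m(n)\, e^{-j\omega_k n}, \qquad \omega_k \triangleq \frac{2\pi k}{N}, \quad k = 0, 1, \ldots, N-1, \qquad (8.6)$$
where $\omega_k$ is in radians per sample.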
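The sketch below implements a textbook recursive radix-2 FFT (one of many possible FFT algorithms, chosen here for brevity) and checks it against a literal evaluation of the sum over $n = -N/2, \ldots, N/2-1$. It assumes the buffer layout in which $\tilde{y}_m(n)$ sits at index $n \bmod N$; both function names are hypothetical.

```python
import cmath


def fft(buf):
    """Recursive radix-2 decimation-in-time FFT; len(buf) must be a power of two.

    Element i of buf is the sample whose time index n satisfies n % N == i.
    """
    N = len(buf)
    if N == 1:
        return [complex(buf[0])]
    if N % 2:
        raise ValueError("FFT length must be a power of two")
    even = fft(buf[0::2])  # DFT of the even-indexed samples
    odd = fft(buf[1::2])   # DFT of the odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / N) * odd[k]
        out[k] = even[k] + twiddle
        out[k + N // 2] = even[k] - twiddle
    return out


def stft_bin_direct(y_of_n, N, k):
    """Evaluate the time-normalized STFT sum literally for bin k.

    y_of_n maps time index n to a sample value; missing n count as zero.
    Sums over n = -N/2..N/2-1 with w_k = 2*pi*k/N.
    """
    w_k = 2 * cmath.pi * k / N
    return sum(y_of_n.get(n, 0.0) * cmath.exp(-1j * w_k * n)
               for n in range(-N // 2, N // 2))
```

For the frame $\tilde{y}_m = [1, 2, 3, 4, 5]$ on $n = -2, \ldots, 2$ with $N = 8$, the FFT input is `[3, 4, 5, 0, 0, 0, 1, 2]`, and all eight FFT bins agree with the direct sum.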
If needed, time normalization may be removed using a linear phase term to yield the sampled STFT:
$$X_m(\omega_k) = e^{-j\omega_k mR}\,\tilde{X}_m(\omega_k). \qquad (8.7)$$
The (continuous-frequency) STFT may be approached arbitrarily closely
by using more zero padding and/or other interpolation methods.
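To check the linear-phase correction numerically, the sketch below computes $\tilde{X}_m(\omega_k)$ from the time-normalized frame, applies the factor $e^{-j\omega_k mR}$, and compares the result with a direct evaluation of $\sum_n x(n)\,w(n-mR)\,e^{-j\omega_k n}$, i.e., the window centered at time $mR$. Direct DFT sums replace the FFT for clarity; the function names are hypothetical, and the frame is assumed to lie entirely inside the signal.

```python
import cmath


def time_normalized_stft(x, w, m, R, N):
    """X~_m(w_k), k = 0..N-1: DFT of w(n) x(n + mR), n = -M_h..M_h.

    w holds w(n) at list index n + M_h. Assumes the frame lies inside x.
    """
    M_h = (len(w) - 1) // 2
    return [sum(w[n + M_h] * x[n + m * R] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(-M_h, M_h + 1))
            for k in range(N)]


def remove_time_normalization(X_tilde, m, R):
    """Multiply bin k by the linear-phase term exp(-j w_k m R), w_k = 2*pi*k/N."""
    N = len(X_tilde)
    return [cmath.exp(-2j * cmath.pi * k * m * R / N) * X_k
            for k, X_k in enumerate(X_tilde)]


def stft_reference(x, w, m, R, N):
    """Reference: sum over n of x(n) w(n - mR) exp(-j w_k n), window centered at mR."""
    M_h = (len(w) - 1) // 2
    return [sum(x[n] * w[n - m * R + M_h] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(m * R - M_h, m * R + M_h + 1))
            for k in range(N)]
```

The agreement follows from substituting $n' = n - mR$ in the reference sum, which factors out exactly $e^{-j\omega_k mR}$.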
Note that there is no irreversible time-aliasing when the STFT frequency axis $\omega$ is sampled to the points $\omega_k$, provided the FFT size $N$ is greater than or equal to the window length $M$.
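A small numerical illustration of this condition: sampling the spectrum of a length-$M$ frame at $N$ points and inverting the $N$-point DFT returns $\sum_{n \equiv i \ (\mathrm{mod}\ N)} \tilde{y}_m(n)$ at buffer index $i$. When $N \ge M$ each index receives at most one sample, so the frame is recovered exactly; when $N < M$ samples overlap and cannot be separated. The sketch uses direct DFT sums and hypothetical function names.

```python
import cmath


def sample_spectrum(y, M_h, N):
    """Sample the spectrum of y(n), n = -M_h..M_h, at w_k = 2*pi*k/N, k = 0..N-1."""
    return [sum(y[i] * cmath.exp(-2j * cmath.pi * k * (i - M_h) / N) for i in range(len(y)))
            for k in range(N)]


def inverse_dft(X):
    """Length-N inverse DFT; element i collects every time index n with n % N == i."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * i / N) for k in range(N)) / N
            for i in range(N)]
```

For $\tilde{y}_m = [1, 2, 3, 4, 5]$ on $n = -2, \ldots, 2$, $N = 8$ gives back `[3, 4, 5, 0, 0, 0, 1, 2]` (the frame in FFT order), whereas $N = 4 < M$ gives `[3, 4, 6, 2]`: the samples at $n = -2$ and $n = 2$ have been added together at index 2, and their individual values are lost.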