Once we have our data in the form of amplitude and frequency envelopes for each filter-bank channel, we can compress them by a large factor. If there are $N$ channels, we nominally expect to be able to downsample by a factor of $N$, as discussed initially in Chapter 9 and more extensively in Chapter 11.

In early computer music [97,186], amplitude and
frequency envelopes were "downsampled" by means of *piecewise
linear approximation*. That is, a set of *breakpoints* was
defined in time, and the envelope was approximated by a straight-line
segment between each pair of adjacent breakpoints. These
breakpoints correspond to "knot points" in the context of polynomial
spline interpolation [286]. Piecewise linear approximation
yielded large compression ratios for relatively steady tonal
signals. For example, compression ratios of 100:1 were not uncommon for
isolated "toots" on tonal orchestral instruments [97].
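
To make the breakpoint idea concrete, the following is a minimal sketch in Python (NumPy). The greedy breakpoint-selection rule, the error tolerance `tol`, and the synthetic attack/sustain/release envelope are all illustrative assumptions; they are not the procedure used in the cited early work, where breakpoints could be chosen in other ways.

```python
import numpy as np

def piecewise_linear_breakpoints(env, tol):
    """Greedy breakpoint selection (an illustrative assumption): extend each
    linear segment one sample at a time, and stop just before the maximum
    absolute error of the segment would exceed tol."""
    n = len(env)
    breakpoints = [0]
    start = 0
    while start < n - 1:
        end = start + 1
        while end + 1 < n:
            cand = end + 1
            t = np.arange(start, cand + 1)
            # Straight line from env[start] to env[cand]
            line = np.interp(t, [start, cand], [env[start], env[cand]])
            if np.max(np.abs(line - env[start:cand + 1])) > tol:
                break
            end = cand
        breakpoints.append(end)
        start = end
    return np.array(breakpoints)

def reconstruct(breakpoints, values, n):
    """Rebuild a length-n envelope by linear interpolation between breakpoints."""
    return np.interp(np.arange(n), breakpoints, values)

# Hypothetical amplitude envelope: linear attack, steady sustain, linear release
env = np.concatenate([np.linspace(0.0, 1.0, 50),
                      np.full(900, 1.0),
                      np.linspace(1.0, 0.0, 50)])

tol = 1e-3
bp = piecewise_linear_breakpoints(env, tol)
approx = reconstruct(bp, env[bp], len(env))

ratio = len(env) / len(bp)  # samples per stored breakpoint
print(f"{len(bp)} breakpoints for {len(env)} samples, ratio about {ratio:.0f}:1")
print("max abs error:", np.max(np.abs(approx - env)))
```

Only the breakpoint times and values need to be stored, so a steady envelope collapses to a handful of numbers, while a rapidly fluctuating envelope would need many more breakpoints for the same tolerance.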

A more straightforward method is to simply downsample each envelope by
some factor. Since each subband is bandlimited to the channel
bandwidth, we expect a downsampling factor on the order of the number
of channels in the filter bank. Using a hop size $R$ in the STFT
results in downsampling by the factor $R$ (as discussed in §9.8). If
$N$ channels are each downsampled by $N$, then the
total number of samples coming out of the filter bank equals the
number of samples going into the filter bank. This may be called
*critical downsampling*, which is invariably used in filter banks
for *audio compression*, as discussed further in Chapter 11. A benefit
of converting a signal to critically sampled filter-bank form is that
bits can be allocated based on the amount of energy in each subband
relative to the psychoacoustic masking threshold in that band.
Bit-allocation is typically different for tonal and noise signals in a
band [113,25,16].
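
The sample-count bookkeeping behind critical downsampling can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses a rectangular-window STFT with `N` channels (FFT length `N`) and hop size `R`, and the helper name `stft_frames` and the test signal are hypothetical. Practical audio coders use better filter banks than this one.

```python
import numpy as np

def stft_frames(x, N, R):
    """Rectangular-window STFT (an assumption made for simplicity):
    take an N-point FFT of a length-N frame every R samples.
    Each FFT bin is one filter-bank channel, downsampled by R."""
    num_frames = (len(x) - N) // R + 1
    frames = np.stack([x[m * R : m * R + N] for m in range(num_frames)])
    return np.fft.fft(frames, axis=1)  # shape: (num_frames, N)

N = 8                                   # number of channels (hypothetical)
rng = np.random.default_rng(0)
x = rng.standard_normal(64 * N)         # real test signal

X_crit = stft_frames(x, N, R=N)         # hop R = N: critically sampled
X_over = stft_frames(x, N, R=N // 2)    # hop R = N/2: oversampled by about 2

# For real input, each frame's N complex bins are Hermitian-symmetric,
# so they carry N real degrees of freedom, matching the N input samples.
print("input samples:          ", len(x))
print("critically sampled bins:", X_crit.size)
print("oversampled bins:       ", X_over.size)

# With R = N and a rectangular window, the frames tile the signal,
# so inverting each frame recovers the input.
x_rec = np.fft.ifft(X_crit, axis=1).real.reshape(-1)
print("perfect reconstruction: ", np.allclose(x_rec, x))
```

With `R = N` the filter bank produces exactly as many coefficients as it consumes samples, which is the property that makes per-subband bit allocation efficient; any smaller hop produces redundant data.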

Center for Computer Research in Music and Acoustics (CCRMA), Stanford University