Audio Representations for Data Compression and Compressed Domain Processing
December 2, 1998
Scott Levine's doctoral thesis is presented here in
PostScript and Adobe Acrobat (.pdf) format. The PostScript version has been compressed
using Gzip.
The PostScript files can be viewed with, for example, Ghostscript. The
Adobe Acrobat viewer is available online.
- Thesis
(compressed PostScript; 1523 kB)
- Thesis
(Adobe Acrobat; 3050 kB)
Abstract
In the world of digital audio processing, one usually has
the choice of performing modifications on the raw audio
signal or data compressing the audio signal. Performing
modifications on a data-compressed audio signal, however,
has proved difficult in the past. This thesis provides a
new representation of audio signals that allows for both
very low bit rate audio data compression and high quality
compressed domain processing and modifications. In this
context, processing possibilities are time-scale and
pitch-scale modifications.
This new audio representation segments the audio into
separate sinusoidal, transient, and noise signals. During
regions determined to be attack transients, the audio is modeled
by well-established transform coding techniques. During the
remaining non-transient regions of the input, the audio is
modeled by a mixture of multiresolution sinusoidal modeling
and noise modeling. Careful phase locking techniques at the
time boundaries between the sines and transients allow for
seamless transitions between representations. By separating
the audio into three individual representations, each can be
efficiently and perceptually quantized.
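To illustrate why a parametric sinusoidal representation lends itself to time-scale and pitch-scale modification, here is a minimal sketch of an oscillator-bank synthesizer. It is not the thesis's algorithm: it covers only the sinusoidal component, ignores transients, noise, quantization, and phase locking, and assumes each frame lists its partials in a fixed track order. The function name `synthesize` and the parameters `hop`, `time_scale`, and `pitch_scale` are hypothetical names chosen for this example.

```python
import math

def synthesize(frames, hop, sample_rate, time_scale=1.0, pitch_scale=1.0):
    """Render sinusoidal frames with a simple oscillator bank.

    frames: list of frames; each frame is a list of (freq_hz, amplitude)
            tuples. Assumption: partial k in one frame continues as
            partial k in the next frame.
    hop: analysis hop size in samples between consecutive frames.
    time_scale: > 1.0 slows playback down, < 1.0 speeds it up.
    pitch_scale: multiplies every partial frequency.
    """
    out_hop = int(round(hop * time_scale))  # stretched synthesis hop
    n_tracks = len(frames[0])
    phases = [0.0] * n_tracks               # running phase per partial
    out = []
    for cur, nxt in zip(frames, frames[1:]):
        for n in range(out_hop):
            t = n / out_hop                 # position between the two frames
            sample = 0.0
            for k in range(n_tracks):
                # Linearly interpolate the parameters, then scale the pitch.
                freq = ((1.0 - t) * cur[k][0] + t * nxt[k][0]) * pitch_scale
                amp = (1.0 - t) * cur[k][1] + t * nxt[k][1]
                phases[k] += 2.0 * math.pi * freq / sample_rate
                sample += amp * math.sin(phases[k])
            out.append(sample)
    return out

# Usage: five frames of a steady 440 Hz partial, rendered three ways.
frames = [[(440.0, 0.5)]] * 5
normal = synthesize(frames, hop=256, sample_rate=8000)
slower = synthesize(frames, hop=256, sample_rate=8000, time_scale=2.0)
higher = synthesize(frames, hop=256, sample_rate=8000, pitch_scale=2.0)
```

In this sketch, time scaling only changes how many output samples are generated per frame, and pitch scaling only multiplies the stored frequencies, so both modifications happen during synthesis rather than as a separate post-processing step.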
Sound Examples
Following are some sound (*.wav) examples referred to in
Appendix A of the thesis. In this set of sound examples,
the sines+transients+noise compression scheme at 32 kbps/ch
described in the thesis is compared to MPEG-AAC also at 32
kbps/ch. The MPEG-AAC examples were encoded using source
code from FhG
during October 1998.
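For a rough sense of what 32 kbps/ch means, the arithmetic below compares it with uncompressed PCM. The page does not state the format of the source recordings, so the 44.1 kHz sample rate and 16-bit sample size used here are assumptions (CD-style audio), and the resulting ratio is only indicative.

```python
SAMPLE_RATE_HZ = 44_100   # assumption: CD-style sample rate
BITS_PER_SAMPLE = 16      # assumption: 16-bit linear PCM
CODED_KBPS = 32.0         # per-channel rate quoted on this page

# Uncompressed bit rate per channel, in kbps.
pcm_kbps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE / 1000.0

# Compression ratio relative to the assumed PCM source.
ratio = pcm_kbps / CODED_KBPS

print(f"PCM: {pcm_kbps:.1f} kbps/ch, ratio: {ratio:.2f}:1")
```

Under these assumptions, both coders in the comparison are operating at roughly a 22:1 reduction per channel.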
The individual sines, transients, and noise components from the previous sound file are shown separately.
The next set of sound examples shows another comparison between the thesis's sines+transients+noise system and MPEG-AAC, but this time a pop piece is used instead.
The individual sines, transients, and noise components from the previous sound file are shown separately.
The next several examples show the ability to perform pitch
and time-scale modifications in the compressed domain. That
is, while the audio is being decoded, it is also being time
and/or pitch scaled. There is no need for external
post-processing modification algorithms. All the following
examples use the pop example, It Takes Two. The first two
examples show the sound quality difference in slowing down
music using the quantized sines+transients+noise system in
this thesis versus the quality from commercially available
software, Cool Edit.
- It Takes Two, time-scale modified two times slower using sines+transients+noise
- It Takes Two, time-scale modified two times slower using commercially
available Cool Edit
- It Takes Two, time-scale modified using sines+transients+noise, looped
using the time-scaling rates {2.0, 1.6, 1.2, 1.0, 0.8, 0.6, 0.5}
- It Takes Two, pitch-scale modified using sines+transients+noise, looped
using the pitch-scaling rates {0.89, 0.94, 1.00, 1.06, 1.12}
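The pitch-scaling rates listed above match, to two decimal places, equal-tempered shifts of -2 to +2 semitones. The page does not say how the rates were chosen, so the semitone interpretation is an assumption; the short check below only shows that the numbers agree. The helper name `semitone_ratio` is hypothetical.

```python
def semitone_ratio(semitones):
    """Frequency ratio for a shift of the given number of equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)

# Rates as listed on this page.
listed = [0.89, 0.94, 1.00, 1.06, 1.12]

# Ratios for shifts of -2, -1, 0, +1, +2 semitones, rounded to two decimals.
computed = [round(semitone_ratio(s), 2) for s in range(-2, 3)]

print(computed == listed)
```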
This URL: http://www-ccrma.stanford.edu/~scottl/thesis.html
Last modified: December 3, 1998
Author: Scott Levine