Audio Representations for Data Compression and Compressed Domain Processing

Scott N. Levine

Ph.D. Dissertation

December 2, 1998


Scott Levine's doctoral thesis is presented here in PostScript and Adobe Acrobat (.pdf) format. The PostScript version has been compressed using Gzip. The PostScript files can be viewed, for example, with Ghostscript. The Adobe Acrobat viewer is available online.



Abstract

In digital audio processing, one usually has a choice between modifying the raw audio signal and data-compressing it. Performing modifications directly on a data-compressed audio signal, however, has proved difficult in the past. This thesis presents a new representation of audio signals that allows for both very low bit rate audio data compression and high-quality compressed-domain processing and modification. In this context, the processing considered consists of time-scale and pitch-scale modifications.

This new audio representation segments the audio into separate sinusoidal, transient, and noise signals. In regions identified as attack transients, the audio is modeled with well-established transform coding techniques. In the remaining non-transient regions of the input, the audio is modeled by a mixture of multiresolution sinusoidal modeling and noise modeling. Careful phase locking at the time boundaries between the sinusoidal and transient regions allows seamless transitions between the representations. Because the audio is separated into three individual representations, each can be quantized efficiently according to perceptual criteria.
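As a rough illustration of the segmentation step, the Python sketch below flags frames whose energy rises sharply relative to the previous frame and groups consecutive frames into transient and non-transient regions. The energy-ratio detector, the 512-sample frame length, the threshold of 8, and the function names are hypothetical choices made for illustration; they are not the transient detector used in the thesis.

```python
import numpy as np

def detect_transient_frames(x, frame_len=512, ratio_threshold=8.0, eps=1e-12):
    """Hypothetical energy-ratio detector (not the thesis's method).

    Returns one boolean per frame: True where the frame's energy exceeds
    the previous frame's energy by more than ratio_threshold.
    """
    n_frames = len(x) // frame_len
    frames = np.asarray(x[:n_frames * frame_len], dtype=float).reshape(n_frames, frame_len)
    energy = np.sum(frames ** 2, axis=1) + eps  # eps avoids division by zero
    flags = np.zeros(n_frames, dtype=bool)
    flags[1:] = energy[1:] / energy[:-1] > ratio_threshold
    return flags

def segment_regions(flags):
    """Group consecutive frames into (label, start_frame, end_frame) runs.

    label is 'transient' or 'sines+noise'; end_frame is exclusive.
    """
    regions = []
    start = 0
    for i in range(1, len(flags) + 1):
        # Close the current run at the end or when the flag changes.
        if i == len(flags) or flags[i] != flags[start]:
            label = "transient" if flags[start] else "sines+noise"
            regions.append((label, start, i))
            start = i
    return regions
```

In a complete coder, the transient regions would be routed to the transform coder and the others to the sinusoidal and noise models; practical detectors usually also widen each transient region over several frames around the attack.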



Sound Examples

The following sound (*.wav) examples are referred to in Appendix A of the thesis. This set compares the sines+transients+noise compression scheme described in the thesis, running at 32 kbps/ch, with MPEG-AAC at the same rate. The MPEG-AAC examples were encoded in October 1998 using source code from FhG. The individual sines, transients, and noise components of the preceding sound file are also presented separately.
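To put 32 kbps/ch in perspective, the short Python calculation below compares it with uncompressed PCM. It assumes a 44.1 kHz, 16-bit source (CD-quality audio); the sampling rate and word length of the test material are not stated on this page, and the function name is hypothetical.

```python
def compression_ratio(coded_kbps_per_ch, fs_hz=44100, bits_per_sample=16):
    """Ratio of uncompressed PCM bit rate to coded bit rate, per channel.

    Assumes CD-quality PCM (44.1 kHz, 16 bits) unless told otherwise.
    """
    pcm_kbps_per_ch = fs_hz * bits_per_sample / 1000.0  # 705.6 kbps/ch for CD audio
    return pcm_kbps_per_ch / coded_kbps_per_ch

ratio = compression_ratio(32)
```

Under that assumption, 32 kbps/ch corresponds to a compression ratio of roughly 22:1 (705.6 kbps/ch of PCM versus 32 kbps/ch).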
The next set of sound examples shows another comparison between the sines+transients+noise system of this thesis and MPEG-AAC, this time using a pop piece. The individual sines, transients, and noise components of the preceding sound file are also presented separately.
The next several examples demonstrate pitch-scale and time-scale modifications performed in the compressed domain: while the audio is being decoded, it is simultaneously time- and/or pitch-scaled, so no external post-processing modification algorithm is needed. All of the following examples use the pop piece It Takes Two. The first two examples compare the sound quality of slowing down the music with the quantized sines+transients+noise system from this thesis against that of commercially available software, Cool Edit.
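One reason a parametric sinusoidal representation suits compressed-domain modification is that time and pitch scaling become operations on the decoded parameters rather than on the waveform. The Python sketch below assumes per-frame frequency and amplitude tracks and a simple oscillator-bank synthesizer with linear interpolation; it stretches the synthesis hop to change duration and multiplies the frequencies to change pitch. It covers only the sinusoidal component, the function name and parameters are hypothetical, and the way the thesis treats the transient and noise components during scaling is not reproduced here.

```python
import numpy as np

def synthesize_sines(freqs, amps, hop, fs, time_scale=1.0, pitch_scale=1.0):
    """Hypothetical oscillator-bank synthesis from sinusoidal parameters.

    freqs, amps: arrays of shape (n_frames, n_partials), freqs in Hz.
    hop: analysis hop size in samples (n_frames must be >= 2).
    time_scale: > 1 slows down (longer output) by stretching the hop.
    pitch_scale: multiplies every partial frequency.
    """
    freqs = np.asarray(freqs, dtype=float) * pitch_scale
    amps = np.asarray(amps, dtype=float)
    n_frames, n_partials = freqs.shape
    out_hop = int(round(hop * time_scale))  # stretched synthesis hop
    n_out = (n_frames - 1) * out_hop
    frame_pos = np.arange(n_frames) * out_hop  # frame centers in output samples
    t = np.arange(n_out)
    out = np.zeros(n_out)
    for k in range(n_partials):
        # Linearly interpolate parameters between frames.
        f = np.interp(t, frame_pos, freqs[:, k])
        a = np.interp(t, frame_pos, amps[:, k])
        # Accumulate phase sample by sample, so no jumps at frame boundaries.
        phase = 2 * np.pi * np.cumsum(f) / fs
        out += a * np.sin(phase)
    return out
```

Because the phase is integrated from the interpolated frequency, stretching the hop lengthens the output without introducing phase discontinuities, and changing the pitch leaves the duration untouched.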
This URL: http://www-ccrma.stanford.edu/~scottl/thesis.html
Last modified: December 3, 1998
Author: Scott Levine