Tony Verma
verma at furthur.stanford
(EE)
This week I'll be giving a first run of my orals talk. As such,
comments, suggestions, and questions will be greatly appreciated.
A Perceptually Based Audio Signal Model with Application to Scalable
Compression
Abstract (which is a bit long...)
Audio delivery in network environments such as the Internet, where
bandwidth is not guaranteed, packet loss is common, and users connect
to the network at various data rates, demands scalable compression
techniques. Scalability allows each user to receive the best possible
audio quality given the current network conditions. In
addition, because the separation principle for source and channel
coding does not apply to lossy packet networks, an audio source coding
technique that explicitly considers channel characteristics is
desirable. These goals can be achieved by using a higher level
description for audio than the actual waveform. This talk will focus
on a method for extracting meaningful parameters from general digital
audio signals that takes into account the way humans perceive sound;
moreover, application of this parametric model to scalable audio
compression will be discussed.
The model consists of three major components: sines, transients and
noise. These underlying signal components are found during the
analysis stage of the model. Quantizing and compressing the resulting
model parameters allows for efficient storage and transmission of the
original audio signal. The talk will cover enhancements made to
current sine models. These enhancements allow explicit perceptual
information to be included in the sinusoidal model. In addition, a
novel transient modeling technique will be covered.
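
As a rough illustration of the three-part decomposition, the sketch below rebuilds a signal as the sum of sines, transients and noise. It is only a minimal sketch under stated assumptions: the names `Sine`, `Transient`, `Frame` and `synthesize` are hypothetical, the exponentially decaying burst is a placeholder for the transient model (which this abstract does not specify), and uniform random noise stands in for whatever noise model is actually used.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Sine:
    amp: float      # linear amplitude
    freq_hz: float  # frequency in Hz
    phase: float    # initial phase in radians


@dataclass
class Transient:
    onset: int      # sample index where the burst starts
    amp: float      # peak amplitude at the onset
    decay: float    # per-sample decay factor, 0 < decay < 1 (placeholder model)


@dataclass
class Frame:
    sines: list = field(default_factory=list)
    transients: list = field(default_factory=list)
    noise_gain: float = 0.0  # scale of the noise component (placeholder model)


def synthesize(frame, n_samples, sample_rate, seed=0):
    """Sum the three components into one list of samples."""
    rng = random.Random(seed)  # seeded so the noise part is repeatable
    out = [0.0] * n_samples
    # Sines: plain additive synthesis.
    for s in frame.sines:
        w = 2.0 * math.pi * s.freq_hz / sample_rate
        for n in range(n_samples):
            out[n] += s.amp * math.cos(w * n + s.phase)
    # Transients: a decaying burst starting at the onset sample.
    for t in frame.transients:
        for n in range(t.onset, n_samples):
            out[n] += t.amp * t.decay ** (n - t.onset)
    # Noise: uniform samples scaled by a single gain.
    for n in range(n_samples):
        out[n] += frame.noise_gain * rng.uniform(-1.0, 1.0)
    return out
```

In this sketch the analysis stage would be whatever fills in a `Frame`; only the synthesis direction is shown, since it makes the role of each parameter set easiest to see.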
The three-part model provides an efficient, flexible and perceptually
accurate representation for audio signals. It is therefore
appropriate for scalable compression over lossy packet networks. The
efficiency of the model ensures high compression ratios. Flexibility
simultaneously allows scalability and robustness to channel
characteristics such as packet losses because subsets of model
parameters represent the original signal with varying degrees of
fidelity. Perceptual accuracy ensures that parameter subsets
reasonably represent the original signal while the complete parameter
set represents the original exactly in a perceptual sense.
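
The idea that parameter subsets give graded fidelity can be illustrated with a toy experiment. The sketch below is an assumption-laden example, not the method of the talk: the partials in `PARTIALS` are made up, and ranking by amplitude is a crude stand-in for the perceptual ranking the model would use. It rebuilds a signal from its k strongest partials and measures the remaining error energy for each k.

```python
import math

SAMPLE_RATE = 8000
N = 800  # 0.1 s; each partial below completes a whole number of cycles

# Hypothetical (amplitude, frequency in Hz) pairs standing in for model parameters.
PARTIALS = [(1.0, 100.0), (0.5, 200.0), (0.25, 300.0), (0.125, 400.0)]


def render(partials):
    """Additive synthesis of the given partials over N samples."""
    return [
        sum(a * math.cos(2.0 * math.pi * f * n / SAMPLE_RATE) for a, f in partials)
        for n in range(N)
    ]


def error_energy(reference, approx):
    """Sum of squared differences between two equal-length signals."""
    return sum((r - x) ** 2 for r, x in zip(reference, approx))


original = render(PARTIALS)
# Amplitude ranking is an assumption; a perceptual measure would replace it.
ranked = sorted(PARTIALS, key=lambda p: p[0], reverse=True)
# errors[k] is the error energy when only the k strongest partials are kept.
errors = [error_energy(original, render(ranked[:k])) for k in range(len(ranked) + 1)]
```

Each added partial lowers the error, and the full set reproduces the toy signal, which mirrors how a receiver that gets only part of the parameters still gets a usable approximation.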
Most current techniques for audio compression (e.g., MPEG audio layer
3, AAC, and Real Audio's G2) use a subband decomposition in
conjunction with psychoacoustic models to compress the actual audio
waveform itself. No model of the signal is assumed. These
compression techniques have been very successful for targeted fixed
bit rates; however, they cannot be scaled in large steps without
severe loss in quality. This is evident in the case of Real Audio,
where a database stores many versions of an audio signal at
various bitrates (e.g., 96Kbps, 64Kbps, 32Kbps, 20Kbps, and 16Kbps)
and quality. Because using an underlying model allows meaningful
subsets of parameters to describe the original signal, one compressed
bitstream (e.g., 96Kbps) can be stored. Embedded within this bitstream
are lower bitrate versions (e.g., 64Kbps, 32Kbps, 20Kbps, and 16Kbps)
that can be easily extracted. Sound demos of the audio compression
scheme will be played.
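
One way to picture the embedded bitstream is as an ordered list of layers, where any prefix is itself a valid lower-rate stream. The sketch below assumes exactly that structure. The `Layer` type, the `extract` function, the payload bytes and the per-layer rates are all hypothetical, chosen only so that the cumulative rates match the example figures of 16, 20, 32, 64 and 96 Kbps.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    rate_kbps: int  # bitrate this layer adds on top of the layers before it
    payload: bytes  # encoded model parameters for this layer (placeholder)


# Hypothetical layering: cumulative rates of 16, 20, 32, 64 and 96 Kbps.
LAYERS = [
    Layer(16, b"base"),
    Layer(4, b"enh1"),
    Layer(12, b"enh2"),
    Layer(32, b"enh3"),
    Layer(32, b"enh4"),
]


def extract(layers, target_kbps):
    """Return the longest prefix of layers whose total rate fits target_kbps."""
    kept, total = [], 0
    for layer in layers:
        if total + layer.rate_kbps > target_kbps:
            break  # later layers depend on earlier ones, so stop at the first misfit
        kept.append(layer)
        total += layer.rate_kbps
    return kept, total
```

For example, `extract(LAYERS, 40)` keeps the first three layers for a 32 Kbps stream, so a server needs to store only the full-rate version and can truncate it per connection.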
Hope to see you there!
-Tony