A Perceptually Based Audio Signal Model with Application to Scalable Compression

Tony Verma <verma at furthur.stanford> (EE)

This week I'll be giving a first run of my orals talk. As such, comments, suggestions, and questions will be greatly appreciated.

Abstract (which is a bit long...)

Audio delivery in network environments such as the Internet, where bandwidth is not guaranteed, packet loss is common, and users connect to the network at various data rates, demands scalable compression techniques. Scalability allows each user to receive the best possible audio quality given the current network conditions. In addition, because the separation principle for source and channel coding does not apply to lossy packet networks, an audio source coding technique that explicitly considers channel characteristics is desirable. These goals can be achieved by using a higher-level description for audio than the actual waveform. This talk will focus on a method for extracting meaningful parameters from general digital audio signals that takes into account the way humans perceive sound; moreover, application of this parametric model to scalable audio compression will be discussed.

The model consists of three major components: sines, transients and noise. These underlying signal components are found during the analysis stage of the model. Quantizing and compressing the resulting model parameters allows for efficient storage and transmission of the original audio signal. The talk will cover enhancements made to current sine models. These enhancements allow explicit perceptual information to be included in the sinusoidal model. In addition, a novel transient modeling technique will be covered.
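As a rough illustration of the analysis idea (not the method presented in the talk), the sketch below fits sinusoids to a toy frame and subtracts them, leaving a residual that holds the transient and the noise. Everything here is an assumption made for the example: the sample rate, the frame length, the test signal, and the helper names `synth_sines` and `estimate_partial`. The partial frequencies are taken as known and chosen to complete a whole number of cycles in the frame, which is what makes the simple projection estimate valid; no perceptual information is used.

```python
import math
import random

SR = 8000   # sample rate in Hz (assumed for this sketch)
N = 800     # frame length in samples (assumed)

def synth_sines(partials, n=N, sr=SR):
    """Sum stationary sinusoids; partials is a list of (amp, freq_hz, phase_rad)."""
    return [sum(a * math.cos(2 * math.pi * f * t / sr + p) for a, f, p in partials)
            for t in range(n)]

def estimate_partial(x, freq, sr=SR):
    """Estimate (amp, phase) of a sinusoid at a known frequency by projection.

    Exact only if freq completes a whole number of cycles in len(x).
    """
    n = len(x)
    w = 2 * math.pi * freq / sr
    c = 2 / n * sum(v * math.cos(w * t) for t, v in enumerate(x))
    s = 2 / n * sum(v * math.sin(w * t) for t, v in enumerate(x))
    # a*cos(w*t + p) = (a*cos p)*cos(w*t) - (a*sin p)*sin(w*t)
    return math.hypot(c, s), math.atan2(-s, c)

# Toy input: two sines + one click (the "transient") + low-level noise.
rng = random.Random(0)
true_partials = [(0.5, 440.0, 0.3), (0.25, 1000.0, -1.0)]
x = synth_sines(true_partials)
x[400] += 1.0
x = [v + rng.gauss(0.0, 0.01) for v in x]

# Analysis: fit the sines, then subtract them to expose transient + noise.
est = []
for _, f, _ in true_partials:
    a, p = estimate_partial(x, f)
    est.append((a, f, p))
sines = synth_sines(est)
residual = [v - s for v, s in zip(x, sines)]

# The largest residual sample marks where the click sits.
click_at = max(range(N), key=lambda t: abs(residual[t]))
```

A real analyzer would have to find the frequencies itself (for example by peak picking in a short-time spectrum) and track them over time; the point here is only that removing the sinusoidal part leaves a residual in which the transient stands out from the noise.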

The three-part model provides an efficient, flexible and perceptually accurate representation for audio signals. It is therefore well suited to scalable compression over lossy packet networks. The efficiency of the model ensures high compression ratios. Flexibility simultaneously allows scalability and robustness to channel characteristics such as packet losses because subsets of model parameters represent the original signal with varying degrees of fidelity. Perceptual accuracy ensures that parameter subsets reasonably represent the original signal while the complete parameter set represents the original exactly in a perceptual sense.

Most current techniques for audio compression (e.g., MPEG audio layer 3, AAC, and Real Audio's G2) use a subband decomposition in conjunction with psychoacoustic models to compress the actual audio waveform itself. No model of the signal is assumed. These compression techniques have been very successful for targeted fixed bit rates; however, they cannot be scaled in large steps without severe loss in quality. This is evident in the case of Real Audio, where a database will store many versions of an audio signal at various bitrates (e.g., 92Kbps, 64Kbps, 32Kbps, 20Kbps, and 16Kbps) and quality levels. Because using an underlying model allows meaningful subsets of parameters to describe the original signal, only one compressed bitstream (e.g., 96Kbps) needs to be stored. Embedded within this bitstream are lower bitrate versions (e.g., 64Kbps, 32Kbps, 20Kbps, and 16Kbps) that can be easily extracted. Sound demos of the audio compression scheme will be played.
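The embedded-bitstream idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the coder from the talk: parameters are ranked by amplitude as a crude stand-in for perceptual importance, the per-layer parameter counts are made up, and `build_layers` and `extract` are hypothetical helper names. Only the layer bitrates (16, 20, 32, 64, 96) are taken from the text.

```python
def build_layers(partials, cumulative_counts):
    """Split (amp, freq_hz) partials into nested layers, most important first.

    cumulative_counts is a list of (cumulative_kbps, cumulative_param_count).
    Importance is approximated by amplitude (an assumption for this sketch).
    """
    ranked = sorted(partials, key=lambda p: p[0], reverse=True)
    layers, start = [], 0
    for kbps, count in cumulative_counts:
        layers.append((kbps, ranked[start:count]))
        start = count
    return layers

def extract(layers, max_kbps):
    """Return the parameters of every layer that fits within max_kbps."""
    out = []
    for kbps, chunk in layers:
        if kbps > max_kbps:
            break  # layers are ordered, so nothing later fits either
        out.extend(chunk)
    return out

# Ten toy partials with decaying amplitudes (hypothetical data).
partials = [(1.0 / k, 100.0 * k) for k in range(1, 11)]

# Hypothetical mapping from cumulative bitrate to cumulative parameter count.
layers = build_layers(partials, [(16, 2), (20, 3), (32, 5), (64, 8), (96, 10)])

low = extract(layers, 16)    # 2 strongest partials
mid = extract(layers, 32)    # 5 strongest partials
full = extract(layers, 96)   # all 10 partials
```

Because each lower-rate version is a prefix of the full stream, a server holding only the full-rate data can serve any of the lower rates by truncation, and a receiver that loses the later layers can still reconstruct a coarser version from the earlier ones.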

Hope to see you there!

-Tony




"CCRMA DSP Seminar Prior Abstracts", by Julius O. Smith III, Aut-Spr Quarters, CCRMA Ballroom, The Knoll, Stanford University.
Copyright © 2005-12-28 by Julius O. Smith III
Center for Computer Research in Music and Acoustics (CCRMA),   Stanford University