Perceptual Audio Coding

Music 422, Winter 2020

Instructor: Marina Bosi

Course Description

Did you ever wonder how your MP3 files squeeze so much sound into such a small size? What is the difference between MP3 and AAC? Or which multichannel audio coding format is best for your application?

The need for significant reduction in data rate for wide-band digital audio signal transmission and storage has led to the development of psychoacoustics-based data compression techniques. In this approach, the limitations of human hearing are exploited to remove inaudible components of audio signals. The degree of bit rate reduction achievable without sacrificing perceived quality using these methods greatly exceeds that possible using lossless techniques alone. Perceptual audio coders are currently used in many applications, including Digital Radio and Television, Digital Sound on Film, Multimedia/Internet Audio, and Mobile Devices.
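As a rough illustration of the data rates involved, the short Python sketch below compares uncompressed stereo PCM with a perceptually coded stream. The CD-quality parameters (44.1 kHz, 16 bits, 2 channels) and the 128 kb/s target are illustrative assumptions chosen for the arithmetic, not figures taken from the course materials, and the function names are hypothetical.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def compression_ratio(source_bps, coded_bps):
    """How many times smaller the coded stream is than the source."""
    return source_bps / coded_bps


# Assumed source: CD-quality stereo (44.1 kHz, 16 bits per sample, 2 channels).
cd_rate = pcm_bit_rate(44_100, 16, 2)

# Assumed target: a 128 kb/s perceptually coded stream.
ratio = compression_ratio(cd_rate, 128_000)

print(f"PCM rate: {cd_rate / 1000:.1f} kb/s, ratio: {ratio:.1f}:1")
```

Under these assumptions the source runs at about 1411 kb/s, so the coded stream is roughly eleven times smaller.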

This class integrates digital signal processing, psychoacoustics, rate/distortion optimization, and programming to provide the basis for understanding and building perceptual audio coding systems. We review the basic principles underlying all the core components of a perceptual audio codec and study the design choices applied in state-of-the-art audio coding schemes, e.g. AC-3 (aka Dolby Digital), Enhanced AC-3, AC-4; MPEG Layers I, II, and III (MP3); MPEG AAC; MPEG-H. In-class demonstrations will allow students to hear the quality of state-of-the-art implementations at varying data rates and, as a final project, you will be programming your own simple perceptual audio coder.

Schedule

Below is a tentative schedule, subject to change. Required readings from the course textbook are listed for each week. As a general rule, complete each reading before the class that covers its topic. Unless otherwise specified, the class meets at CCRMA (map) on Friday afternoons from 2:30 pm until 4:20 pm, except for the lectures marked with a NOTE below.

Date | Topic | Reading | Due
1/10 | Course Overview and Audio Signal Representation | Chapters 1, 3 |
1/17 | Quantization | Chapter 2 | HW1
1/21 | Time to Frequency Mapping I (9am-10:30am). NOTE: This class is on a Tuesday from 9 am until 10:30 am. | Chapter 4 |
1/24 | Time to Frequency Mapping II | Chapter 5 | HW2
1/31 | Introduction to Psychoacoustics | Chapters 6, 7 | HW3
2/7 | Bit Allocation and Basic Building Blocks of an Audio Codec | Chapters 8, 9 | HW4
2/12 | Audio Codecs Evaluation. NOTE: This class is on a Wednesday and will be at Dolby Laboratories in San Francisco. | Chapters 10, 14 |
2/14 | HW5 due by Friday February 14 at 2:30 pm | | HW5
2/21 | Overview of MPEG and MPEG-1 Audio Coding | Chapter 11 | HW6
2/23 | HW7 (Project proposal) due by Sunday February 23 at 11:59 pm | | HW7
2/28 | Overview of MPEG-2, AAC and MP3 | Chapters 12, 13 |
3/6 | Overview of MPEG-4, MPEG USAC, MPEG-H and Other Coding Standards (AC-3, E AC-3, AC-4 etc.) | Chapter 15 |
3/8 | Project due by Sunday March 8 at 11:59 pm | | FP
3/13 | Project Presentations (2:30pm-4:20pm) | |

Teaching Assistant

Scott Oshiro

Logistics

Grading

To earn 3 units in this course you will have to come to class, participate, do the homework exercises and complete a final project. This is the final grade breakdown:

Textbook

There is one required textbook for the class:

M. Bosi & R.E. Goldberg, "Introduction to Digital Audio Coding and Standards", Springer, 2003, ISBN: 978-1-4020-7357-1.

Office Hours

Marina Bosi is available by appointment.

Scott Oshiro's office hours:

Homework

Canvas

The main course website is accessed through Canvas, Stanford University's learning management system. All homework assignments, grades, and supplementary materials are accessible via the Canvas site, and all homework submissions must take place via your Canvas assignment submission portal.

Music 422 Canvas Site »

Homework Submission

The homework must be submitted using Canvas’s assignment submission system (in the Assignments tab). For each assignment, select the associated link in that tab and proceed with your submission. (For additional information regarding file submissions for assignments on Canvas, read the page here.)

For each homework, you must turn in 2 items:

For your files, use the following naming convention: hwX_suid.type where X is the homework number and suid is your Stanford ID. For example, the write-up for homework 1 will be hw1_mab.pdf and the code files hw1_mab.zip/hw1_mab.tar.gz/hw1_mab.tar.
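The naming convention above can be sketched as a small Python helper. This is only an illustration: the function names are hypothetical, the list of allowed file types is taken from the example in the text (pdf, zip, tar.gz, tar), and the check assumes that a Stanford ID contains only letters and digits.

```python
import re

# File types that appear in the example above (assumed to be the full list).
ALLOWED_TYPES = ("pdf", "zip", "tar.gz", "tar")

# Assumption: a Stanford ID (suid) consists of letters and digits only.
_NAME_PATTERN = re.compile(r"hw(\d+)_([A-Za-z0-9]+)\.(pdf|zip|tar\.gz|tar)")


def homework_filename(hw_number, suid, file_type):
    """Build a file name of the form hwX_suid.type."""
    if file_type not in ALLOWED_TYPES:
        raise ValueError(f"unexpected file type: {file_type!r}")
    return f"hw{hw_number}_{suid}.{file_type}"


def is_valid_homework_filename(name):
    """Return True if the whole name follows the hwX_suid.type pattern."""
    return _NAME_PATTERN.fullmatch(name) is not None


print(homework_filename(1, "mab", "pdf"))            # write-up
print(homework_filename(1, "mab", "tar.gz"))         # code archive
print(is_valid_homework_filename("homework1.pdf"))   # does not follow the convention
```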

Late Homework Policy

Homework Presentation

The presentation of your homework (write-up and code) will be evaluated for each of your submissions. Well-presented homework will be awarded up to 2.5% extra credit. Poorly presented homework will be penalized by up to 2.5%.

Below are elements of a well-presented homework:

Typed and scanned homework are equally subject to the guidelines above. Furthermore:

Project

This course includes a final project. The final project consists of the design and implementation of a simple perceptual audio coder. Groups of up to three students typically work together on the final project.

Requirements for the final project include a written proposal by the seventh week of the quarter (2/23), a written report by the ninth week of the quarter (3/8), and a presentation of the report by the end of the quarter (3/13). The aim of the report should be to fully document project design, methodology, and results.

Students may use the computer of their choice for the project, but Python 3.7 (http://www.python.org/) is the preferred programming language for implementation of the project coder. (Previous Python programming experience is neither required nor expected for this course.)

Resources

Contact

Please do not hesitate to contact Marina Bosi or Scott Oshiro with any questions or concerns.