All of the following samples are available in WAV format on the web, at http://ccrma.stanford.edu/~bosse/. Note that no rate-control module has been developed, so the bitrate varies considerably with the type of signal. From the values below, one can deduce that coding that is close to perceptually lossless can be achieved at about 128 kbit/s for stereo data and 75 kbit/s for mono.
Audio stream | Bitrate | M/S | Apparent artifacts |
Mono audio | |||
mixed.wav | 67 kb/s | M | |
jacob.wav | 71 kb/s | M | |
cardigans.wav | 73 kb/s | M | |
strings.wav | 58 kb/s | M | |
Stereo audio | |||
music.wav | 106 kb/s | S | Especially triangle and castanets |
tpd.wav | 118 kb/s | S | |
jacob.wav | 124 kb/s | S | |
castanets.wav | 103 kb/s | S | Very audible preecho |
instruments.wav | 108 kb/s | S | |
oasis.wav | 118 kb/s | S | The "s" in "sunday" |
Low bitrate | |||
oasis.wav | 89 kb/s | S | Easy to detect |
jacob.wav | 83 kb/s | S | Easy to detect |
M/S means mono/stereo. The low bitrate signals are coded with a masking threshold multiplied by a factor and respectively.
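The rounded figures of 75 and 128 kbit/s can be checked against the table with a few lines of arithmetic. The sketch below is only a summary of the numbers listed above; the names `mono`, `stereo` and `summarize` are illustrative and not part of the coder. The low-bitrate rows are left out because they were coded with a raised masking threshold.

```python
from statistics import mean

# Bitrates in kb/s, copied from the table above (low-bitrate rows excluded).
mono = {"mixed.wav": 67, "jacob.wav": 71, "cardigans.wav": 73, "strings.wav": 58}
stereo = {"music.wav": 106, "tpd.wav": 118, "jacob.wav": 124,
          "castanets.wav": 103, "instruments.wav": 108, "oasis.wav": 118}

def summarize(rates):
    """Return the minimum, mean and maximum bitrate of one group of files."""
    values = list(rates.values())
    return {"min": min(values), "mean": mean(values), "max": max(values)}

mono_stats = summarize(mono)
stereo_stats = summarize(stereo)

for label, stats in (("mono", mono_stats), ("stereo", stereo_stats)):
    print(f"{label}: min {stats['min']} kb/s, "
          f"mean {stats['mean']:.1f} kb/s, max {stats['max']} kb/s")
```

In both groups the quoted figure lies slightly above the highest measured bitrate, so it is best read as a rough upper estimate rather than an average. Several of the stereo files still show audible artifacts at the rates listed.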
The encoder described in this report is clearly still rather undeveloped. To improve the coder, I would like to add some kind of transient coding, for example using wavelets. Transform-wavelet hybrid coders (see e.g. [11]) have become more popular and show good results. Also, adaptive prediction in either the time or the transform domain would decrease the bitrate for stationary signals (although some of this redundancy is already exploited by the KLT).
This project did not result in a coder with many new features, but it gave me some experience and knowledge in the field of high-fidelity perceptual audio coders.