Sound Gaze

256-fall-2009: HW3

Roy Fejgin

Getting the project

This compressed tar file contains all the files needed to build and run this assignment.

Build instructions

The project builds for JACK/Linux.

To build, just type 'make'.

To clean all temporary files and executables type 'make clean'.

To execute, run 'sndgaze'.

README

Visualization

This project is an audio visualizer. The audio input is visualized through:

A waterfall plot, which shows a history of the input signal's spectrum (orange in the image above).
A display of a windowed version of the time domain signal (pink in the image above).
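
The two displays above can be sketched roughly as follows. This is only an illustration in Python, not the project's code: the Hann window, the names hann_window and Waterfall, and the history depth are all assumptions, since the README does not say which window or buffer layout was used.

```python
import math
from collections import deque


def hann_window(n):
    """Return an n-point Hann window (n >= 2). The window type is an assumption."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def apply_window(frame, window):
    """Multiply a time-domain frame by the window, sample by sample."""
    return [s * w for s, w in zip(frame, window)]


class Waterfall:
    """Hypothetical history buffer: keeps the newest `depth` spectra, newest first."""

    def __init__(self, depth):
        self.rows = deque(maxlen=depth)

    def push(self, magnitudes):
        # When the deque is full, the oldest row falls off the far end.
        self.rows.appendleft(list(magnitudes))
```

Each audio frame would be windowed, transformed, and pushed; drawing the rows from front to back then gives the scrolling history.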

Pitch Tracking

I implemented pitch tracking using auto-correlation, computed in the frequency domain. The spectrum already computed for the waterfall plot is reused, so the auto-correlation costs only O(N) additional operations (where N is the FFT size). The auto-correlation frequency bin with the largest magnitude is selected as the pitch.
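
One way to read the description above: the squared magnitude of the spectrum is the Fourier transform of the (circular) auto-correlation, so a single O(N) pass over an existing spectrum yields it, and the strongest bin gives the pitch. The sketch below follows that reading. It is an assumption about the method, not the project's actual code; pitch_from_spectrum and the naive dft helper are hypothetical names, and the helper exists only to keep the example self-contained.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) DFT, standing in for the FFT the visualizer already has."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def pitch_from_spectrum(spectrum, sample_rate):
    """Return (frequency_hz, power) of the strongest bin below Nyquist.

    |X[k]|^2 is the auto-correlation expressed in the frequency domain.
    Bin 0 (DC) is skipped so a constant offset is never reported as pitch.
    """
    n = len(spectrum)
    best_bin = max(range(1, n // 2), key=lambda k: abs(spectrum[k]) ** 2)
    return best_bin * sample_rate / n, abs(spectrum[best_bin]) ** 2


# Usage: a 625 Hz sine sampled at 8 kHz falls exactly on bin 5 of a 64-point DFT.
sample_rate = 8000
frame = [math.sin(2.0 * math.pi * 625.0 * t / sample_rate) for t in range(64)]
frequency, power = pitch_from_spectrum(dft(frame), sample_rate)
```

The returned power is the kind of value that could drive the sphere size described below, though how the project scales it is not stated.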

The pitch is displayed using a sphere whose x-axis location depends on the pitch's value. Also, the size of the sphere corresponds to the magnitude of the signal at that pitch's frequency.

I also implemented zero-crossing based pitch tracking, but ended up not visualizing it since the auto-correlation based method proved superior, especially with complex signals.
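
For comparison, a zero-crossing estimator can be as small as the sketch below. It is a generic version of the technique, not the code from this project, and zero_crossing_pitch is a hypothetical name. It measures the average spacing between upward zero crossings and inverts it.

```python
def zero_crossing_pitch(frame, sample_rate):
    """Estimate pitch in Hz from upward zero crossings, or None if too few."""
    # Indices where the signal goes from negative to non-negative.
    rising = [i for i in range(1, len(frame)) if frame[i - 1] < 0.0 <= frame[i]]
    if len(rising) < 2:
        return None
    periods = len(rising) - 1          # full cycles between first and last crossing
    span = rising[-1] - rising[0]      # samples covered by those cycles
    return sample_rate * periods / span
```

Any strong harmonic or noise that adds extra crossings per cycle inflates the estimate, which is in line with the weakness on complex signals noted above.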

Auditory Pitch Feedback

If the user presses 'p', auditory feedback is enabled. That means that when a pitch is detected, the program plays a tone (sine wave) at the corresponding frequency. The user can then listen to the tone and compare its pitch to their own perception of the pitch of the original signal.
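
A feedback tone of this kind is commonly generated with a phase accumulator, so the frequency can change between audio buffers without clicks. The sketch below assumes that approach; the class name SineOscillator, the default amplitude, and the buffer-based interface are illustrative and not taken from the project.

```python
import math


class SineOscillator:
    """Hypothetical sine generator that keeps its phase across buffers."""

    def __init__(self, sample_rate):
        self.sample_rate = sample_rate
        self.phase = 0.0  # radians, kept in [0, 2*pi)

    def render(self, frequency_hz, num_samples, amplitude=0.2):
        """Return num_samples of a sine at frequency_hz, continuing the phase."""
        step = 2.0 * math.pi * frequency_hz / self.sample_rate
        out = []
        for _ in range(num_samples):
            out.append(amplitude * math.sin(self.phase))
            self.phase = (self.phase + step) % (2.0 * math.pi)
        return out
```

In an audio callback, each call would pass the most recently detected pitch as frequency_hz.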

Additional Features

Pressing 'l' toggles logarithmic frequency in the waterfall display.

Pressing +/- magnifies/shrinks the waterfall display.

Pressing Space freezes/unfreezes the visualization image.
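
The logarithmic frequency toggle could be implemented along the lines below. This is a guess at the mapping, written in Python for illustration; bin_to_x and its normalization to the range 0..1 are assumptions. On the logarithmic axis every octave takes up the same width.

```python
import math


def bin_to_x(bin_index, num_bins, logarithmic):
    """Map FFT bin 1..num_bins-1 to a horizontal position in [0, 1].

    Bin 0 (DC) is excluded because log(0) is undefined. Needs num_bins >= 3.
    """
    if bin_index < 1 or bin_index >= num_bins:
        raise ValueError("bin_index must be in 1..num_bins-1")
    if logarithmic:
        return math.log(bin_index) / math.log(num_bins - 1)
    return (bin_index - 1) / (num_bins - 2)
```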

Comments

When listening to the auditory feedback, the pitch sounds quite off at low frequencies. This seems to be because our hearing can discern pitch quite well at low frequencies, but only a few FFT bins cover that range. Would zero padding help? Increasing the FFT size probably would, at the cost of latency.
The pitch detection was thresholded such that pitch is not detected if the audio level is very low. That's done to avoid false pitch detections when there is no real signal, just low-level system noise.
I can gaze at this for hours...
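
The low-frequency problem from the first comment can be put in numbers. The sketch below compares FFT bin spacing with the width of a semitone and shows the latency cost of a larger FFT. The sample rate and FFT sizes are assumed values for illustration; the README does not state the ones the program uses.

```python
def bin_spacing_hz(sample_rate, fft_size):
    """Distance in Hz between adjacent FFT bins."""
    return sample_rate / fft_size


def semitone_width_hz(frequency_hz):
    """Distance in Hz from a frequency to the next equal-tempered semitone up."""
    return frequency_hz * (2.0 ** (1.0 / 12.0) - 1.0)


def window_latency_ms(sample_rate, fft_size):
    """Time needed to fill one FFT frame, in milliseconds."""
    return 1000.0 * fft_size / sample_rate


# Assumed settings: 44.1 kHz sample rate, 1024-point versus 8192-point FFT.
for fft_size in (1024, 8192):
    print(fft_size,
          round(bin_spacing_hz(44100, fft_size), 2),     # about 43.07, then 5.38 Hz
          round(window_latency_ms(44100, fft_size), 1))  # about 23.2, then 185.8 ms
print(round(semitone_width_hz(110.0), 2))                # about 6.54 Hz at A2
```

Under these assumptions a single 1024-point bin near 110 Hz spans several semitones, while an 8192-point FFT gets below one semitone at roughly eight times the frame latency. Zero padding gives a finer bin grid without the longer frame, but it only interpolates the existing spectrum, so it can sharpen the location of an isolated peak without separating nearby partials.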