Audio Diary
Final Project for Music 256b
Roy Fejgin
CCRMA, Stanford University
This project was created to enable people to easily document the audio in their lives, and to explore the sounds others choose to share with them. It is envisioned as an application that runs continuously, perhaps all day, requiring very little interaction from the user. The application keeps a snapshot of recent audio from the microphone, so whenever the user realizes something interesting has just happened, they still have access to it and can save it. An auditory, tapping-based interface allows starting and stopping the recording without even unlocking the phone.
The recorded audio clips can be saved to a server "in the cloud", along with information
about them such as location, user ID, time of day, etc. Later on, the user can
use the app to explore those clips, and clips created by others.
The application uses a tab bar to represent its four modes of usage: recording,
uploading, searching, and browsing. The following sections describe those modes
in detail.
The recording interface allows easy capture of audio events, even after they have occurred.
To achieve that, the phone constantly monitors the microphone input and stores in
memory a "window" containing recent audio (currently set to one minute). Audio
that is older than the window size is discarded.
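
As a rough illustration, here is a minimal C++ sketch of how such a rolling window can be kept in a circular buffer; the class and parameter names are hypothetical and not taken from the app's actual AudioManager. New samples from the input callback overwrite the oldest ones, so the buffer always holds the most recent window of audio.

    #include <cstddef>
    #include <vector>

    // Hypothetical sketch of the monitoring window (not the app's actual code).
    class MonitorBuffer {
    public:
        MonitorBuffer(float windowSeconds, float sampleRate)
            : samples_((size_t)(windowSeconds * sampleRate), 0.0f), writePos_(0) {}

        // Called from the audio input callback with each new block of samples.
        void write(const float *input, size_t numFrames) {
            for (size_t i = 0; i < numFrames; i++) {
                samples_[writePos_] = input[i];
                writePos_ = (writePos_ + 1) % samples_.size();  // oldest audio is overwritten
            }
        }

    private:
        std::vector<float> samples_;  // e.g. one minute of mono audio
        size_t writePos_;             // next position to overwrite
    };
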
When the user notices an interesting audio event, they can tap on the microphone a number of times, which causes the app to switch to 'recording' mode. The audio currently in the monitoring window is saved to disk, as is all subsequent audio. Later,
the user can stop the recording using another tapping sequence. Every time a
tapping gesture is detected, a short audio cue is played to provide the user
with feedback.
The recording process produces a collection of audio clips saved on the user's phone. The upload tab consists of a list of these clips, sorted by
their recording date and time. Touching a clip causes it to be uploaded to the
server, from which it can later be retrieved or shared with others.
The search tab is for exploring audio clips residing on the server. It includes two
search methods: by recording time (most recent first), and by distance from the
current location (not yet implemented). Pressing one of the search buttons
sends a query to the server. When the response arrives, the view automatically
changes to the browse tab.
The browse tab shows a list of clips available on the server. Touching a clip causes
it to be downloaded and played.
View Controllers:
Four view controllers were implemented, one for each tab. The controllers
communicate using NSNotifications. The
Record and Upload controllers also share a ClipManager object which maintains
a data structure representing the recorded files and their metadata.
Audio recording:
To implement the continuous monitoring and recording functionality, the AudioManager maintains a pair of circular buffers used in a double-buffering scheme. When only monitoring, a single buffer is used.
The circular buffer data structure is well suited to the monitoring application
since the overwriting of old audio occurs naturally.
Once recording to disk is initiated by the user, the AudioManager switches to the double-buffering scheme: when a buffer fills up (or is nearly full), it switches to the other buffer and spawns a new thread that writes the full buffer to disk without delaying the audio thread.
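
The following C++ sketch illustrates the double-buffering idea under some assumptions (the names, buffer size, and use of std::thread are mine, not necessarily what the app does): the audio callback appends to the active buffer, and when it fills up, the writer switches buffers and hands the full one to a background thread for disk I/O.

    #include <cstdio>
    #include <thread>
    #include <vector>

    // Hypothetical sketch of double-buffered recording (not the app's actual code).
    // Assumes each buffer is fully written out before it is needed again.
    class DoubleBufferWriter {
    public:
        DoubleBufferWriter(size_t framesPerBuffer, FILE *outFile)
            : framesPerBuffer_(framesPerBuffer), outFile_(outFile), active_(0) {}

        // Called from the audio thread; must not block on disk I/O.
        void write(const float *input, size_t numFrames) {
            std::vector<float> &buf = buffers_[active_];
            buf.insert(buf.end(), input, input + numFrames);
            if (buf.size() >= framesPerBuffer_) {   // buffer is (nearly) full
                int full = active_;
                active_ = 1 - active_;              // switch to the other buffer
                // Write the full buffer on a separate thread so the audio
                // thread is not delayed by the disk access.
                std::thread([this, full]() {
                    std::fwrite(buffers_[full].data(), sizeof(float),
                                buffers_[full].size(), outFile_);
                    buffers_[full].clear();
                }).detach();
            }
        }

    private:
        size_t framesPerBuffer_;
        FILE *outFile_;
        std::vector<float> buffers_[2];
        int active_;  // index of the buffer currently being filled
    };

Detaching the writer thread keeps the audio callback free of disk latency, at the cost of assuming each write completes before its buffer is reused.
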
Tap Analysis:
Tap detection makes use of an STK one-pole filter to do amplitude envelope tracking.
When the filter's output exceeds a threshold, a tap is detected. Further tap detections
are suppressed for 40 milliseconds in order to avoid duplicate detections.
Gesture detection is currently very simple. A gesture is any sequence of four taps. The
gesture is interpreted as 'start' or 'stop' based on the state of the system.
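
Below is a rough C++ sketch of the tap and gesture logic described above; the threshold, filter coefficient, and class names are assumptions, and a hand-written one-pole smoother stands in for the STK filter the app actually uses.

    #include <cmath>
    #include <cstddef>

    // Hypothetical sketch of tap detection (not the app's actual code).
    // A one-pole lowpass on the rectified input tracks the amplitude envelope.
    class TapDetector {
    public:
        explicit TapDetector(float sampleRate)
            : sampleRate_(sampleRate), env_(0.0f), samplesSinceTap_(1000000) {}

        // Process one block of input; returns the number of taps detected.
        int process(const float *input, size_t numFrames) {
            const float pole = 0.999f;       // envelope smoothing (assumed value)
            const float threshold = 0.2f;    // tap amplitude threshold (assumed value)
            const size_t holdoff = (size_t)(0.040f * sampleRate_);  // 40 ms suppression
            int taps = 0;
            for (size_t i = 0; i < numFrames; i++) {
                env_ = pole * env_ + (1.0f - pole) * std::fabs(input[i]);
                samplesSinceTap_++;
                if (env_ > threshold && samplesSinceTap_ > holdoff) {
                    taps++;
                    samplesSinceTap_ = 0;    // suppress duplicate detections
                }
            }
            return taps;
        }

    private:
        float sampleRate_;
        float env_;               // tracked amplitude envelope
        size_t samplesSinceTap_;  // samples since the last detected tap
    };

    // Any sequence of four taps toggles recording, depending on the current state.
    struct GestureState {
        int tapCount = 0;
        bool recording = false;

        void onTaps(int taps) {
            tapCount += taps;
            if (tapCount >= 4) {
                recording = !recording;  // interpreted as 'start' or 'stop'
                tapCount = 0;
            }
        }
    };
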
The server side makes use of a few PHP scripts backed by a MySQL database. The
database tables and most of the PHP are closely based on examples shown to us
in the tutorial.
There are a few improvements I would like to implement in future versions:
This project uses the MOMU API, the Synthesis Toolkit (STK), and ASIHttpRequest.
The project is available here.