Tempora


A tool for audio-visual composition

Sections

Concept
Setup
Manual Commands
Design
Architecture
Aesthetic
Testing
Additional Links



Concept


The motivation behind this project is to identify metaphorical links between live image and sound. It was built as a step toward a broader project: a platform in which users can arbitrarily select feature mappings and see/hear the result in real time. Currently, features are automatically extracted from both the acoustic spectrum and the image characteristics of each file in the media databank.

Audio-visual compositions are often hand-coded to great aesthetic effect, achieving very coherent narratives; this approach suits choreographed production but does not allow for real-time customization. Tempora, by contrast, is meant to be used as a live instrument: video streams and sounds are manipulated in real time, with the overall effect being partially predictable but not set a priori.




Setup


The latest version of the application has been tested on Mac OS X 10.11 (El Capitan).

First, make sure you have OpenCV and libsndfile installed, and take note of their installation paths. For example, I installed OpenCV using Homebrew:

$ brew tap homebrew/science
$ brew install opencv

Note that this may take a while. Once installed, OpenCV will be located in /usr/local/Cellar/opencv/_version_number_/. This is the path to include in the project Makefile in case the compiler complains.

To obtain the zipped package, click on the following link:

Download Tempora (~252 MB)

Once you unzip the file, you will have the complete "tempora" directory. Note that, depending on your installation method and operating system, you might have to update the Makefile by setting the correct link path to your opencv2 directory.
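For instance, a fragment along the following lines would point the build at a Homebrew install. The variable names here are illustrative and may differ from those in the shipped Makefile:

# Illustrative fragment only -- adjust names and libraries to the shipped Makefile.
OPENCV_DIR = /usr/local/Cellar/opencv/_version_number_
CXXFLAGS  += -I$(OPENCV_DIR)/include
LDFLAGS   += -L$(OPENCV_DIR)/lib -lopencv_core -lopencv_imgproc -lopencv_highgui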

To build, open Terminal and type the following at the command line (where path is the file path to the "tempora" directory):

$ cd path/tempora/src/
$ make

To run the program, type:

$ ./Tempora

Note that all the required source and media files are already included; however, you may play around with the video and audio content by replacing it with your own.




Manual Commands


Once the program is running, you may interact with it via the keyboard commands in the following tables. Details about each visual and audio effect can be found in the sections below.

To interact with the program, make sure the video window is selected. Note that when you press a number key, the terminal window will print a confirmation.


PROGRAM CONTROL

ESC or q quit the program


VISUAL CONTROL

Visual control is simple. You can transition between video textures by pressing '=' and '-'. The second video, underlying the natural texture, is not explicitly controlled by the user. The 'a', 's', 'd', and 'f' keys select a specific visual effect, which the user can reduce or intensify by pressing '[' and ']'.


= transition to next video texture
- transition to previous video texture
] increase selected video effect
[ decrease selected video effect
\ cancel selected effect (set relevant parameter to 0)
a select erosion effect
s select dilation effect
d select inversion effect
f select brightness effect


AUDIO CONTROL

Number keys [1-5] select the audio content for playback (in the current setup, there are only five audio files). There are two modes: with grains and without grains, toggled by pressing 'g'.

If grains are on for the selected audio file, pressing 'p' will play or stop playback of this file with grain synthesis.

If grains are not on, pressing 'p' will simply toggle playback of the audio file itself.

This allows the user to control each voice individually, playing several at the same time and choosing which to keep muted. The grain-synthesized versions of each sound can also be overlaid, and muted or played back individually, independently of the no-grain playback.

When a given sound file is selected with grains on, its synthesis parameters are controlled by camera interaction, as described in the Design section, though they can also be set via the keyboard (a sketch of how these parameters might shape playback follows the table below).


g toggle grain synthesizers on/off
k select previous sound file (selection order will be shown in the console)
l select next sound file
p if grains are off, play the selected sound
' increase selected audio parameter
; decrease selected audio parameter
c select grain duration (ms)
v select grain delay (ms)
b select grain randomness (range between [0,1] )
n select number of grains for each sound file
m select output gain for grain synthesizer
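As referenced above, the following is a minimal C++ sketch of how the parameters in this table might shape granular playback. The names (GrainParams, renderGrains) and the Hann windowing choice are illustrative, not taken from the Tempora source:

#include <cmath>
#include <cstdlib>
#include <vector>

// Illustrative parameter set mirroring the keyboard-controlled values above.
struct GrainParams {
    float durationMs = 50.0f;  // grain duration ('c')
    float delayMs    = 20.0f;  // delay between grain onsets ('v')
    float randomness = 0.2f;   // [0,1], jitters the grain start position ('b')
    int   numGrains  = 8;      // grains overlaid per sound file ('n')
    float gain       = 0.8f;   // output gain ('m')
};

// Mix numGrains short, windowed excerpts of `source` into `out`, which must
// be pre-sized by the caller. In Tempora, randomness also affects grain
// duration; only the start position is jittered here for brevity.
void renderGrains(const std::vector<float>& source, std::vector<float>& out,
                  const GrainParams& p, float sampleRate)
{
    const float kTwoPi = 6.28318530718f;
    int grainLen = static_cast<int>(p.durationMs * 0.001f * sampleRate);
    int hop      = static_cast<int>(p.delayMs    * 0.001f * sampleRate);
    if (grainLen < 2 || static_cast<int>(source.size()) <= grainLen) return;

    for (int g = 0; g < p.numGrains; ++g) {
        float r     = p.randomness * (std::rand() / static_cast<float>(RAND_MAX));
        int   start = static_cast<int>(r * (source.size() - grainLen));
        int   onset = g * hop;
        for (int i = 0; i < grainLen && onset + i < static_cast<int>(out.size()); ++i) {
            // Hann window avoids clicks at the grain boundaries.
            float w = 0.5f * (1.0f - std::cos(kTwoPi * i / (grainLen - 1)));
            out[onset + i] += p.gain * w * source[start + i];
        }
    }
}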





Design


End users
The target demographic for this project includes anyone interested in studying the qualitative features of various audio/video interactions.

Interface
As a future project, the goal is to move away from the 2D screen and leverage sensing technology instead. For design purposes, one can start with a laptop display, which makes the correspondence between available parameters and user input explicit.

Two windows for real-time visualization are displayed on screen: one for the processed output (right, in figure) and a small window for the camera input (left). Parametrization is automatic, but users can affect the nearly autonomous system either by producing motion (frame differences) within the four quadrants of the camera frame or via the hard-key commands listed above.



Interaction is limited to keyboard control, visual feedback, and real time camera input via simple motion detection. The main intended control is via camera input.

In the camera input frame, the top right quadrant controls the fade between a top-layer natural texture image and background footage taken from open-access historical archives. It is up to the user to draw from whichever files they wish.
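With OpenCV, such a crossfade can be expressed in a single call. The function below is a hypothetical sketch, with `fade` standing for the value driven by motion in that quadrant:

#include <opencv2/opencv.hpp>

// Blend the natural texture over the archive footage.
// fade is in [0,1]; in Tempora it would track motion in the top right quadrant.
cv::Mat crossfade(const cv::Mat& texture, const cv::Mat& archive, double fade)
{
    cv::Mat blended;
    cv::addWeighted(texture, fade, archive, 1.0 - fade, 0.0, blended);
    return blended;
}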



Motion in the top left quadrant of the camera frame controls grain randomness, the bottom left controls gain, and the bottom right controls grain duration. The user can thus add dynamic effects to the selected sound files. Note that the user must explicitly select a sound file for playback, then press 'p' to start the sound. Pressing 'g' toggles granular synthesis.
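A minimal sketch of this kind of per-quadrant motion measurement via frame differencing, assuming consecutive grayscale camera frames of equal size (the function name and the mapping comments are illustrative, not Tempora's actual implementation):

#include <array>
#include <opencv2/opencv.hpp>

// Mean absolute frame difference per quadrant of the camera frame.
std::array<double, 4> quadrantMotion(const cv::Mat& prevGray, const cv::Mat& currGray)
{
    cv::Mat diff;
    cv::absdiff(prevGray, currGray, diff);
    int hw = diff.cols / 2, hh = diff.rows / 2;
    std::array<double, 4> motion;
    motion[0] = cv::mean(diff(cv::Rect(0,  0,  hw, hh)))[0]; // top left:     grain randomness
    motion[1] = cv::mean(diff(cv::Rect(hw, 0,  hw, hh)))[0]; // top right:    texture fade
    motion[2] = cv::mean(diff(cv::Rect(0,  hh, hw, hh)))[0]; // bottom left:  gain
    motion[3] = cv::mean(diff(cv::Rect(hw, hh, hw, hh)))[0]; // bottom right: grain duration
    return motion;
}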




Architecture




Sound
The audio engine has access to a databank of pre-selected audio files. One set of sounds was chosen for textural quality (e.g. rain). Another set was chosen based on temporal qualities (percussive loops). The remaining samples were chosen for their sustained timbre (a cello, a singing voice, etc.). As the user selects files for playback, spectral features are extracted to drive the visualization parameters. The user therefore does not compose melodic sequences but instead emphasizes and plays with more subtle acoustical qualities in relation to visual output.

The Audio_Engine class manages audio data playback. Audio_Processor extracts RMS, spectral centroid, spectral flux, and potentially other features. Audio data is passed back to main() to inform the Visual_Processor. Below are the current parameters that drive visualization and the effects that are themselves driven (either automatically or by the user); standard definitions for a few of these features are sketched after the lists.

Parameters
- spectral centroid, continuous
- RMS, continuous
- spectral flux, continuous
- peak amplitude, per audio buffer
- peak frequency, per audio buffer

Effects
- grain duration, continuous
- grain delay, continuous
- randomness (affects starting position and duration), continuous
- number of grains, discrete
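The following sketch gives standard textbook definitions for a few of the extracted features; Audio_Processor's actual implementation may differ:

#include <cmath>
#include <vector>

// Root-mean-square level of one audio buffer.
float rms(const std::vector<float>& buf)
{
    if (buf.empty()) return 0.0f;
    double sum = 0.0;
    for (float s : buf) sum += s * s;
    return static_cast<float>(std::sqrt(sum / buf.size()));
}

// Spectral centroid: the magnitude-weighted mean of the bin frequencies.
// mags is a magnitude spectrum; binHz is sampleRate / fftSize.
float spectralCentroid(const std::vector<float>& mags, float binHz)
{
    double num = 0.0, den = 0.0;
    for (std::size_t k = 0; k < mags.size(); ++k) {
        num += k * binHz * mags[k];
        den += mags[k];
    }
    return den > 0.0 ? static_cast<float>(num / den) : 0.0f;
}

// Spectral flux: how much the magnitude spectrum grew since the previous
// buffer (sum of positive bin differences).
float spectralFlux(const std::vector<float>& mags, const std::vector<float>& prev)
{
    double flux = 0.0;
    for (std::size_t k = 0; k < mags.size() && k < prev.size(); ++k) {
        double d = mags[k] - prev[k];
        if (d > 0.0) flux += d;
    }
    return static_cast<float>(flux);
}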

Graphics
As each sound is read into buffers and processed, the graphics engine displays a continuous video stream. In order to process and display video frames, the engine makes use of the OpenCV library in C++.

A databank of videos, selected for their aesthetic qualities (see Additional Links), is accessed and mixed by the Visual_Engine. Most files are used as video textures, while others contain explicit objects or dynamic events (a falling rain drop). Temporal and timbral qualities of the selected sounds drive the video texture effects (a sketch of the effect primitives follows the lists below). The Video_Data data structure is passed back to main() in order to communicate changes in the video parameters to the Audio_Engine.

Parameters
- spectral distribution (red, green, and blue)
- brightness (computed as a weighted sum of the three RGB values)
- optical flow (computed using OpenCV's built-in Lucas-Kanade algorithm with Shi-Tomasi feature tracking)

Effects
- erosion
- dilation
- inversion
- brightness
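A minimal sketch of these four effect primitives using standard OpenCV calls; the function and the way `amount` is scaled are illustrative, not Tempora's actual implementation:

#include <opencv2/opencv.hpp>

// Apply the currently selected effect; `amount` is the value raised and
// lowered with ']' and '[' (0 effectively cancels the effect, as with '\').
cv::Mat applyEffect(const cv::Mat& frame, char effect, int amount)
{
    cv::Mat out;
    cv::Mat kernel = cv::getStructuringElement(
        cv::MORPH_RECT, cv::Size(2 * amount + 1, 2 * amount + 1));
    switch (effect) {
        case 'a': cv::erode(frame, out, kernel);          break; // erosion
        case 's': cv::dilate(frame, out, kernel);         break; // dilation
        case 'd': cv::bitwise_not(frame, out);            break; // inversion
        case 'f': frame.convertTo(out, -1, 1.0, amount);  break; // brightness
        default:  out = frame.clone();                    break;
    }
    return out;
}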




Aesthetic


A great deal of attention went into selecting the audio and video palette; the most difficult part was designing the parameter analysis and mapping. More experimentation is needed to find the right constraints while allowing as much flexibility as possible.

Visually, the aesthetic depends on the subjective quality of the resulting media output. The goal is to avoid repetition, which can be achieved by adding randomness to the parameter mapping; the fact that the audio/video files themselves change over time should also help.




Testing


Potential measures of success are the level of output repetition (which should be as low as possible), the user's enjoyment, and the project's creative potential.




Additional Links


OpenCV tutorials can be found at: opencv-srf.blogspot.com

Video files were obtained from the open access movie archive: archive.org
