Concept
Setup
Manual Commands
Design
Architecture
Aesthetic
Testing
Additional Links
The motivation behind this project is to identify metaphorical links between live image and sound. It was built as a step toward a broader project: a platform in which users can arbitrarily select feature mappings and see/hear the result in real time. Currently, features are automatically extracted from both the acoustic spectrum and the image characteristics of each file in the media databank.
Audio-visual compositions are often hand-coded to great aesthetic effect, achieving very coherent narratives; this approach does not allow for real-time customization and is better suited to choreographed production. Tempora is meant to be used as a live instrument: video streams and sounds are manipulated in real time, with the overall effect being partially predictable but not set a priori.
top ^
The latest application has been tested on OS X 10.11 (El Capitan).
First, make sure you have OpenCV and libsndfile installed, and take note of their directory paths. For example, I installed both using Homebrew:
$ brew tap homebrew/science
$ brew install opencv
$ brew install libsndfile
$ cd path/tempora/src/
$ make
$ ./Tempora
Note that all the required source and media files are already included; however, you may play around with the video and audio content by replacing it with your own. Once the program is running, you may interact with it via the keyboard commands in the following tables. The details about each visual and audio effect can be found in the later sections.
To interact with the program, make sure the video window is selected. Note that when you press a number key, the terminal window will confirm the selection.
ESC or q | quit the program |
Visual control is simple. You can transition between video textures by pressing '=' and '-'. The second video, underlying the natural texture, is not explicitly controlled by the user. The 'a', 's', 'd', and 'f' keys select a specific visual effect, which the user can reduce or intensify by pressing '[' and ']'.
= | transition to next video texture |
- | transition to previous video texture |
] | increase selected video effect |
[ | decrease selected video effect |
\ | cancel selected effect (set relevant parameter to 0) |
a | select erosion effect |
s | select dilation effect |
d | select inversion effect |
f | select brightness effect |
Number keys [1-5] will select the audio content for play (given the current setup, there are only five audio files). At this point, there are two modes: with grain and without grain, which can be toggled by pressing 'g'.
If grains are on for the selected audio file, pressing 'p' will play or stop playback of this file with grain synthesis.
If grains are not on, pressing 'p' will simply toggle playback of the audio file itself.
This allows the user to control each voice individually, playing several at the same time and choosing which to keep muted. The grain-synthesized versions of each sound can also be overlaid, and muted or played back individually and independently of the no-grain playback.
When a given sound file is selected with grains on, its synthesis parameters are controlled by camera interaction, as described in the Design section, though these can also be determined via keyboard.
g | toggle grain synthesizers on/off |
k | select previous sound file (selection order will be shown in the console) |
l | select next sound file |
p | if grains are off, play the selected sound |
' | increase selected audio parameter |
; | decrease selected audio parameter |
c | select grain duration (ms) |
v | select grain delay (ms) |
b | select grain randomness (range between [0,1] ) |
n | select number of grains for each sound file |
m | select output gain for grain synthesizer |
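The grain-parameter keys above can be sketched as a small dispatch routine. This is an illustrative sketch, not the actual Tempora source: the GrainParam names, default values, and the 0.1 step size are assumptions made for the example.

```cpp
#include <cassert>
#include <map>

// Hypothetical sketch of the keyboard dispatch for grain parameters.
// Names, defaults, and step size are illustrative assumptions.
enum class GrainParam { Duration, Delay, Randomness, NumGrains, Gain };

struct GrainState {
    GrainParam selected = GrainParam::Duration;
    std::map<GrainParam, double> values{
        {GrainParam::Duration, 50.0},   // ms
        {GrainParam::Delay, 20.0},      // ms
        {GrainParam::Randomness, 0.0},  // range [0,1]
        {GrainParam::NumGrains, 8.0},
        {GrainParam::Gain, 1.0}};
};

// Returns false for keys it does not handle.
bool handleKey(char key, GrainState& s) {
    switch (key) {
        case 'c':  s.selected = GrainParam::Duration;   return true;
        case 'v':  s.selected = GrainParam::Delay;      return true;
        case 'b':  s.selected = GrainParam::Randomness; return true;
        case 'n':  s.selected = GrainParam::NumGrains;  return true;
        case 'm':  s.selected = GrainParam::Gain;       return true;
        case '\'': s.values[s.selected] += 0.1;         return true;
        case ';':  s.values[s.selected] -= 0.1;         return true;
        default:   return false;
    }
}
```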
End users
The target demographic for this project includes anyone interested in studying the qualitative features of various audio/video interactions.
Interface
As a future project, the goal is to move away from the 2D screen and to leverage sensing technology instead. For design purposes, one can start with a laptop display and thus make explicit the correspondence between available parameters and user input.
Two windows for real time visualization are displayed on screen: one for the processed output (right, in figure) and a small window for camera input (left). Parametrization is automatic, but users can affect the nearly autonomous system either by producing motion-frame differences within four quadrants of the camera or via the hard key commands listed above.
Interaction is limited to keyboard control, visual feedback, and real time camera input via simple motion detection. The main intended control is via camera input.
In the camera input frame, the top right quadrant controls fade in/fade out between a top layer natural texture image and background footage taken from open access historical archives. It is up to the user to draw from whichever files they wish.
Motion in the top left quadrant of the camera frame controls grain randomness, the bottom left controls gain, and the bottom right controls duration. The user can thus add dynamic effects to the selected sound files. Note that the user must explicitly select the sound file for playback, then press 'p' to start the sound. Pressing 'g' toggles granular synthesis.
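The quadrant-to-parameter mapping described above can be sketched as follows. This is a simplified illustration: the frame dimensions and return strings are assumptions, and the real engine responds to frame-difference motion within each quadrant rather than a single point.

```cpp
#include <cassert>
#include <string>

// Sketch: map a point in the camera frame to the parameter controlled
// by motion in that quadrant (assignments follow the description above).
// Image origin is at the top-left, as in OpenCV.
std::string quadrantParameter(int x, int y, int width, int height) {
    bool right = x >= width / 2;
    bool top = y < height / 2;
    if (top && right)   return "video fade";        // top right
    if (top && !right)  return "grain randomness";  // top left
    if (!top && !right) return "grain gain";        // bottom left
    return "grain duration";                        // bottom right
}
```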
top ^
Sound
The audio engine has access to a databank of pre-selected audio files. One set of sounds was chosen for textural quality (e.g. rain). Another set was chosen based on temporal qualities (percussive loops). The remaining samples were chosen for their sustained timbre (a cello, a singing voice, etc.). As the user selects files for playback, spectral features are extracted to drive the visualization parameters. The user therefore does not compose melodic sequences but instead emphasizes and plays with more subtle acoustical qualities in relation to visual output.
The Audio_Engine class manages the audio data playback. Audio_Processor extracts spectral RMS, centroid, flux, and potentially other features. Audio data is passed back to main() to inform the Visual_Processor. Below is the list of current parameters that drive visualization and of the effects that are themselves driven (either automatically or by the user).
Parameters
- spectral centroid, continuous
- RMS, continuous
- spectral flux, continuous
- peak amplitude, per audio buffer
- peak frequency, per audio buffer
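Two of the continuous features above, spectral centroid and RMS, can be sketched from a single analysis frame. This is a minimal illustration, not the Audio_Processor code; here the bin index stands in for frequency, whereas a real implementation would scale by sampleRate / fftSize.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Centroid of a magnitude spectrum, in bin units (frequency proxy).
double spectralCentroid(const std::vector<double>& mag) {
    double num = 0.0, den = 0.0;
    for (std::size_t k = 0; k < mag.size(); ++k) {
        num += static_cast<double>(k) * mag[k];
        den += mag[k];
    }
    return den > 0.0 ? num / den : 0.0;
}

// Root-mean-square energy of one buffer of samples.
double rms(const std::vector<double>& samples) {
    if (samples.empty()) return 0.0;
    double sum = 0.0;
    for (double s : samples) sum += s * s;
    return std::sqrt(sum / static_cast<double>(samples.size()));
}
```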
Effects
- grain duration, continuous
- grain delay, continuous
- randomness (affects starting position and duration), continuous
- number of grains, discrete
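The randomness effect above (affecting starting position and duration) could be sketched as a uniform jitter around the base values. The jitter model and the clamping bounds are assumptions for illustration, not the actual synthesizer logic.

```cpp
#include <cassert>
#include <random>

struct Grain {
    double startSec;    // starting position in the source file
    double durationMs;  // grain length
};

// Sketch: randomness in [0,1] scales a uniform jitter of up to +/-100%
// applied to both start position and duration (an assumed model).
Grain jitterGrain(double baseStartSec, double baseDurMs,
                  double randomness, std::mt19937& rng) {
    std::uniform_real_distribution<double> u(-1.0, 1.0);
    Grain g;
    g.startSec   = baseStartSec * (1.0 + randomness * u(rng));
    g.durationMs = baseDurMs   * (1.0 + randomness * u(rng));
    if (g.startSec < 0.0) g.startSec = 0.0;     // stay inside the file
    if (g.durationMs < 1.0) g.durationMs = 1.0; // keep grains audible
    return g;
}
```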
Graphics
As each sound is read into buffers and processed, the graphics engine displays a continuous video stream. In order to process and display video frames, the engine makes use of the OpenCV library in C++.
A databank of videos, selected for their aesthetic qualities (see References), is accessed and mixed by the Visual_Engine. Most files are used as video textures, while others contain explicit objects or dynamic events (a falling raindrop). Temporal and timbre qualities of the selected sounds drive the video texture effects. The Video_Data data structure is passed back to main() in order to communicate changes in the video parameters to the Audio_Engine.
Parameters
- spectral distribution (red, green, and blue)
- brightness (computed as a weighted sum of the three RGB values)
- optical flow (computed using the built-in Lucas-Kanade algorithm with Shi-Tomasi feature tracking)
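The brightness parameter above can be sketched as a weighted sum of the mean channel values. The Rec. 601 luma weights used here are a common choice but an assumption; the engine may weight the channels differently.

```cpp
#include <cassert>

// Sketch: per-frame brightness as a weighted sum of the mean R, G, B
// values. The 0.299/0.587/0.114 weights (Rec. 601 luma) are assumed.
double brightness(double meanR, double meanG, double meanB) {
    return 0.299 * meanR + 0.587 * meanG + 0.114 * meanB;
}
```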
Effects
- erosion
- dilation
- inversion
- brightness
top ^
A great deal of attention was dedicated to selecting the audio and video palette; the most difficult part was designing the parameter analysis and mapping. More experimentation will be needed to find the right constraints while allowing as much flexibility as possible.
Visually, the aesthetic depends on the subjective quality of the resulting media output. The goal is to avoid repetition, which can be achieved by adding randomness to the parameter mapping. This should be helped by the fact that the audio and video files themselves change over time.
top ^
Potential measures of success are the level of output repetition (which should be as low as possible), the user's enjoyment, and the project's creative potential.
top ^
OpenCV tutorials can be found at: opencv-srf.blogspot.com
Video files were obtained from the open access movie archive: archive.org
top ^