Words with similar meanings can carry different affects. Can these affects be captured with sound? I had two objectives:
1. Map words to synthesizer settings
2. Use the mappings to create an audiovisual experience
Newly-entered words are assigned random timbres. Play with the synthesizer and rate the correspondence between the word and the timbre. As correspondence increases, randomness decreases the next time the word is assigned a timbre. Eventually, the word's timbre converges.
Based on their synthesizer settings, words are also displayed within a three-dimensional space. We navigate by typing a word and rotating the space until it comes into view. The focal word vibrates along with the synthesized audio.
The last five notes that were played are stored in memory. When the synthesizer hasn't been touched for a brief period, they are randomly assigned to nearby words, which begin to hop and trigger the notes at random. These notes are re-loaded whenever the word is entered again. This allows users to scatter traces of themselves for other users to explore. It also colors each word more effectively than timbre alone since frequency range and chord selection provide more information.
Resetting the word-to-3D-space mapping scheme causes words to careen to a single point on the screen before exploding out to updated positions.
Quick User Guide
"Enter Word": type a word you'd like to hear and press either "enter" or "space"
"Rate Correspondence": slide the knob to rate the match between the focal word and its current timbre
"What word(s) does this sound like?": enter a list of words, separated by commas, that are evocative of the current timbre
"s": switch between full-screen and window-view
"r": reset word-to-space mapping
"[","]": rotate up/down about the focal word
"p","\": rotate left/right about the focal word
"k","l": decrease/increase playback tempo
"n","m": decrease/increase how long notes are typically held
"q": quit and save the new synthesizer and 3D space map settings
STK was used to generate sounds through FM synthesis and ADSR envelopes. The same attack, decay, and release times are used for every envelope.
Each time a correspondence value is entered, the word's vote count increases by one and its mean correspondence is updated. Correspondence is clamped to the range [0, 1]. As a word's mean correspondence increases, its color becomes brighter.
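The running-mean update described above can be sketched as follows (function and variable names are hypothetical, not taken from the source code):

```python
def update_correspondence(mean, votes, new_rating):
    """Fold one new rating into a word's running mean correspondence.

    Ratings are clamped to [0, 1] before averaging, matching the
    document's stated range restriction.
    """
    new_rating = max(0.0, min(1.0, new_rating))
    votes += 1
    mean += (new_rating - mean) / votes  # incremental mean update
    return mean, votes
```

The incremental form avoids storing every past vote; only the current mean and the vote count are needed.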
Synthesis Setting Updates
Each word has two sets of synthesis parameter values: a "center" set and an "estimate" set. Every time a word is entered, the estimate parameters are randomized and used to program the synthesizer. If the estimate set is rated higher than the word's mean correspondence, the estimate set becomes the new center set.
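The promotion rule can be sketched like so (the dictionary fields are hypothetical stand-ins for the program's word state):

```python
def maybe_promote(word):
    """Promote the estimate parameters to become the new center set
    if their rating beat the word's running mean correspondence."""
    if word["estimate_rating"] > word["mean_correspondence"]:
        word["center_params"] = list(word["estimate_params"])  # copy, not alias
    return word
```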
Synthesis Setting Randomization
Initially, parameters are selected uniformly from the range of all possible values. As correspondence increases, the range shrinks: at a correspondence of 0, the full range is available; at a correspondence of 1, the estimate parameters equal the center parameters. The estimation equations are below.
multiplier = 1 - sqrt(mean correspondence)
maximum' = center + multiplier * (maximum - center)
minimum' = center - multiplier * (center - minimum)
estimate ~ U[minimum', maximum']
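The equations above translate directly into a small sampling routine (a sketch; names are illustrative):

```python
import random

def randomize_estimate(center, minimum, maximum, mean_correspondence):
    """Draw one estimate parameter from a range that shrinks toward
    `center` as mean correspondence approaches 1."""
    multiplier = 1 - mean_correspondence ** 0.5
    hi = center + multiplier * (maximum - center)
    lo = center - multiplier * (center - minimum)
    return random.uniform(lo, hi)  # estimate ~ U[lo, hi]
```

The square root makes the range contract quickly at first: a mean correspondence of 0.25 already halves the multiplier.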
If a list of words is entered in the "What word(s) does this sound like?" box, their synthesizer parameters are copied from the estimate set. Their positions match the position of the word in question, their colors are randomized, and each is initialized with a correspondence of 0.6.
Synthesis settings are aggregated into feature vectors. Principal component analysis (via the Alglib library) finds the directions along which the feature vectors have the largest standard deviation. The projection of a word's feature vector onto the first three principal components gives the word's position in 3D space.
To place more weight on settings with a greater effect on timbre, some settings, such as modulator ratio and gain, are scaled up before being placed in the feature vector; this increases the standard deviation along their axes.
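A minimal sketch of the projection, using numpy's SVD in place of Alglib (the weight vector and function name are illustrative):

```python
import numpy as np

def word_positions(feature_vectors, weights):
    """Project weighted synthesis-feature vectors onto their first
    three principal components to obtain 3D word positions."""
    X = np.asarray(feature_vectors, dtype=float) * weights  # emphasize influential settings
    X = X - X.mean(axis=0)                                  # center before PCA
    # Right-singular vectors of the centered data are the principal axes,
    # ordered by decreasing variance explained.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:3].T                                     # one 3D coordinate per word
```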
Random Sound Generation
The last 5 pitches played are assigned randomly to nearby words. If no interaction occurs for a few seconds, nearby words begin to trigger the synthesizer. To maintain a degree of order, note lengths are limited to eighth, quarter, half, and whole notes.
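The assignment step can be sketched as follows (names and the bar-fraction encoding of note lengths are assumptions for illustration):

```python
import random

# Eighth, quarter, half, and whole notes, as fractions of a bar.
NOTE_LENGTHS = [0.125, 0.25, 0.5, 1.0]

def scatter_notes(recent_pitches, nearby_words):
    """Randomly assign recently played pitches to nearby words, each
    with a note length quantized to the allowed durations."""
    return {word: (random.choice(recent_pitches), random.choice(NOTE_LENGTHS))
            for word in nearby_words}
```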
When the word-to-3D-space remap is triggered, the words drift towards (0,0,0). While this occurs, two distance measurements are made: the maximum squared distance of any word's position from the origin, and the mean squared distance of each word's current position from its newly assigned mapping position.
When the maximum squared distance dips below 10, the words explode outwards to their new positions, and sound is generated. While words drift towards (0,0,0), the random sound generation scheme is applied, but tempo and frequency vary with the mean squared distance. As words move towards their newly assigned positions, only eighth notes of a single frequency are played, with tempo and pitch still modulated by the mean squared distance.
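The threshold test that ends the inward phase can be sketched in a few lines (function name is illustrative):

```python
def remap_phase_done(positions, threshold=10.0):
    """True once every word is close enough to the origin, i.e. the
    maximum squared distance from (0,0,0) has dipped below `threshold`."""
    return max(x * x + y * y + z * z for x, y, z in positions) < threshold
```

Using squared distances avoids a square root per word per frame, which matters when the check runs every animation tick.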
Data Storage & Access
When a user quits the program, each word and its position, color, synthesizer settings, last 5 note frequencies, tempo, correspondence, and vote count are saved as a line in the file "words.txt". The PCA matrix values are saved in a separate file, "pcaParams.txt". Both are reloaded on initialization.
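A one-line-per-word serialization might look like the sketch below; the field order and whitespace-separated format here are illustrative, not the program's actual file layout:

```python
def serialize_word(word, position, color, synth_params, notes,
                   tempo, correspondence, votes):
    """Flatten one word's state into a single whitespace-separated line
    (field order is illustrative, not the actual words.txt format)."""
    fields = [word, *position, *color, *synth_params, *notes,
              tempo, correspondence, votes]
    return " ".join(str(f) for f in fields)
```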
While running, pointers to each word object are stored in hash tables, keyed by the word's string.
The synthesizer is, regrettably, monophonic because STK processing is CPU-intensive; a more efficient polyphonic design remains to be found.
A voting scheme that weights recent votes more heavily would be less frustrating to use.
A transition probability matrix for note durations could produce more interesting, ordered patterns. Alternatively, note duration could be selected based on how long the user holds notes.
Source code: scattered.zip