
Project 2

Featured Artist – Singing Starry Night

Reflection

After learning the Fourier Transform in both my image and sound classes, I started thinking about their connections, which led me to build an application that matches the spectral features of images and sounds. Although both can be transformed to the frequency domain, one comes from the spatial domain while the other comes from the time domain, so the frequency range of an image is far higher than that of a sound. To keep every row and column from mapping to the same sound clip, I tried a few methods, including reducing the number of features; adding a low-pass filter worked best.

Using this tool, I first scanned through Van Gogh's Starry Night and realized that, because the texture of the whole painting is fairly consistent, the sounds its rows and columns map to change very little. I therefore picked another Starry Night, by Edvard Munch, which is quite different from Van Gogh's. For sounds, I used two pieces that are both inspired by Van Gogh's Starry Night, one featuring a human voice and the other an orchestra. Interestingly, most areas of Van Gogh's Starry Night map to the human voice, while Munch's Starry Night is dominated by instruments alone. This result loosely matches the artists' styles, in that Munch is somewhat more abstract than Van Gogh.

Another issue that took some time to improve was the output delay. Unlike sound input, where a delay filter in ChucK can help, visual information must travel back and forth between Unity and ChucK, so I had to freeze the visual feedback momentarily while transferring the visual input in real time. In the final version, the white rectangles do not follow the mouse in real time; instead, they indicate the specific row and column that ChucK has just finished processing.
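As a rough sketch of the low-pass fix mentioned above (illustrative only; the function name, the moving-average kernel, and the NumPy implementation are my assumptions, not the project's actual Unity/ChucK code):

```python
import numpy as np

def lowpass_row(pixel_row, kernel_size=9):
    """Moving-average low-pass filter over one row of pixel intensities."""
    kernel = np.ones(kernel_size) / kernel_size
    return np.convolve(np.asarray(pixel_row, dtype=float), kernel, mode="same")

# Example: the smoothed row's spectrum concentrates in the lower bins,
# closer to the frequency range of a sound snippet, so different rows
# stop collapsing onto the same clip.
image_row = np.random.randint(0, 256, size=512)  # stand-in for one pixel row
row_spectrum = np.abs(np.fft.rfft(lowpass_row(image_row)))
```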

Phase 3

Phase 2

What would it sound like if The Starry Night (the painting by Vincent van Gogh) sang Starry Starry Night (the song by Don McLean)? Inspired by this idea, I took each row and column of pixels from the painting and applied the Fourier Transform to them as well. Since image pixel values are always non-negative, I first took the absolute value of the song, then split it into snippets, and finally extracted their feature vectors. The sound matched to the current row plays through the right channel, while the sound matched to the current column plays through the left channel. The white rectangles indicate the processed areas. To increase variability, I added one more painting and one more piece of music.
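The piece itself runs in Unity and ChucK; purely as an illustration of this matching step (the function names, the coarse band-pooled features, and the nearest-neighbor matching here are my assumptions), a NumPy sketch might look like:

```python
import numpy as np

def spectrum_features(signal, n_bands=16):
    """Coarse spectral feature vector: log-magnitude FFT pooled into bands."""
    mag = np.abs(np.fft.rfft(np.asarray(signal, dtype=float)))
    bands = np.array_split(np.log1p(mag), n_bands)
    return np.array([band.mean() for band in bands])

def match_row_to_snippet(pixel_row, snippets):
    """Index of the song snippet whose features are nearest the row's."""
    row_feat = spectrum_features(pixel_row)
    # Rectify each snippet first, mirroring the non-negative pixel values.
    snippet_feats = [spectrum_features(np.abs(s)) for s in snippets]
    distances = [np.linalg.norm(row_feat - f) for f in snippet_feats]
    return int(np.argmin(distances))
```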

Instructions

Source Code

Download

Acknowledgements

Phase 1

The default setting, Centroid + Flux + RMS + MFCC (20 dims), was used as the baseline. I first tried adding RollOff and Kurtosis separately, but only Kurtosis increased the average accuracy, and only slightly. Next, I added SFM (24 dims) to the baseline; its performance boost was larger than that of Kurtosis, so I decided to keep SFM. Finally, to reduce dimensionality, I tested MFCC with fewer coefficients, and 10 seemed to be a good fit. The final best setting is therefore Centroid + Flux + RMS + MFCC (10 dims) + SFM (24 dims).
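For reference, here is a rough librosa approximation of that final setting (the experiments themselves used ChucK's analysis objects; librosa's onset strength is only a loose stand-in for Flux, and its per-frame spectral flatness for the 24-dim SFM):

```python
import numpy as np
import librosa

def clip_features(y, sr):
    """Approximate Centroid + Flux + RMS + MFCC(10) + flatness, averaged over frames."""
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)   # (1, t)
    flux = librosa.onset.onset_strength(y=y, sr=sr)[None, :]   # ~Flux, (1, t)
    rms = librosa.feature.rms(y=y)                             # (1, t)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=10)         # (10, t)
    flatness = librosa.feature.spectral_flatness(y=y)          # ~SFM, (1, t)
    feats = (centroid, flux, rms, mfcc, flatness)
    t = min(f.shape[1] for f in feats)                         # align frame counts
    # Average over frames to get one feature vector per clip.
    return np.vstack([f[:, :t] for f in feats]).mean(axis=1)
```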

Experiment Results

Milestone