at its heart, dtree is an interface for image-based and audio-based search. the database of audio/image snapshots is built up by its users. if you find an interesting sound or view, you can record the scene by holding down the translucent button on screen. audio is recorded while the button is held down, and an image is captured when the button is released. the audio/image snapshot is then uploaded and analyzed. to find similar snapshots, you can search on either the audio or the image you just recorded by swiping down or up. when the results of a query are returned, you can navigate through the similar snapshots by swiping left and right.

software design

the client devices record audio/image snapshots that are then uploaded to a django server, along with location, accelerometer, and heading metadata. audio and image features are extracted on the server. the image features are block-wise color summaries: the image is decomposed into 12 blocks, and the average RGB value over each block is recorded. the audio features are rms and mfccs. to extract mfccs, i ported some of malcolm slaney's auditory toolbox code from matlab to python. the features are extracted over 4096-sample, windowed buffers and then averaged over the length of the track. search is conducted by first finding the subset of audio/images in the database that fall within one half of a standard deviation of the query item along all feature dimensions. these items are then sorted by euclidean distance from the query. the server returns a list of the closest audio/image snapshots to the client, which then downloads any snapshots that aren't already stored on the local device.
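to make the image features and the two-stage search concrete, here is a minimal sketch in plain python. it is not the actual server code. the function names, the 3x4 block grid, the in-memory dict standing in for the database, the result limit, and the choice to compute the standard deviation over the whole database are all assumptions made for illustration.

```python
import math
from statistics import pstdev


def block_color_features(pixels, rows=3, cols=4):
    """Return the average (r, g, b) of each block as one flat feature list.

    `pixels` is a list of image rows, each a list of (r, g, b) tuples.
    The 3x4 grid is an assumption: the write-up only says "12 blocks".
    The image must be at least `rows` pixels tall and `cols` pixels wide.
    """
    height, width = len(pixels), len(pixels[0])
    features = []
    for by in range(rows):
        y0, y1 = by * height // rows, (by + 1) * height // rows
        for bx in range(cols):
            x0, x1 = bx * width // cols, (bx + 1) * width // cols
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            for channel in range(3):
                features.append(sum(p[channel] for p in block) / len(block))
    return features


def search(query, database, limit=10):
    """Return ids of the snapshots closest to `query`, nearest first.

    `database` maps a snapshot id to its feature vector (hypothetical layout).
    Stage 1 keeps items within half a standard deviation of the query on
    every dimension; stage 2 sorts the survivors by euclidean distance.
    """
    dims = len(query)
    vectors = list(database.values())
    # assumption: the per-dimension deviation is measured over the database
    stds = [pstdev(v[d] for v in vectors) for d in range(dims)]
    candidates = [
        (item_id, vec)
        for item_id, vec in database.items()
        if all(abs(vec[d] - query[d]) <= 0.5 * stds[d] for d in range(dims))
    ]
    candidates.sort(key=lambda pair: math.dist(pair[1], query))
    return [item_id for item_id, _ in candidates[:limit]]
```

the half-deviation filter acts as a cheap box test that discards most of the database before any distances are computed. the same `search` function would work for either feature type, since it only sees feature vectors.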

future directions

while the app was an interesting experiment in audio/image search over a crowd-sourced database, i think the interaction was seriously lacking. i think the app could be more engaging if it provided some method for passive use. specifically, i think it would be interesting to create animations based on queries. so instead of a user manually swiping through the similar items, the returned set could be animated. this could be extended by automatically searching for similar images on a random item from the returned set of an audio query, or by searching for similar audio on a random item from the returned set of an image query. these new sets could be prefetched and played for the user in an unending animation of a random walk through similar audio/image snapshots.
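the random walk could be sketched roughly like this. none of it exists in the app. `search_by_audio` and `search_by_image` are hypothetical callables that take a snapshot id and return a list of similar snapshot ids, and the strict alternation between the two search types is one possible reading of the idea.

```python
import random


def random_walk(start_id, search_by_audio, search_by_image, steps, rng=random):
    """Yield snapshot ids for an animation, alternating audio and image searches.

    Each step plays the whole returned set, then picks a random item from it
    as the query for the next search on the other dimension.
    """
    current = start_id
    use_audio = True
    for _ in range(steps):
        search = search_by_audio if use_audio else search_by_image
        # drop the query item itself so the walk keeps moving
        results = [r for r in search(current) if r != current]
        if not results:
            return  # dead end: nothing similar to continue from
        for snapshot_id in results:
            yield snapshot_id
        current = rng.choice(results)
        use_audio = not use_audio
```

because this is a generator, a client could pull a few ids ahead of the animation to prefetch the next set while the current one plays.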

another route i considered was creating a game out of trying to navigate to a random seed item in the database. a user would be shown an audio/image snapshot that they would have to navigate to. the catch is that they'd have to start the search with their own audio/image snapshot. the goal would be to get to the seed snapshot as quickly as possible. so if the seed snapshot is of a hissing ferret, the user may choose to try to imitate the sound or photograph any ferret-like objects that may be nearby. the user can then search on the audio or image of their uploaded snapshot and will try to navigate through the returned set (and the returned sets of subsequent searches) to find the seed snapshot. a similar idea would be a simon says type of game, where the user has one chance to try to mimic one (or both) dimensions of a seed snapshot.


download it and run it!

dtree client

dtree server