Research

 

I work on a variety of projects that involve signal processing and machine learning applied to audio. These are some of the projects that I have worked on. Click on the name of an individual project for descriptions and demos (QuickTime Player is required for some of the sound files).


VoCo: Text-based Insertion and Replacement in Audio Narration - We developed an algorithm that automatically synthesizes a word in a given speaker's voice when the user simply types that word as text.


DAPS Dataset - I created a dataset to help in developing algorithms that attempt to transform speech recorded on common consumer devices in real-world environments into professional production-quality speech.


The Visual Microphone - When sound hits an object, it causes small vibrations of the object’s surface. We developed a technique, using only a video of the object, to extract those minute vibrations and partially recover the sound that produced them.


Interactive Source Separation - We have developed algorithms and an interface that allow users to interactively perform source separation by painting on spectrograms.


Content Based Tools for Editing Audio Stories - We have developed a set of content-based tools that allow creators of audio stories to analyze and manipulate speech and music at a high level.


Language Informed Speech Separation - We have developed a method to constrain non-negative factorial hidden Markov models with a language model. This has greatly increased speech separation performance.


Audio Imputation Using the Non-negative Hidden Markov Model - We show how to use the non-negative hidden Markov model (N-HMM) for audio imputation in order to restore corrupted signals.


Noise Robust Automatic Dialogue Replacement - We have developed an algorithm that can automatically replace dialogue recorded on a noisy film set with a clean recording of the same dialogue made in a studio, by warping the clean recording to match the noisy one.
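
The idea of warping one recording to match another can be pictured with classic dynamic time warping (DTW). The following is a minimal sketch, assuming one scalar feature per frame (for example, frame energy). It illustrates generic alignment only and is not the noise-robust algorithm developed in this project; the names `dtw_path`, `clean`, and `noisy` are hypothetical.

```python
def dtw_path(clean, noisy):
    """Align two 1-D feature sequences with classic dynamic time warping.

    Returns (total_cost, path), where path is a list of (i, j) pairs
    that match clean[i] to noisy[j]. Illustrative sketch only.
    """
    n, m = len(clean), len(noisy)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of clean[:i] with noisy[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(clean[i - 1] - noisy[j - 1])  # local frame distance
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance both sequences
                cost[i - 1][j],      # advance clean only
                cost[i][j - 1],      # advance noisy only
            )
    # Backtrack from the end to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return cost[n][m], path
```

Calling `dtw_path(clean, noisy)` yields the frame pairs along which the clean sequence would be stretched or compressed to follow the noisy one.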


Automatic Synchronization and Clustering of Videos from Multiple Cameras - We have developed an algorithm to automatically synchronize multiple videos of an event taken from different cameras. If we have videos of multiple events, it can also cluster them into discrete events.
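
One common way to synchronize recordings of the same event is to cross-correlate their audio tracks and pick the lag with the highest score. The sketch below illustrates that generic idea on plain lists of samples, using an unnormalized correlation. It is an assumption made for illustration, not necessarily the method used in this project; `best_lag` and `max_lag` are hypothetical names.

```python
def best_lag(a, b, max_lag):
    """Return the lag in samples at which b best matches a.

    A positive lag means the shared content appears that many samples
    later in b than in a. Uses a brute-force, unnormalized
    cross-correlation over lags in [-max_lag, max_lag].
    """
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for k, x in enumerate(a):
            j = k + lag
            if 0 <= j < len(b):  # skip samples that fall outside b
                score += x * b[j]
        if score > best_score:
            best, best_score = lag, score
    return best
```

With a lag estimated for each pair of recordings, every track can be shifted onto a common timeline.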


Non-negative Joint Modeling of Spectral Structure and Temporal Dynamics - We have developed new models of single sound sources and sound mixtures that extend spectral dictionary learning methods to account for non-stationarity and temporal dynamics.


Source Separation by Humming - We have developed an algorithm in which the user only needs to vocalize the instrument that they wish to extract, for example by singing the vocals, whistling the guitar part, or beatboxing the drum part.


Source Separation by Score Synthesis - We have developed an algorithm that performs source separation using only a musical score (MIDI file) as an input.


Lead Instrument Extraction - This is the extraction of a lead instrument from a recording of multiple instruments. It can be used for automatic karaoke, automatic “Guitar Hero” tracks, automatic jam tracks, and various other applications.


Multipitch Estimation - This is the concurrent estimation of the pitch of multiple instruments in a sound mixture. It is an integral part of automatic music transcription.


