CV



Ph.D. Thesis



Publications

Copyrighted materials. All rights reserved.


At Stanford

Research

Dissertation advisor : Prof. Julius O. Smith III
Affiliation : Center for Computer Research in Music and Acoustics (CCRMA)

Music Applications

My interest is in general audio analysis, applied to speech and musical signals. Particular applications include sound source separation, transcription, extraction and removal, pitch detection, instrument identification, indexing and retrieval, speech enhancement and dereverberation, robust speech recognition, and speech production modeling.

My past research has focused on separating musical sources from a song, especially the singing voice. Applications include melody transcription for indexing and retrieval, and extracting or removing the singing voice track from a song for musical purposes. The tools I have studied range from sinusoidal models to statistical learning, and from top-down to bottom-up processing. Polyphonic transcription is the ultimate problem I'm interested in. It requires segmentation, instrument identification, and pitch detection, along with much other relevant information that will become very desirable in light of MPEG-7 and similar standards. Some examples of work in this area are shown below.

Instrument Identification from Polyphonic Signals
About Audio Source Separation
Constrained EM estimate for harmonic source separation
MUS421 project: The sound of a plectrum, finger or fingernail plucked string

Multidisciplinary

As a side project, I also work on audio signal processing for interactive applications in toys. The features implemented and tested are silence detection, voice modification, source localization and separation, and query-by-humming (or singing). The pause detection was implemented in C/C++ in the eventual prototype to run in real time. Query-by-humming is now also in C/C++ and takes a few seconds to verify one song out of a possible ten.

Media X : Interactive Toy project

Speech Applications

Recently, I have been involved in research on audio analysis with applications to speech, in particular speech enhancement from noise and reverberation for listening and robust speech recognition purposes. My research focus is on a speech production model and the automatic identification of its parameters under possibly noisy conditions. The model-based approach allows for flexibility in reconstructing the speech source with modifiable pitch and duration, among other characteristics. It also fits into the theme of structured audio, where a sound object is described by a compact set of parameters.

Review of speech synthesis (last update : Feb 2006)
Review of sound source separation (last update : June 2003)
Demo of parametric voice coding (submitted to ICSLP'06)
Demo of joint source-tract parameter identification in noise (WASPAA'05)

Classes

EE261 : Fourier Transform and Its Applications
EE278 : Intro to Statistical Signal Processing
EE263 : Linear Dynamical Systems
EE398a : Image Communication I
EE369a : Medical Imaging I
EE367a (Music420) : Applications of the Fast Fourier Transform
EE368 : Digital Image Processing
CS229 : Pattern Classification
EE262 : Two-Dimensional Imaging
EE292B : Electronic Documents - Paper to Digital
Math 266 : Wavelets
EE367B (Music421) : Signal Processing Methods in Musical Acoustics




At Imperial College

Blind Source Separation of Convolutive Mixtures (M.Eng. Dissertation 2001)
Partial summary : BSS of convolutive mixtures
Prior UROP report : BSS of delayed mixtures
Adaptive Signal Processing, Optimization, Digital Filters, Neural Networks, Discrete-time Control, Digital Image Processing, Power Spectral Estimation, Operations Research, Philosophy I & II




At NECTEC

(National Electronics and Computer Technology Center, Thailand, summer 2002)


Blind Separation of a Single Channel Audio Mixture


Last updated 2/5/04
mail to : pj97 at stanford.edu