The Feedback Delay Network (FDN) is an efficient structure for generating room impulse responses. We study the modal behaviour of FDNs and how the modes couple as the mixing among the delay lines increases. We investigate the effect of the mixing matrix on the echo density profile and propose an empirical method for choosing the mixing matrix to achieve a desired mixing time.
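To make the structure concrete, here is a minimal sketch of an FDN impulse response in Python with NumPy. It assumes integer delay-line lengths, frequency-independent attenuation gains and no direct path; the function name `fdn_impulse_response` and its arguments are hypothetical illustrations, not code from the paper. The mixing matrix `A` controls how energy is exchanged between the delay lines: `A = I` leaves them as independent comb filters, whereas a dense orthogonal matrix such as a Householder reflection mixes them strongly.

```python
import numpy as np

def fdn_impulse_response(delays, A, g, b, c, n_samples):
    """Impulse response of a basic FDN with no direct path (illustrative sketch).

    delays : integer delay-line lengths in samples
    A      : N x N feedback (mixing) matrix
    g      : per-line broadband attenuation gains
    b, c   : input and output gain vectors
    """
    A, g, b, c = (np.asarray(v, dtype=float) for v in (A, g, b, c))
    N = len(delays)
    buffers = [np.zeros(m) for m in delays]   # one circular buffer per line
    ptrs = [0] * N
    y = np.zeros(n_samples)
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0            # unit impulse input
        # Each buffer returns the value written m samples ago.
        s = np.array([buffers[i][ptrs[i]] for i in range(N)])
        y[n] = c @ s
        # Attenuate, mix through A, add the input, and write back.
        v = A @ (g * s) + b * x
        for i in range(N):
            buffers[i][ptrs[i]] = v[i]
            ptrs[i] = (ptrs[i] + 1) % delays[i]
    return y
```

For example, mutually prime delays of 149, 211, 263 and 293 samples, the Householder matrix `np.eye(4) - 0.5 * np.ones((4, 4))` and gains `10 ** (-3 * m / (T60 * fs))` (a 60 dB loss every T60 seconds for a line of length `m`) give an exponentially decaying response whose echo density builds up over time.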
In an extended work, we propose the Grouped Feedback Delay Network (GFDN), which has different attenuation filters in different groups of delay lines. We use the GFDN to model coupled rooms and single rooms constructed from different materials, and discuss methods for efficient room resizing. Sound examples are available here. The presentation accompanying the DAFx paper is available here.
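The grouping idea can be illustrated by how per-line attenuation is chosen. The sketch below makes the simplifying assumption of broadband (frequency-independent) gains instead of the attenuation filters used in the GFDN, and assigns each delay line the gain that produces its group's target T60. `grouped_attenuation_gains` is a hypothetical helper; in a full GFDN, coupling between groups would additionally depend on the mixing matrix, which this sketch does not model.

```python
import numpy as np

def grouped_attenuation_gains(delays, groups, t60s, fs):
    """Per-delay-line broadband gains for a grouped FDN (illustrative sketch).

    delays : delay-line lengths in samples
    groups : group index of each delay line
    t60s   : desired broadband T60 in seconds, one entry per group
    A line of length m must lose 60 dB every T60 seconds, so
    20 * log10(g) = -60 * m / (fs * T60).
    """
    delays = np.asarray(delays, dtype=float)
    t60_per_line = np.asarray(t60s, dtype=float)[np.asarray(groups)]
    return 10.0 ** (-3.0 * delays / (fs * t60_per_line))
```

For instance, two groups with T60s of 0.1 s and 1.0 s would stand in for a damped room and a reverberant room sharing one network.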
At Facebook Reality Labs, my research on Room Impulse Response Interpolation for augmented reality applications was published at ICASSP 2021. We detect low-frequency room modes in sparse microphone measurements and interpolate them to a continuous spatial mapping by solving the homogeneous Helmholtz equation with non-linear optimization. Because the model parameters are estimated offline, the room response can be interpolated in real time with parallel biquad filters as the subject moves around the room.
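Any superposition of plane waves exp(-j k d·x) with unit directions d satisfies the homogeneous Helmholtz equation ∇²p + k²p = 0, which suggests a simple way to interpolate one mode's complex amplitude across space. The sketch below assumes the mode frequency and its complex amplitude at each microphone are already known, and it swaps the non-linear optimization described above for a linear least-squares fit over a fixed grid of directions. All function names are hypothetical.

```python
import numpy as np

def fibonacci_directions(n):
    """Roughly uniform unit vectors on the sphere (Fibonacci lattice)."""
    i = np.arange(n) + 0.5
    z = 1.0 - 2.0 * i / n
    r = np.sqrt(1.0 - z ** 2)
    phi = np.pi * (1.0 + 5 ** 0.5) * i
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

def plane_wave_basis(positions, k, directions):
    """exp(-j k d.x) for each position (rows) and direction (columns)."""
    return np.exp(-1j * k * positions @ directions.T)

def fit_mode_field(mic_pos, mode_amps, freq, directions, c=343.0):
    """Least-squares plane-wave coefficients for one mode."""
    k = 2 * np.pi * freq / c
    H = plane_wave_basis(mic_pos, k, directions)
    coeffs, *_ = np.linalg.lstsq(H, mode_amps, rcond=None)
    return coeffs

def eval_mode_field(pos, coeffs, freq, directions, c=343.0):
    """Interpolated complex mode amplitude at new position(s)."""
    k = 2 * np.pi * freq / c
    return plane_wave_basis(np.atleast_2d(pos), k, directions) @ coeffs
```

The interpolated complex amplitude would then set the gain and phase of that mode's biquad as the listener moves.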
My dissertation research was on microphone "bleed" (cross-talk) cancellation in ensemble recordings. When recording an ensemble of musicians, it is often desirable to isolate each instrument from interference by the other sources. Close-miking and acoustic isolation booths are common techniques for mitigating microphone bleed. I proposed statistical signal processing methods, based on Maximum Likelihood and Maximum A Posteriori (MAP) estimation, for reducing bleed at the post-processing stage. The proposed methods compared favourably with the state-of-the-art Multichannel Wiener Filter on both simulated and real recordings.
Sounds emanating from resonant objects such as rooms, plates and string instruments are composed of modes (standing waves) vibrating at different frequencies, each with its own decay rate. Modal synthesis aims to reconstruct such sounds by estimating these mode parameters and synthesizing the modes efficiently with parallel biquad filters.
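As a sketch of the synthesis step, assuming the mode frequencies, T60s and amplitudes have already been estimated, each mode can be rendered by a two-pole resonator (a biquad whose b0 and b2 coefficients are zero) and the resonator outputs summed. The function `modal_synth` and its parameterisation by T60 are illustrative assumptions; mode m has impulse response `amps[m] * r**n * sin(w * n)`, with pole radius `r` chosen so the amplitude falls by 60 dB over T60.

```python
import numpy as np

def modal_synth(freqs, t60s, amps, fs, n_samples):
    """Impulse response of a bank of parallel two-pole resonators (sketch)."""
    w = 2 * np.pi * np.asarray(freqs, dtype=float) / fs
    # Pole radius: amplitude decays by a factor of 1000 (60 dB) over T60.
    r = 10.0 ** (-3.0 / (np.asarray(t60s, dtype=float) * fs))
    amps = np.asarray(amps, dtype=float)
    b1 = amps * r * np.sin(w)          # numerator:   b1 z^-1
    a1 = -2.0 * r * np.cos(w)          # denominator: 1 + a1 z^-1 + a2 z^-2
    a2 = r ** 2
    y1 = np.zeros_like(w)              # y[n-1] for every mode
    y2 = np.zeros_like(w)              # y[n-2] for every mode
    out = np.zeros(n_samples)
    for n in range(n_samples):
        x1 = 1.0 if n == 1 else 0.0    # x[n-1] for a unit impulse at n = 0
        y = b1 * x1 - a1 * y1 - a2 * y2
        out[n] = y.sum()               # parallel resonators are summed
        y2, y1 = y1, y
    return out
```

Driving the same filter bank with an excitation signal instead of an impulse gives the synthesized sound; the per-sample cost grows linearly with the number of modes.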
We have measured and modeled carillon bells at Stanford's Hoover Tower using modal synthesis. Our 'computer carillon' can ring at different dynamic levels using a parameterized clapper-bell interaction function. Sound examples are available here.
We have proposed a more efficient variant of the MUSIC (MUltiple SIgnal Classification) algorithm, FAST MUSIC, which is numerically more stable and well suited to detecting closely spaced, beating partials in approximately periodic signals. Possible applications in music research include modeling instruments such as pianos and bells, where beating between close-frequency partials is often observed.
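For reference, the sketch below computes the pseudospectrum of the standard MUSIC algorithm, not FAST MUSIC, whose faster and more stable formulation is not reproduced here. It assumes the number of real sinusoids is known and builds the correlation matrix from overlapping windows of length `m`; each real sinusoid occupies two dimensions of the signal subspace. `np.lib.stride_tricks.sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np

def music_pseudospectrum(x, n_sinusoids, m, freqs, fs):
    """Standard MUSIC pseudospectrum for a real signal (not FAST MUSIC).

    n_sinusoids : number of real sinusoids (signal subspace dim = 2 * n)
    m           : correlation-matrix size, must exceed 2 * n_sinusoids
    freqs       : frequency grid in Hz at which to evaluate
    """
    # Overlapping length-m snapshots -> sample correlation matrix.
    snapshots = np.lib.stride_tricks.sliding_window_view(x, m)
    R = snapshots.T @ snapshots.conj() / snapshots.shape[0]
    # eigh sorts eigenvalues ascending: the smallest span the noise subspace.
    _, V = np.linalg.eigh(R)
    En = V[:, : m - 2 * n_sinusoids]
    # Steering vectors for each candidate frequency.
    k = np.arange(m)
    A = np.exp(2j * np.pi * np.outer(k, freqs) / fs)
    proj = En.conj().T @ A
    # Peaks where the steering vector is nearly orthogonal to the noise subspace.
    return 1.0 / np.sum(np.abs(proj) ** 2, axis=0)
```

With two partials only 4 Hz apart, the pseudospectrum shows two sharp peaks; the eigendecomposition of the full `m x m` matrix is the main cost that faster variants aim to reduce.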
The Extended Kalman Filter is used to track the fundamental frequency, amplitude and instantaneous phase of monophonic audio signals. It has certain advantages: it yields a pitch estimate for every sample of data, unlike most block-based methods such as the CREPE or YIN estimators, and it is robust to large amounts of observation noise. However, it also has drawbacks, such as poor transient performance and slow detection of rapid pitch changes. These drawbacks have been addressed in an extended journal paper. Performance on vocal singing excerpts can be found here.
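A minimal per-sample EKF of this kind is sketched below, assuming a single real sinusoid `y[n] = A cos(phase[n])` plus white noise, with state `[phase, omega, A]`, phase advancing by `omega` each sample, and random-walk models for `omega` and `A`. This is a simplified real-valued formulation chosen for illustration and may differ from the filter in the paper; `ekf_pitch_track`, its noise variances and its priors are hypothetical defaults that would need tuning, and lock is not guaranteed for a poor initial frequency guess.

```python
import numpy as np

def ekf_pitch_track(y, fs, f_init, a_init, q=(1e-6, 1e-8, 1e-6), r=1e-2,
                    phase_init=0.0):
    """Per-sample EKF estimates of frequency (Hz), amplitude and phase (sketch)."""
    x = np.array([phase_init, 2 * np.pi * f_init / fs, a_init], dtype=float)
    # Assumed prior uncertainty: unknown phase, +-50 Hz, unit amplitude spread.
    P = np.diag([np.pi ** 2, (2 * np.pi * 50 / fs) ** 2, 1.0])
    F = np.array([[1.0, 1.0, 0.0],     # phase[n] = phase[n-1] + omega
                  [0.0, 1.0, 0.0],     # omega: random walk
                  [0.0, 0.0, 1.0]])    # amplitude: random walk
    Q = np.diag(q)
    f_est, a_est, ph_est = (np.empty(len(y)) for _ in range(3))
    for n, yn in enumerate(y):
        if n > 0:                      # time update
            x = F @ x
            P = F @ P @ F.T + Q
        # Jacobian of h(x) = A cos(phase) with respect to the state.
        H = np.array([-x[2] * np.sin(x[0]), 0.0, np.cos(x[0])])
        S = H @ P @ H + r              # innovation variance (scalar)
        K = P @ H / S                  # Kalman gain
        x = x + K * (yn - x[2] * np.cos(x[0]))
        P = P - np.outer(K, H @ P)
        f_est[n] = x[1] * fs / (2 * np.pi)
        a_est[n] = x[2]
        ph_est[n] = x[0]
    return f_est, a_est, ph_est
```

Because the filter updates on every sample, it returns one frequency estimate per input sample rather than one per analysis block.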
The Ranchlands' Hum is a low-frequency noise at around 40 Hz that has plagued residents of Calgary, Canada, for years. As an intern in the Department of Electrical and Computer Engineering at the University of Calgary, I assisted Dr. Mike Smith in developing an Android application that could capture, store and analyze low-frequency noise. I integrated the existing application with an SQLite database and added features to calculate and plot signal metrics. The project received some media attention.
The Kalman Filter is an MMSE estimator that can be used to remove background noise from speech. The filter equations are formulated from the linear autoregressive (AR) model of speech production. We implement a novel algorithm that tunes the Kalman Filter by accurately determining its parameters, the measurement and process noise covariances. We also study the effect of the AR model order on speech corrupted by various types of noise at various SNRs, and summarize the results in an undergraduate thesis.
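The filter structure can be sketched as follows, assuming the AR coefficients and both noise variances are already known; the tuning algorithm that estimates them is not reproduced. The state holds the last `p` clean-speech samples in companion form, and only the newest sample is observed in noise. `kalman_ar_denoise` is a hypothetical name, and in practice the AR coefficients would be re-estimated frame by frame because speech is only stationary over short intervals.

```python
import numpy as np

def kalman_ar_denoise(y, a, q, r):
    """Kalman filter for y[n] = s[n] + v[n] with an AR(p) speech model.

    s[n] = a[0] s[n-1] + ... + a[p-1] s[n-p] + w[n]
    q : process-noise variance of w,  r : measurement-noise variance of v
    """
    p = len(a)
    F = np.zeros((p, p))
    F[0, :] = a                     # companion (state-transition) matrix
    F[1:, :-1] = np.eye(p - 1)      # shift older samples down the state
    H = np.zeros(p)
    H[0] = 1.0                      # observe only the newest sample
    Q = np.zeros((p, p))
    Q[0, 0] = q
    x, P = np.zeros(p), np.eye(p)
    s_hat = np.empty(len(y))
    for n, yn in enumerate(y):
        x = F @ x                   # predict
        P = F @ P @ F.T + Q
        k = P @ H / (H @ P @ H + r) # Kalman gain
        x = x + k * (yn - H @ x)    # correct with the noisy sample
        P = P - np.outer(k, H @ P)
        s_hat[n] = x[0]
    return s_hat
```

Raising the model order `p` lets the filter capture more spectral detail of the speech, at the cost of a larger state and more parameters to estimate.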
The tabla is a membranophone percussion instrument (similar to bongos) that is often used in Hindustani classical music. The instrument consists of a pair of hand drums of contrasting sizes and timbres. The rhythmic pattern of any composition in Indian music is described by its tala, which is composed of cycles of matra-s (beats). The tala roughly corresponds to the metre in Western music. Our aim is to determine the number of beats that make up the tala in different tabla solos. We develop a heuristic algorithm that extracts peaks corresponding to single or composite strokes from the tabla signal, and devise statistical methods to remove spurious noisy peaks and account for missed ones. We obtain excellent results on solo tabla recordings played by human artists.
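The peak-extraction step could be approximated with a simple energy-based onset detector like the one below. This is a generic sketch, not the heuristic from the project: the frame size, hop, threshold and minimum gap between strokes are hypothetical values, and the statistical clean-up of spurious and missed peaks is not included. The intervals between successive onsets (`np.diff(onsets)`) are the raw material for inferring the number of matra-s in a cycle.

```python
import numpy as np

def detect_stroke_onsets(x, fs, frame=256, hop=128, min_gap=0.05, rel_thresh=0.1):
    """Stroke onset times (s) from peaks of a rising frame-energy curve (sketch)."""
    n_frames = 1 + (len(x) - frame) // hop
    env = np.array([np.sum(x[i * hop:i * hop + frame] ** 2) for i in range(n_frames)])
    # Keep only increases in energy: strokes start abruptly and then decay.
    novelty = np.maximum(np.diff(env, prepend=env[0]), 0.0)
    thresh = rel_thresh * novelty.max()
    min_frames = max(1, int(round(min_gap * fs / hop)))
    peaks = []
    for i in range(1, len(novelty) - 1):
        is_local_max = novelty[i] >= novelty[i - 1] and novelty[i] > novelty[i + 1]
        if novelty[i] < thresh or not is_local_max:
            continue
        if peaks and i - peaks[-1] < min_frames:
            # Two peaks too close together: keep the stronger one.
            if novelty[i] > novelty[peaks[-1]]:
                peaks[-1] = i
            continue
        peaks.append(i)
    return np.array(peaks) * hop / fs
```

The reported time is the start of the frame where the energy rise is largest, so onsets are quantized to roughly one hop (16 ms at 8 kHz with the defaults above).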