The Feedback Delay Network (FDN) is an efficient structure for generating real-time artificial reverberation. We study the modal behaviour of FDNs and the coupling between modes that arises as mixing among the delay lines is increased. We investigate the effect of the mixing matrix on the echo density profile and propose an empirical method for determining the mixing matrix that achieves a desired mixing time.
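As a minimal sketch of the structure: delay lines feed back through a mixing matrix and a decay gain. The Householder mixing matrix, delay lengths, and scalar gain below are illustrative assumptions, not the designs studied in the paper.

```python
import numpy as np

def fdn_impulse_response(delays, g=0.97, n_samples=8000):
    """Minimal FDN sketch: an orthogonal Householder matrix mixes the
    delay-line outputs, and a scalar gain g < 1 controls the decay."""
    N = len(delays)
    # Householder matrix: orthogonal (lossless) and fully mixing
    v = np.ones((N, 1)) / np.sqrt(N)
    A = np.eye(N) - 2.0 * (v @ v.T)
    buffers = [np.zeros(d) for d in delays]   # circular delay buffers
    idx = [0] * N
    b = c = np.ones(N)                        # input/output tap gains
    out = np.zeros(n_samples)
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0            # unit impulse input
        s = np.array([buffers[i][idx[i]] for i in range(N)])  # line outputs
        out[n] = c @ s
        fb = g * (A @ s)                      # mix and attenuate
        for i in range(N):
            buffers[i][idx[i]] = b[i] * x + fb[i]
            idx[i] = (idx[i] + 1) % delays[i]
    return out

h = fdn_impulse_response([149, 211, 263, 293])
```

Denser mixing matrices (more, and more evenly weighted, off-diagonal entries) increase the rate at which echoes build up, which is what connects the choice of matrix to the mixing time.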
In an extended work, we proposed the Grouped Feedback Delay Network (GFDN), which has different attenuation filters in different groups of delay lines. We used the GFDN to model reverberation in coupled rooms. The presentation accompanying the DAFx paper is available here. We have also explored the design of frequency-dependent, non-lossless coupling matrices in the GFDN to model wave effects such as diffraction.
I presented my research on GFDNs at the Jean Le Rond d'Alembert Institute at Sorbonne University in December 2021.
Similar to the FDN, the Scattering Delay Network (SDN) is a delay-network reverberator. Unlike the FDN, the SDN has parameters based on the room geometry and the source and listener positions, thereby rendering early reflections accurately and higher-order reflections with a coarser approximation. The standard SDN only renders first-order reflections exactly. In the SCReAM project, we have extended SDNs to render higher-order reflections correctly by proposing various topologies and directional scattering matrices. The higher-order SDNs were rated higher in naturalness and texture than the standard SDN and the image method.
At Facebook Reality Labs, my research on Room Impulse Response Interpolation for augmented reality applications was published in ICASSP 2021. We detect and interpolate low-frequency room modes from sparse microphone measurements to a continuous spatial mapping by solving the homogeneous Helmholtz equation with non-linear optimization. With offline estimation of the model parameters, the room response can be interpolated in real time with parallel biquad filters as the subject moves around the room.
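A simplified sketch of the underlying idea: any solution of the homogeneous Helmholtz equation at a single frequency is a superposition of plane waves whose wavenumber is fixed by that frequency, so sparse measurements can be fit and the field evaluated anywhere. The 2-D geometry, fixed direction grid, and linear least-squares fit below are illustrative stand-ins for the paper's non-linear optimization.

```python
import numpy as np

rng = np.random.default_rng(1)
c, f = 343.0, 150.0              # speed of sound, a low room resonance (assumed)
k = 2 * np.pi * f / c            # wavenumber fixed by the Helmholtz equation

# Dictionary of plane-wave directions on the unit circle (2-D sketch)
M = 16
th = np.linspace(0, 2 * np.pi, M, endpoint=False)
K = k * np.stack([np.cos(th), np.sin(th)], axis=1)   # (M, 2) wavevectors

def field(r, coeffs):
    """Homogeneous-Helmholtz solution: sum of plane waves exp(j k.r)."""
    return np.exp(1j * r @ K.T) @ coeffs

true_coeffs = rng.standard_normal(M) + 1j * rng.standard_normal(M)

# Sparse 'microphone' positions and measurements in a 4 m x 4 m region
r_mic = rng.uniform(0, 4, size=(40, 2))
p_mic = field(r_mic, true_coeffs)

# Least-squares fit of plane-wave amplitudes to the sparse measurements
Phi = np.exp(1j * r_mic @ K.T)
coeffs_hat, *_ = np.linalg.lstsq(Phi, p_mic, rcond=None)

# Interpolate the sound field at an unmeasured position
r_new = np.array([[1.3, 2.7]])
p_hat = field(r_new, coeffs_hat)
p_true = field(r_new, true_coeffs)
```

Once the mode parameters are fixed offline, evaluating the interpolated response at a new listener position reduces to cheap per-mode filtering, which is what makes the real-time stage fast.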
The wave-effects of diffraction and occlusion are key to reproducing audio realism in VR. Whilst many mathematical models of sound diffraction exist, our study is the first to compare them perceptually. Developed during a final year technical project, the paper associated with the study won the best paper award at AES Conference on Audio for Virtual and Augmented Reality, 2022.
The Image Method (IM) is widely used for rendering the acoustics of shoebox rooms. However, it cannot model wave scattering, and in highly symmetric rooms it leads to the phenomenon of "sweeping echoes" due to the time alignment of multiple image sources. We address this problem by replacing the plane-wave reflection coefficient used in the IM with the spherical-wave reflection function, which takes directional scattering into account. The resulting Complex Image Method significantly reduces sweeping echoes in cuboid rooms.
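For reference, the standard IM construction that our method modifies looks roughly like this: first-order image sources of a shoebox room, each arrival scaled by distance spreading and a frequency-independent plane-wave reflection coefficient. The room dimensions, positions, and coefficient are made-up illustration values.

```python
import numpy as np

def image_sources(src, room):
    """First-order image sources of a shoebox room [Lx, Ly, Lz]:
    reflect the source across each of the six walls."""
    images = []
    for axis in range(3):
        for wall in (0.0, room[axis]):
            img = np.array(src, dtype=float)
            img[axis] = 2 * wall - img[axis]
            images.append(img)
    return images

def early_ir(src, rcv, room, beta=0.8, fs=16000, dur=0.05, c=343.0):
    """Direct path plus first-order reflections, each scaled by 1/r
    spreading and a frequency-independent plane-wave reflection
    coefficient beta (the simplification that causes sweeping echoes)."""
    h = np.zeros(int(dur * fs))
    arrivals = [(1.0, np.array(src, float))]
    arrivals += [(beta, im) for im in image_sources(src, room)]
    for amp_scale, pos in arrivals:
        r = np.linalg.norm(pos - rcv)
        n = int(round(fs * r / c))
        if n < len(h):
            h[n] += amp_scale / r
    return h

room = np.array([5.0, 4.0, 3.0])
h = early_ir([1.0, 2.0, 1.5], np.array([3.5, 1.0, 1.2]), room)
```

In a symmetric room, many higher-order images become equidistant and their arrivals align in time; replacing beta with the angle-dependent spherical-wave reflection function breaks this alignment.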
In May 2023, I gave an overview of modeling room acoustics in AR/VR applications in a talk organised by the UK Acoustics Network (UKAN). The video is available to watch below.
My dissertation research was on microphone "bleed" (cross-talk) cancellation in ensemble recordings. While recording an ensemble of musicians, it is often desirable to isolate the instruments to avoid interference from other sources. Close-miking and acoustic isolation booths are some techniques for mitigating microphone bleed. I proposed statistical signal processing methods for reducing bleed in the post-processing stage, using Maximum Likelihood and Maximum A Posteriori estimation. The proposed methods showed impressive results compared with the state-of-the-art Multichannel Wiener Filter on simulated and real recordings.
The public part of my dissertation defense (with all of its technical glitches) is available to watch online. I am a victim of pandemic-affected online vivas, but most of my friends, academic and otherwise, could attend the Zoom viva.
Sounds emanating from resonant objects such as rooms, plates and string instruments are composed of modes (standing waves) vibrating at different frequencies, each with its unique decay rate. Modal synthesis aims to reconstruct sounds by estimating these mode parameters and efficiently synthesizing modes using parallel biquad filters.
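A minimal sketch of this idea, with made-up mode parameters: each mode maps to a two-pole resonator whose pole radius encodes the decay rate, and the output is the sum of the parallel filters run on an impulse.

```python
import numpy as np

def mode_ir(f, tau, amp, phase, fs, n):
    """Impulse response of one mode's biquad, i.e. the decaying sinusoid
    amp * exp(-k/(tau*fs)) * cos(2*pi*f*k/fs + phase), generated by
    running the two-pole difference equation directly."""
    r = np.exp(-1.0 / (tau * fs))            # pole radius from decay time tau
    w = 2 * np.pi * f / fs                   # pole angle from mode frequency
    b0 = amp * np.cos(phase)
    b1 = -amp * r * np.cos(w - phase)
    a1, a2 = -2 * r * np.cos(w), r * r       # denominator 1 + a1 z^-1 + a2 z^-2
    y = np.zeros(n)
    y[0] = b0
    y[1] = b1 - a1 * y[0]
    for k in range(2, n):                    # feedback-only after the numerator taps
        y[k] = -a1 * y[k - 1] - a2 * y[k - 2]
    return y

def modal_synth(modes, fs, n):
    """Sum of parallel biquads, one per (freq, decay, amp, phase) mode."""
    return sum(mode_ir(f, tau, amp, ph, fs, n) for f, tau, amp, ph in modes)

fs = 16000
y = modal_synth([(440.0, 0.3, 1.0, 0.0), (660.0, 0.2, 0.5, 0.0)], fs, fs)
```

Because each mode is an independent second-order filter, the synthesis cost scales linearly with the number of modes and parallelizes trivially.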
We have measured and modeled carillon bells at Stanford's Hoover Tower using modal synthesis. Our 'computer carillon' can ring at different dynamic levels using a parameterized clapper-bell interaction function. Sound examples are available here.
Modal parameters can be estimated on a warped frequency axis to resolve beating partials. The proposed method, Frequency-Warped ESPRIT, is used to model coupled piano strings, which exhibit two-stage decay and beating modes in doublets and triplets, as well as room impulse responses. An additional optimization step fine-tunes the mode estimates. Sound examples are available here.
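For context, a plain (unwarped) least-squares ESPRIT on a noiseless synthetic doublet can be sketched as follows; the frequency warping and the optimization step of the proposed method are omitted, and the signal parameters are invented for illustration.

```python
import numpy as np

def esprit_freqs(x, n_modes, fs, m):
    """Plain least-squares ESPRIT: estimate the frequencies of a sum of
    (damped) real sinusoids from a single signal snapshot."""
    # Hankel data matrix: overlapping length-m windows of the signal
    H = np.lib.stride_tricks.sliding_window_view(x, m)
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    # Each real sinusoid contributes a conjugate pole pair -> rank 2 per mode
    Es = Vt[:2 * n_modes].T
    # Rotational invariance: shifting the subspace one sample multiplies it by Phi
    Phi = np.linalg.lstsq(Es[:-1], Es[1:], rcond=None)[0]
    z = np.linalg.eigvals(Phi)               # signal poles z = exp(-d + j w)
    f = np.angle(z) * fs / (2 * np.pi)
    return np.sort(f[f > 0])                 # keep positive-frequency poles

# A beating doublet: two decaying partials only 12 Hz apart
fs = 8000
n = np.arange(2000)
x = (np.sin(2 * np.pi * 440 * n / fs) * np.exp(-n / 1500.0)
     + 0.6 * np.sin(2 * np.pi * 452 * n / fs) * np.exp(-n / 1200.0))
f_hat = esprit_freqs(x, n_modes=2, fs=fs, m=200)
```

The pole magnitudes `abs(z)` give the per-sample decay, so the same eigenvalues yield both frequency and decay-rate estimates.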
We have proposed a more efficient variant of the MUSIC (MUltiple SIgnal Classification) algorithm, FAST MUSIC, which is numerically more stable and suited to detecting closely spaced beating partials in approximately periodic signals. Possible applications of these techniques in music research include modeling instruments such as pianos and bells, where close-frequency beating is often observed.
The Extended Kalman Filter (EKF) is used to track the fundamental frequency, amplitude, and instantaneous phase of monophonic audio signals. It has certain advantages: it yields a pitch estimate for every sample, unlike block-based methods such as CREPE or YIN, and it is robust to large amounts of observation noise. However, it also has drawbacks, such as poor transient performance and slow detection of rapid pitch changes, which we address in an extended journal paper. Performance on vocal singing excerpts can be found here.
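A toy illustration of the approach: a three-state EKF (frequency, amplitude, phase) tracking a single noisy sinusoid. Real pitch trackers model several harmonics of the voice, and the tuning constants here are assumptions chosen for the synthetic example.

```python
import numpy as np

def ekf_pitch_track(y, fs, f0_init, r=0.05):
    """Per-sample EKF for y[n] = a*sin(phi) + noise.
    State x = [omega, a, phi]; the phase advances by omega each sample."""
    x = np.array([2 * np.pi * f0_init / fs, 1.0, 0.0])
    P = np.diag([1e-4, 1e-2, 1e-1])
    Q = np.diag([1e-6, 1e-6, 1e-4])                  # process noise (assumed)
    F = np.array([[1.0, 0, 0], [0, 1.0, 0], [1.0, 0, 1.0]])  # phi += omega
    f_track = np.zeros(len(y))
    for n, yn in enumerate(y):
        x = F @ x                                    # predict
        P = F @ P @ F.T + Q
        w, a, phi = x                                # update: h(x) = a*sin(phi)
        Hj = np.array([0.0, np.sin(phi), a * np.cos(phi)])  # Jacobian of h
        S = Hj @ P @ Hj + r
        K = (P @ Hj) / S
        x = x + K * (yn - a * np.sin(phi))
        P = P - np.outer(K, Hj) @ P
        f_track[n] = x[0] * fs / (2 * np.pi)         # per-sample pitch estimate
    return f_track

fs = 8000
n = np.arange(4000)
rng = np.random.default_rng(0)
y = np.sin(2 * np.pi * 220 * n / fs) + 0.05 * rng.standard_normal(4000)
f_track = ekf_pitch_track(y, fs, f0_init=218.0)
```

The slow-update behavior visible here, governed by the small process-noise entries, is exactly the transient limitation discussed above: fast pitch changes require larger process noise, at the cost of noisier estimates.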
The Ranchlands' Hum is a low-frequency noise around 40 Hz that has been plaguing residents of Calgary, Canada for years. As an intern in the department of Electrical and Computer Engineering at the University of Calgary, I assisted Dr. Mike Smith in developing an Android application that could capture, store, and analyze low-frequency noise. I added features that integrated the existing application with an SQLite database and that calculated and plotted signal metrics. The project received some media attention.
The Kalman Filter is an MMSE estimator that can be used to remove background noise from speech. The filter equations are formulated from the linear autoregressive (AR) model of speech production. We implement a novel algorithm that tunes the Kalman Filter by accurately determining its parameters: the measurement and process noise covariances. We also study the effect of the AR model order on speech corrupted with various types of noise at various SNRs and summarize the results in an undergraduate thesis.
The tabla is a membranophone percussion instrument (similar to bongos) often used in Hindustani classical music. It consists of a pair of hand drums of contrasting sizes and timbres. The rhythmic pattern of a composition in Indian music is described by the term tala, which is composed of cycles of matra-s; tala roughly corresponds to metre in Western music. Our aim is to determine the number of beats that constitute tala-s in different tabla solos. We develop a heuristic algorithm that extracts peaks corresponding to single or composite strokes from the tabla signal, and devise statistical methods to remove spurious noisy peaks and to account for missed peaks. We obtain excellent results on solo tabla recordings played by human artists.
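An illustrative version of the peak-extraction step (not the exact algorithm used in the work): a frame-energy envelope, an adaptive median-plus-MAD threshold to reject spurious noise peaks, and a minimum inter-onset gap to suppress double triggers. The frame size, threshold factor, and synthetic strokes are all assumptions.

```python
import numpy as np

def detect_strokes(x, fs, frame=256, thresh_k=8.0, min_gap=0.1):
    """Heuristic stroke picker: mark a stroke at each rising edge of the
    frame-energy envelope above a robust (median + k*MAD) threshold."""
    n_frames = len(x) // frame
    env = np.array([np.sum(x[i*frame:(i+1)*frame]**2) for i in range(n_frames)])
    med = np.median(env)
    mad = np.median(np.abs(env - med)) + 1e-12   # robust to the stroke frames
    above = env > med + thresh_k * mad
    onsets, last = [], -np.inf
    for i in range(1, n_frames):
        t = i * frame / fs
        if above[i] and not above[i-1] and t - last >= min_gap:
            onsets.append(t)
            last = t                             # enforce minimum inter-onset gap
    return onsets

# Synthetic 'strokes': decaying noise bursts on a quiet noise floor
fs = 16000
rng = np.random.default_rng(3)
x = 0.01 * rng.standard_normal(4 * fs)
for t0 in (0.5, 1.0, 1.5, 2.25, 3.0):
    n0 = int(t0 * fs)
    x[n0:n0+800] += np.exp(-np.arange(800) / 200.0) * rng.standard_normal(800)
onsets = detect_strokes(x, fs)
```

Once the stroke times are in hand, the inter-onset intervals can be clustered to infer the beat count of the tala cycle.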