Pamornpol Jinachitra's WASPAA'05 Demo

Joint Estimation of Glottal Source and Vocal Tract for Vocal Synthesis Using Kalman Smoother and EM Algorithm

Below are the original /a/ sound, its noisy versions, and re-synthesized sounds generated from parameter estimates obtained with the EM-Kalman smoother, using the Rosenberg-Klatt derivative glottal model and an all-pole vocal tract filter.

CLEAN INPUT

Original (clean) : The SNR is actually only about 50 dB, as estimated from the averaged signal variance and the noise-floor variance at the beginning.
With pre-emphasis : A Hann-7 window is used to smooth all parameters before synthesis.
No pre-emphasis
Resynthesized (breathy) : Now with the open quotient (OQ) increased by 0.1 and the noise model used by Vicky Lu.
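
The parameter smoothing mentioned above can be sketched as follows. This is only an illustration, not the code used for the demo: it assumes that "Hann-7" means a 7-tap Hann window applied as a centered weighted moving average over each per-frame parameter track, with the taps chosen so that the end points are non-zero. The names hann_weights, smooth_track and oq_track are hypothetical, and the track values are made up.

```python
import math


def hann_weights(length):
    """Hann taps with non-zero end points, normalized to sum to 1."""
    raw = [0.5 - 0.5 * math.cos(2.0 * math.pi * (n + 1) / (length + 1))
           for n in range(length)]
    total = sum(raw)
    return [w / total for w in raw]


def smooth_track(track, length=7):
    """Smooth one per-frame parameter track with a centered Hann window.

    Near the ends only the taps that overlap the track are used, and the
    weights are renormalized, so a constant track stays constant.
    """
    weights = hann_weights(length)
    half = length // 2
    smoothed = []
    for i in range(len(track)):
        acc = 0.0
        norm = 0.0
        for k, w in enumerate(weights):
            j = i + k - half
            if 0 <= j < len(track):
                acc += w * track[j]
                norm += w
        smoothed.append(acc / norm)
    return smoothed


# Made-up open-quotient track with a single outlier frame.
oq_track = [0.6, 0.6, 0.6, 0.9, 0.6, 0.6, 0.6]
smoothed_oq = smooth_track(oq_track, length=7)
```

A longer window (for example length=9) averages over more frames, which gives steadier estimates at the cost of blurring fast parameter changes.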

NOISY INPUT

With 20 dB white noise
With pre-emphasis : Uses a Hann-7 smoothing window. The vocal tract filter estimates are poor because pre-emphasis amplifies the noise.
Without pre-emphasis : Uses a Hann-9 smoothing window. The vocal tract filter estimates are stable. There is no musical noise, but the result sounds muffled.
Enhanced sound : Generated by concatenating the Kalman-smoothed state estimates (no pre-emphasis). Musical noise is audible.
With 20 dB pink noise : The pink noise is generated with a 3rd-order ARMA filter.
With pre-emphasis : Uses a Hann-9 smoothing window. Sounds better than with additive white noise input.
Enhanced sound : Generated by concatenating the Kalman-smoothed state estimates (with pre-emphasis).
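
For readers who want to build comparable noisy inputs, the sketch below scales a noise sequence so that the variance-ratio SNR hits a target such as 20 dB. It is an assumption about how such a mix could be made, not the procedure used for these files: the sine tone stands in for the recorded /a/, the 16 kHz sampling rate is arbitrary, and the function names are hypothetical. The same scaling applies to pink noise once a pink sequence is available.

```python
import math
import random


def variance(samples):
    """Population variance of a list of samples."""
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)


def snr_db(signal, noise):
    """SNR in dB as the ratio of signal variance to noise variance."""
    return 10.0 * math.log10(variance(signal) / variance(noise))


def scale_noise_to_snr(signal, noise, target_snr_db):
    """Scale the noise so that snr_db(signal, scaled noise) equals the target."""
    target_ratio = 10.0 ** (target_snr_db / 10.0)
    gain = math.sqrt(variance(signal) / (variance(noise) * target_ratio))
    return [gain * n for n in noise]


rng = random.Random(0)   # fixed seed so the sketch is repeatable
fs = 16000               # assumed sampling rate in Hz
# Stand-in for the clean vowel: a quarter second of a 120 Hz tone.
signal = [math.sin(2.0 * math.pi * 120.0 * t / fs) for t in range(fs // 4)]
white = [rng.gauss(0.0, 1.0) for _ in signal]

scaled_noise = scale_noise_to_snr(signal, white, 20.0)
noisy = [s + n for s, n in zip(signal, scaled_noise)]
achieved_snr = snr_db(signal, scaled_noise)
```

Here achieved_snr matches the 20 dB target up to rounding, because the gain is derived from the same variance ratio that snr_db measures.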