Joint Estimation of Glottal Source and Vocal Tract for Vocal Synthesis
Using Kalman Smoother and EM Algorithm
Below are the original /a/ sound, its noisy versions, and the re-synthesized
sounds generated from parameter estimates obtained with the EM-Kalman
smoother, using the Rosenberg-Klatt derivative glottal model and an all-pole
vocal tract filter.
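
As a rough illustration of the source-filter structure named above, the sketch below builds one period of a Rosenberg-Klatt style glottal flow derivative and passes a pulse train through an all-pole filter. It is not the estimation procedure itself. The sampling rate, pitch, open quotient, and the two resonance frequencies and bandwidths are hypothetical values chosen only for the example, and the pulse is normalized to unit peak for convenience.

```python
import math

def rk_derivative_pulse(period, open_quotient=0.6, amplitude=1.0):
    """One period of a Rosenberg-Klatt style glottal flow derivative.

    Open-phase flow is g(t) = a*t**2 - b*t**3, which returns to zero at
    t = a/b, so the derivative is 2*a*t - 3*b*t**2. The closed phase is zero.
    """
    n_open = int(round(open_quotient * period))
    a = 1.0
    b = a / n_open  # flow closes after n_open samples
    pulse = [2 * a * n - 3 * b * n * n if n < n_open else 0.0
             for n in range(period)]
    peak = max(abs(v) for v in pulse)
    return [amplitude * v / peak for v in pulse]  # unit-peak scaling (assumed)

def resonator(freq_hz, bandwidth_hz, fs):
    """Denominator coefficients of a two-pole resonator."""
    r = math.exp(-math.pi * bandwidth_hz / fs)
    theta = 2 * math.pi * freq_hz / fs
    return [1.0, -2 * r * math.cos(theta), r * r]

def poly_mul(p, q):
    """Multiply two polynomials in z^-1 (cascade two filter sections)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def all_pole_filter(excitation, denom):
    """y[n] = x[n] - sum_k denom[k] * y[n-k], with denom[0] == 1."""
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k in range(1, len(denom)):
            if n - k >= 0:
                acc -= denom[k] * out[n - k]
        out.append(acc)
    return out

fs = 8000                      # hypothetical sampling rate in Hz
period = fs // 100             # hypothetical 100 Hz pitch
pulse = rk_derivative_pulse(period)
# Two illustrative resonances; not the estimated /a/ formants.
denom = poly_mul(resonator(700, 80, fs), resonator(1200, 100, fs))
speech = all_pole_filter(pulse * 20, denom)
```

Cascading second-order resonators keeps every pole inside the unit circle by construction, so the synthesis filter in this sketch is always stable.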
CLEAN INPUT
NOISY INPUT
- With 20 dB white noise
  - With pre-emphasis: uses a Hann-7 smoothing window. The vocal tract filter
    estimates are poor because pre-emphasis amplifies the noise.
  - Without pre-emphasis: uses a Hann-9 smoothing window. The vocal tract
    filter estimates are stable. No musical noise, but the result sounds
    muffled.
  - Enhanced sound: generated by concatenating the Kalman-smoothed state
    estimates (without pre-emphasis). Musical noise is present.
- With 20 dB pink noise (generated with a 3rd-order ARMA filter)
  - With pre-emphasis: uses a Hann-9 smoothing window. Sounds better than the
    result for additive white noise input.
  - Enhanced sound: generated by concatenating the Kalman-smoothed state
    estimates (with pre-emphasis).
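
The noisy inputs listed above could be prepared along the following lines. This is only a sketch under stated assumptions: the page does not give its ARMA coefficients, so the pole and zero locations below are one widely circulated 3rd-order pink-noise approximation rather than the filter actually used. The pre-emphasis coefficient 0.97 and the sine test signal are also hypothetical stand-ins.

```python
import math
import random

def poly_from_roots(roots):
    """Coefficients of prod(1 - r * z^-1) over the given real roots."""
    coeffs = [1.0]
    for r in roots:
        coeffs = [c - r * p for c, p in zip(coeffs + [0.0], [0.0] + coeffs)]
    return coeffs

def arma_filter(b, a, x):
    """y[n] = sum_k b[k] x[n-k] - sum_{k>=1} a[k] y[n-k], with a[0] == 1."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y

def power(x):
    return sum(v * v for v in x) / len(x)

def add_noise_at_snr(signal, noise, snr_db):
    """Scale the noise so that 10*log10(P_signal / P_noise) equals snr_db."""
    gain = math.sqrt(power(signal) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(signal, noise)]

def pre_emphasis(x, alpha=0.97):
    """First-order high-pass: y[n] = x[n] - alpha * x[n-1]."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

# Assumed 3rd-order pink-noise approximation (all poles inside the unit circle).
b_pink = poly_from_roots([0.98443604, 0.83392334, 0.07568359])
a_pink = poly_from_roots([0.99572754, 0.94790649, 0.53567505])

random.seed(0)
fs = 8000                                           # hypothetical sampling rate
clean = [math.sin(2 * math.pi * 100 * n / fs) for n in range(5000)]  # stand-in
white = [random.gauss(0.0, 1.0) for _ in range(len(clean))]
pink = arma_filter(b_pink, a_pink, white)

noisy_white = add_noise_at_snr(clean, white, 20.0)
noisy_pink = add_noise_at_snr(clean, pink, 20.0)
emphasized = pre_emphasis(noisy_white)
```

Pre-emphasis boosts high frequencies, where white noise carries as much energy as anywhere else, whereas pink noise concentrates its energy at low frequencies. That is consistent with the observation above that the pre-emphasized pink-noise case sounds better than the white-noise case.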