Glottal source modeling for singing voice synthesis

Aug 24, 2000



Natural sound quality is essential for singing-voice synthesis. Since 95% of singing is voiced sound (Cook, 1990), this paper focuses on improving the naturalness of vowel tone quality through glottal excitation modeling. We propose using the LF-model (Fant et al., 1985) for the glottal wave shape, combined with pitch-synchronous, amplitude-modulated Gaussian noise that adds an aspiration component to the glottal excitation. The associated analysis and synthesis procedures are also provided. By analyzing baritone recordings, we have found simple rules for changing voice quality from “laryngealized” (or “pressed”), to normal, to “breathy” phonation.

You can download my ICMC2000 paper for details.
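As a rough illustration of the excitation described above, here is a minimal Python sketch of one LF-model pulse with a pitch-synchronous noise burst added to it. It is not the implementation from the paper. The LF-model is usually written for the derivative of the glottal flow, and that is what the sketch generates. The timing values (tp, te, ta, tc, given as fractions of one pitch period), the Hann-shaped noise envelope, the noise gain, and all function names are assumptions chosen for illustration only.

```python
import math
import random


def lf_parameters(tp, te, ta, tc, ee=1.0):
    """Solve the LF-model implicit equations for epsilon, alpha and E0.

    Times are fractions of one pitch period; this sketch assumes
    tp < te < 2*tp and te < tc <= 1.
    """
    wg = math.pi / tp          # angular frequency of the open-phase sinusoid
    d = tc - te                # duration of the return phase

    # epsilon * ta = 1 - exp(-epsilon * (tc - te)), by fixed-point iteration
    eps = 1.0 / ta
    for _ in range(100):
        eps = (1.0 - math.exp(-eps * d)) / ta

    # Area under the return phase (negative), in closed form
    return_area = -(ee / (eps * ta)) * (
        (1.0 - math.exp(-eps * d)) / eps - d * math.exp(-eps * d)
    )

    s, c = math.sin(wg * te), math.cos(wg * te)

    def net_area(alpha):
        # Open-phase area with E0 chosen so that E(te) = -ee,
        # plus the return-phase area
        open_area = (-ee / s) * (
            (alpha * s - wg * c) + wg * math.exp(-alpha * te)
        ) / (alpha ** 2 + wg ** 2)
        return open_area + return_area

    # Bisection for the alpha that gives zero net area over one period.
    # The bracket [-100, 100] is an assumption that suits the defaults below.
    lo, hi = -100.0, 100.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if net_area(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    e0 = -ee / (math.exp(alpha * te) * s)
    return wg, eps, alpha, e0


def lf_pulse(n, tp=0.45, te=0.60, ta=0.03, tc=1.0, ee=1.0):
    """One period of the LF glottal flow derivative, sampled at n points."""
    wg, eps, alpha, e0 = lf_parameters(tp, te, ta, tc, ee)
    pulse = []
    for k in range(n):
        t = k / n
        if t <= te:        # open phase: growing sinusoid
            v = e0 * math.exp(alpha * t) * math.sin(wg * t)
        elif t <= tc:      # return phase: exponential recovery
            v = -(ee / (eps * ta)) * (
                math.exp(-eps * (t - te)) - math.exp(-eps * (tc - te))
            )
        else:              # closed phase
            v = 0.0
        pulse.append(v)
    return pulse


def aspiration_noise(n, gain=0.05, center=0.6, width=0.5, seed=0):
    """Gaussian noise shaped by one Hann window per period (assumed envelope)."""
    rng = random.Random(seed)
    noise = []
    for k in range(n):
        x = (k / n - center) / width   # position relative to the window centre
        env = 0.5 * (1.0 + math.cos(2.0 * math.pi * x)) if abs(x) < 0.5 else 0.0
        noise.append(gain * env * rng.gauss(0.0, 1.0))
    return noise


def excitation(n_periods, samples_per_period, noise_gain=0.05):
    """Repeat the LF pulse and add a fresh noise burst in every period."""
    pulse = lf_pulse(samples_per_period)
    out = []
    for p in range(n_periods):
        noise = aspiration_noise(samples_per_period, gain=noise_gain, seed=p)
        out.extend(a + b for a, b in zip(pulse, noise))
    return out


signal = excitation(n_periods=4, samples_per_period=400)
```

At a 44.1 kHz sampling rate, 400 samples per period corresponds to a pitch of roughly 110 Hz. In this sketch, raising noise_gain makes the excitation noisier, but how the actual parameters should vary between pressed, normal, and breathy phonation is described in the paper, not here.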

Sound Examples

You can control the vocal texture with a single parameter! I have implemented a simple demo application. Its interface looks like this:

Here are some sound examples generated from this application.

        Vowel /a/, Low Pitch, Pressed phonation (VTC = 0.1)

        Vowel /a/, Low Pitch, Normal phonation (VTC = 0.5)

        Vowel /a/, Low Pitch, Breathy phonation (VTC = 1)

