A historical summary of voice modeling and synthesis appears in §A.6.3.
In [292], a model is proposed for the singing voice in which the driving glottal pulse train is estimated jointly with filter parameters describing the shape of the vocal tract (the complete airway from the base of the throat to the lip opening). The model can be seen as an improvement over linear-predictive coding (LPC) of voice in the direction of a more accurate physical model of voice production, while keeping the computational cost low relative to more complex articulatory models. In particular, the parameter estimation involves only convex optimization plus a one-dimensional (possibly non-convex) line search over a compact interval. The line search determines the so-called ``open quotient,'' the fraction of each period during which there is glottal flow. The glottal pulse parameters are based on the derivative-glottal-wave models of Liljencrants, Fant, and Klatt [134,259]. Portions of this research have been published in the ICMC-00 [293] and WASPAA-01 [294] proceedings. Related subsequent work includes [252,214,253,215,213].
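The nested structure of such an estimation (a convex fit for each candidate open quotient, wrapped in a one-dimensional search) can be illustrated with a toy sketch. It is not the estimator of [292]: the vocal-tract filter is omitted, the pulse is an assumed polynomial flow a*t^2 - b*t^3 over the open phase, the inner problem is plain unconstrained least squares, and the line search is a simple grid. The names `pulse_basis` and `fit_open_quotient` are hypothetical.

```python
import numpy as np

def pulse_basis(t, oq):
    """Toy derivative-glottal-flow basis for open quotient oq (assumed model).

    Flow is a*t**2 - b*t**3 while the glottis is open (t < oq) and zero
    afterwards, so its time derivative is linear in (a, b) once oq is fixed.
    """
    is_open = (t < oq).astype(float)
    return np.column_stack((2.0 * t * is_open, -3.0 * t**2 * is_open))

def fit_open_quotient(d, t, oq_grid):
    """Grid line search over oq; each candidate needs only a least-squares fit."""
    best = None
    for oq in oq_grid:
        basis = pulse_basis(t, oq)
        # Inner problem: convex (linear least squares) for fixed oq.
        coeffs, *_ = np.linalg.lstsq(basis, d, rcond=None)
        err = float(np.sum((d - basis @ coeffs) ** 2))
        if best is None or err < best[0]:
            best = (err, float(oq), coeffs)
    return best

# Synthetic single period, time normalized so the period is [0, 1).
n = 200
t = np.arange(n) / n
oq_true, b_true = 0.6, 1.0
a_true = b_true * oq_true  # makes the flow return to zero at t = oq_true
rng = np.random.default_rng(0)
d = pulse_basis(t, oq_true) @ np.array([a_true, b_true])
d = d + 0.01 * rng.standard_normal(n)  # small measurement noise

# Outer problem: search the compact interval [0.3, 0.9] for the best oq.
err, oq_hat, (a_hat, b_hat) = fit_open_quotient(d, t, np.linspace(0.3, 0.9, 61))
print(oq_hat, a_hat, b_hat)
```

The total error as a function of the open quotient need not be convex, which is why the outer step is a search over the interval rather than a descent method. Each inner fit stays cheap because the model is linear in the remaining parameters.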
Earlier work in voice synthesis, some summarized in Appendix A, includes [40,81,87,90,207,259,392,495]; see also the KTH ``Research Topics'' home page.