The first major effort to encode speech electronically was Homer Dudley's vocoder (``voice coder'') [119] developed starting in October of 1928 at AT&T Bell Laboratories [417]. A manually controlled version of the vocoder synthesis engine, called the Voder (Voice Operation Demonstrator [141]), was constructed and demonstrated at the 1939 World's Fairs in New York and San Francisco [119]. Pitch was controlled by a foot pedal, and ten fingers controlled the bandpass gains. Buzz/hiss selection was by means of a wrist bar. Three additional keys controlled transient excitation of selected filters to achieve stop-consonant sounds [141]. ``Performing speech'' on the Voder required on the order of a year's training before intelligible speech could reliably be produced. The Voder was a very interesting performable instrument!
The vocoder and Voder can be considered to be based on a source-filter model of speech, in which a non-parametric spectral model of the vocal tract is given by the time-varying output of a fixed bandpass filter bank. Later efforts included the formant vocoder (Munson and Montgomery 1950)--a type of parametric spectral model--which encoded the amplitude and center frequency of the first three spectral formants. See [309, pp. 2452-3] for an overview and references.
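As a rough illustration of the source-filter idea, the following sketch implements a minimal channel vocoder in the spirit described above: a modulator (the speech) is split into bands by a filter bank, each band's amplitude envelope is measured, and the envelopes are imposed on the corresponding bands of a carrier (buzz or hiss). All names here (``channel_vocoder'', the band edges, the brick-wall FFT filters standing in for the original analog bandpass filters) are illustrative choices, not a reconstruction of Dudley's actual circuitry.

```python
import numpy as np

def bandpass_fft(x, lo, hi, sr):
    # Crude brick-wall bandpass via the FFT; a stand-in for one
    # analog bandpass channel of the vocoder's filter bank.
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / sr)
    X[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(X, n=len(x))

def envelope(x, win):
    # Amplitude envelope: full-wave rectify, then smooth
    # with a moving-average window of `win` samples.
    kernel = np.ones(win) / win
    return np.convolve(np.abs(x), kernel, mode="same")

def channel_vocoder(modulator, carrier, sr, edges, win=256):
    # Impose the modulator's per-band envelopes on the carrier.
    # `edges` lists the band boundaries in Hz; signals must be
    # the same length.
    out = np.zeros_like(carrier)
    for lo, hi in zip(edges[:-1], edges[1:]):
        env = envelope(bandpass_fft(modulator, lo, hi, sr), win)
        out += env * bandpass_fft(carrier, lo, hi, sr)
    return out

# Example: an amplitude-modulated tone as the "speech" and
# white noise as the hiss (unvoiced) carrier.
sr = 8000
t = np.arange(sr) / sr
modulator = np.sin(2 * np.pi * 220 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t))
carrier = np.random.default_rng(0).standard_normal(sr)
output = channel_vocoder(modulator, carrier, sr, edges=[100, 300, 700, 1500, 3000])
```

Swapping the noise carrier for a periodic pulse train models the buzz (voiced) excitation; the Voder's wrist bar selected between exactly these two carrier types, and its ten finger keys played the role of the per-band envelopes.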
While we have now digressed into the realm of spectral models, as opposed to physical models, it seems worthwhile to point out that these early efforts toward speech synthesis engaged essentially all of the mainstream sound-modeling methods in use today, in both the spectral and physical domains.