Formant Filtering Example

In *speech synthesis* [27,39],
digital filters are often used to simulate *formant filtering* by
the vocal tract. It is well known [23] that the different
*vowel sounds* of speech can be simulated by passing a "buzz
source" through only two or three formant filters. As a result,
speech is fully intelligible through the telephone bandwidth
(nominally only 200-3200 Hz).

A *formant* is a *resonance* in the voice spectrum. A
single formant may thus be modeled using one *biquad*
(second-order filter section). For example, in the vowel [a]
as in "father," the first three formant center frequencies have been
measured near 700, 1220, and 2600 Hz, with half-power
bandwidths of 130, 70, and 160 Hz [40].
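As a minimal sketch of this idea, the first formant of [a] can be modeled by a single all-pole biquad. The mapping from bandwidth to pole radius, `R = exp(-pi*BW/fs)`, and from center frequency to pole angle is the one used in the listing later in this section; the unit numerator, the 1 Hz evaluation grid, and the variable names `fpeak` and `peakGain` are assumptions made only for this illustration.

```matlab
% Sketch: one all-pole biquad modeling the first formant of [a].
fs = 8192;                 % sampling rate (Hz), as in the main example
F  = 700;                  % formant center frequency (Hz)
BW = 130;                  % half-power bandwidth (Hz)

R     = exp(-pi*BW/fs);    % pole radius set by the bandwidth
theta = 2*pi*F/fs;         % pole angle set by the center frequency
A = [1, -2*R*cos(theta), R^2];   % denominator with poles at R*exp(+/-j*theta)

% Evaluate the amplitude response on a 1 Hz grid from 0 to fs/2
f  = 0:1:fs/2;
zi = exp(-1j*2*pi*f/fs);   % z^-1 on the unit circle
H  = 1 ./ (A(1) + A(2)*zi + A(3)*zi.^2);
[peakGain, idx] = max(abs(H));
fpeak = f(idx);            % frequency (Hz) at which the response peaks
```

The peak of the two-pole response lands close to, though not exactly at, the pole angle, because the conjugate pole also contributes to the magnitude.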

In principle, the formant filter sections are in *series*, as can
be seen by deriving the transfer function of an acoustic tube
[48]. It follows that the vocal-tract transfer function is an
all-pole filter (provided that the nasal tract is closed off or
negligible). As a result, there is no need to specify *gains*
for the formant resonators: only center frequency and bandwidth are
necessary to specify each formant, leaving only an overall scale
factor unspecified in a cascade (series) formant filter bank.
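The series form can be sketched as follows, assuming the three formants measured for [a] and a sampling rate of 8192 Hz. Each section is an all-pole biquad determined only by its center frequency and bandwidth. The 64-sample impulse and the names `y`, `yDirect`, and `err` are illustrative choices, not part of the original example.

```matlab
% Sketch: three all-pole biquads in series (cascade form).
F  = [700, 1220, 2600];    % formant center frequencies (Hz)
BW = [130, 70, 160];       % formant bandwidths (Hz)
fs = 8192;                 % sampling rate (Hz)

R     = exp(-pi*BW/fs);    % pole radii
theta = 2*pi*F/fs;         % pole angles
% One denominator per row: 1 - 2R cos(theta) z^-1 + R^2 z^-2
As = [ones(3,1), (-2*R.*cos(theta)).', (R.^2).'];

% A series connection multiplies transfer functions, so the overall
% denominator is the product (convolution) of the section denominators.
A = conv(conv(As(1,:), As(2,:)), As(3,:));

% Pass an impulse through the sections one after another ...
x = [1, zeros(1,63)];
y = x;
for k = 1:3
  y = filter(1, As(k,:), y);
end
% ... and compare with the equivalent sixth-order all-pole filter.
yDirect = filter(1, A, x);
err = max(abs(y - yDirect));   % expected to be at rounding-error level
```

No per-section gain appears anywhere: the only free parameter left is one overall scale factor, which is taken to be 1 here.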

Numerically, however, it makes more sense to implement disjoint
resonances in *parallel* rather than in series. This is because when
one formant filter is resonating, the others will be attenuating, so
that to achieve a particular peak gain at resonance, the resonating
filter must overcome the combined attenuation of all the others as well as
apply its own gain. In fixed-point arithmetic, this can result in
large quantization-noise gains, especially for the last resonator in
the chain. As a result of these considerations, our example will
implement the formant sections in parallel. This means we must find
the appropriate biquad *numerators* so that when added together,
the overall transfer-function numerator is a constant. This will be
accomplished using the *partial fraction expansion*
(§6.8).
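To make the parallel form concrete, here is a hedged sketch that computes the partial fraction expansion by hand rather than with `residuez`. It assumes distinct poles and a unit numerator, so the residue at pole p_i is 1/prod_{j != i}(1 - p_j/p_i), and each conjugate pair of one-pole terms combines into a real biquad with numerator 2 Re(r) - 2 Re(r conj(p)) z^-1. The test frequencies and the names `Hcascade`, `Hparallel`, and `relErr` are illustrative assumptions.

```matlab
% Sketch: partial fraction expansion of 1/A(z) into three real biquads.
F  = [700, 1220, 2600];    % formant center frequencies (Hz)
BW = [130, 70, 160];       % formant bandwidths (Hz)
fs = 8192;                 % sampling rate (Hz)

p    = exp(-pi*BW/fs) .* exp(1j*2*pi*F/fs);  % upper-half-plane poles
pAll = [p, conj(p)];                         % all six poles
A    = real(poly(pAll));                     % cascade denominator (B = 1)

nsecs = length(p);
Bs = zeros(nsecs,3);  As = zeros(nsecs,3);
for k = 1:nsecs
  others = pAll([1:k-1, k+1:end]);           % every pole except p(k)
  r = 1 / prod(1 - others/p(k));             % residue of 1/A(z) at p(k)
  % r/(1 - p z^-1) plus its complex conjugate gives one real biquad:
  Bs(k,:) = [2*real(r), -2*real(r*conj(p(k))), 0];
  As(k,:) = [1, -2*real(p(k)), abs(p(k))^2];
end

% Check at a few frequencies that the parallel sum equals 1/A(z).
f  = [100, 700, 1500, 3000];                 % test frequencies (Hz)
zi = exp(-1j*2*pi*f/fs);                     % z^-1 on the unit circle
Hcascade  = 1 ./ polyval(fliplr(A), zi);     % A as a polynomial in z^-1
Hparallel = zeros(size(zi));
for k = 1:nsecs
  Hparallel = Hparallel + ...
      polyval(fliplr(Bs(k,:)), zi) ./ polyval(fliplr(As(k,:)), zi);
end
relErr = max(abs(Hcascade - Hparallel) ./ abs(Hcascade));
```

If the expansion is right, `relErr` should sit near machine precision, which is another way of saying that the biquad numerators sum, over the common denominator, to the constant 1.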

The Matlab code below illustrates the construction of a parallel formant filter bank for simulating the vowel [a]. For completeness, it is also used to filter a bandlimited impulse train in order to synthesize the vowel sound.

    F  = [700, 1220, 2600];  % Formant frequencies (Hz)
    BW = [130,   70,  160];  % Formant bandwidths (Hz)
    fs = 8192;               % Sampling rate (Hz)

    nsecs = length(F);
    R = exp(-pi*BW/fs);        % Pole radii
    theta = 2*pi*F/fs;         % Pole angles
    poles = R .* exp(j*theta); % Complex poles
    B = 1;  A = real(poly([poles,conj(poles)]));
    % freqz(B,A); % View frequency response:

    % Convert to parallel complex one-poles (PFE):
    [r,p,f] = residuez(B,A);
    As = zeros(nsecs,3);
    Bs = zeros(nsecs,3);
    % complex-conjugate pairs are adjacent in r and p:
    for i=1:2:2*nsecs
        k = 1+(i-1)/2;
        Bs(k,:) = [r(i)+r(i+1), -(r(i)*p(i+1)+r(i+1)*p(i)), 0];
        As(k,:) = [1, -(p(i)+p(i+1)), p(i)*p(i+1)];
    end
    sos = [Bs,As]; % standard second-order-section form
    iperr = norm(imag(sos))/norm(sos); % make sure sos is ~real
    disp(sprintf('||imag(sos)||/||sos|| = %g',iperr)); % 1.6e-16
    sos = real(sos) % and make it exactly real

    % Reconstruct original numerator and denominator as a check:
    [Bh,Ah] = psos2tf(sos); % parallel sos to transfer function
    % psos2tf appears in the matlab-utilities appendix
    disp(sprintf('||A-Ah|| = %g',norm(A-Ah))); % 5.77423e-15
    % Bh has trailing epsilons, so we'll zero-pad B:
    disp(sprintf('||B-Bh|| = %g',...
                 norm([B,zeros(1,length(Bh)-length(B))] - Bh)));
    % 1.25116e-15

    % Plot overlay and sum of all three
    % resonator amplitude responses:
    nfft=512;
    H = zeros(nsecs+1,nfft);
    for i=1:nsecs
      [Hiw,w] = freqz(Bs(i,:),As(i,:));
      H(1+i,:) = Hiw(:).';
    end
    H(1,:) = sum(H(2:nsecs+1,:));
    ttl = 'Amplitude Response';
    xlab = 'Frequency (Hz)';
    ylab = 'Magnitude (dB)';
    sym = '';
    lgnd = {'sum','sec 1','sec 2', 'sec 3'};
    np=nfft/2; % Only plot for positive frequencies
    wp = w(1:np); Hp=H(:,1:np);
    figure(1); clf;
    myplot(wp,20*log10(abs(Hp)),sym,ttl,xlab,ylab,1,lgnd);
    disp('PAUSING'); pause;
    saveplot('../eps/lpcexovl.eps');

    % Now synthesize the vowel [a]:
    nsamps = 256;
    f0 = 200; % Pitch in Hz
    w0T = 2*pi*f0/fs; % radians per sample

    nharm = floor((fs/2)/f0); % number of harmonics
    sig = zeros(1,nsamps);
    n = 0:(nsamps-1);
    % Synthesize bandlimited impulse train
    for i=1:nharm,
        sig = sig + cos(i*w0T*n);
    end;
    sig = sig/max(sig);
    speech = filter(1,A,sig);
    soundsc([sig,speech]); % hear buzz, then 'ah'

**Notes:**

- The sampling rate was chosen to be 8192 Hz because that is the default Matlab sampling rate, and because that is a typical value used for "telephone quality" speech synthesis.
- The `psos2tf` utility is listed in §J.7.
- The overlay of the amplitude responses is shown in Fig. 9.6.



Center for Computer Research in Music and Acoustics (CCRMA), Stanford University