Synthesis (Step 7)

The analysis portion of PARSHL returns a set of amplitudes $\hat{A}^m$, frequencies $\hat{\omega}^m$, and phases $\hat{\theta}^m$ for each frame index $m$, with a ``triad'' ($\hat{A}_r^m, \hat{\omega}_r^m, \hat{\theta}_r^m$) for each track $r$. From this analysis data the program has the option of generating a synthetic sound.

The synthesis is done one frame at a time. The frame at hop $m$ specifies the synthesis buffer

$\displaystyle s^m(n) = \sum_{r=1}^{R^m} \hat{A}_{r}^m \cos [n\hat{\omega}_{r}^m + \hat{\theta}_{r}^m]$

where $R^m$ is the number of tracks present at frame $m$; $n=0,1,2, \ldots ,S-1$; and $S$ is the length of the synthesis buffer (without any time scaling, $S$ equals the analysis hop size). To avoid ``clicks'' at the frame boundaries, the parameters ($\hat{A}_r^m, \hat{\omega}_r^m, \hat{\theta}_r^m$) are smoothly interpolated from frame to frame.
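As an illustration of the synthesis buffer equation before any interpolation is applied, the following minimal Python sketch sums the sinusoids of one frame. The function name `synth_frame`, the tuple layout, and the example values are hypothetical and not part of PARSHL; frequencies are assumed to be in radians per sample.

```python
import math

def synth_frame(triads, S):
    """Return S samples of sum_r A_r * cos(n * w_r + theta_r).

    triads: list of (amplitude, frequency in rad/sample, phase in rad),
            one tuple per track present in the frame (hypothetical layout).
    S:      length of the synthesis buffer.
    """
    return [
        sum(A * math.cos(n * w + theta) for (A, w, theta) in triads)
        for n in range(S)
    ]

# Two made-up tracks in a single frame of 8 samples.
frame = synth_frame(
    [(1.0, 0.1 * math.pi, 0.0), (0.5, 0.25 * math.pi, math.pi / 2)],
    S=8,
)
```

Because the parameters are held constant over the whole buffer here, concatenating such frames would generally produce the boundary discontinuities that the interpolation described next is meant to avoid.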

The parameter interpolation across time used in PARSHL is the same as that used by McAulay and Quatieri [12]. Let ($\hat{A}_r^{(m-1)}, \hat{\omega}_r^{(m-1)}, \hat{\theta}_r^{(m-1)}$) and ($\hat{A}_r^m, \hat{\omega}_r^m, \hat{\theta}_r^m$) denote the sets of parameters at frames $m-1$ and $m$ for the $r$th frequency track. They are taken to represent the state of the signal at time 0 (the left endpoint) of the frame.

The instantaneous amplitude $\hat{A}(n)$ is easily obtained by linear interpolation,

$\displaystyle \hat{A}(n)= \hat{A}^{m-1} + \frac{(\hat{A}^m - \hat{A}^{m-1})}{S} n$

where $n= 0, 1, \ldots, S-1$ is the time sample into the $m$th frame.
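The linear amplitude ramp can be sketched in a few lines of Python. The function name `interp_amplitude` and the example values are assumptions made for illustration only.

```python
def interp_amplitude(A_prev, A_cur, S):
    """Amplitude ramp A(n) = A_prev + (A_cur - A_prev) * n / S for n = 0..S-1."""
    step = (A_cur - A_prev) / S
    return [A_prev + step * n for n in range(S)]

# Ramp from a made-up amplitude of 0.2 toward 1.0 over a 4-sample frame.
ramp = interp_amplitude(0.2, 1.0, S=4)
```

Note that the ramp starts exactly at the previous amplitude and stops one step short of the current one, since $n$ only runs to $S-1$; the current amplitude is reached at the first sample of the following frame.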

Frequency and phase values are tied together (frequency is the phase derivative), and both control the instantaneous phase $\hat{\theta}(n)$. Since four variables affect the instantaneous phase, namely $\hat{\omega}^{(m-1)}, \hat{\theta}^{(m-1)}, \hat{\omega}^m$, and $\hat{\theta}^m$, we need at least three degrees of freedom for its control, while linear interpolation only gives one. Therefore, we need at least a cubic polynomial as interpolation function, of the form

$\displaystyle \hat{\theta}(n) = \zeta + \gamma n + \alpha n^2 + \beta n^3.$

We will not go into the details of solving this equation since McAulay and Quatieri [12] go through every step. We will simply state the result:

$\displaystyle \hat{\theta}(n) = \hat{\theta}^{(m-1)} + \hat{\omega}^{(m-1)} n + \alpha n^2 + \beta n^3$

where $\alpha$ and $\beta$ can be calculated using the end conditions at the frame boundaries,

$\displaystyle \alpha = \frac{3}{S^2} (\hat{\theta}^m - \hat{\theta}^{m-1} - \hat{\omega}^{m-1} S + 2\pi M) - \frac{1}{S} (\hat{\omega}^m - \hat{\omega}^{m-1}) \qquad (15)$

$\displaystyle \beta = \frac{-2}{S^3} (\hat{\theta}^m - \hat{\theta}^{m-1} - \hat{\omega}^{m-1} S + 2\pi M) + \frac{1}{S^2} (\hat{\omega}^m - \hat{\omega}^{m-1}) \qquad (16)$
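As a quick check on (15) and (16), the two coefficients follow from requiring the cubic to match the phase (up to a multiple of $2\pi$) and the frequency at the right endpoint $n=S$. The shorthand symbols $\Delta_\theta$ and $\Delta_\omega$ are introduced only for this sketch and are not notation from the paper.

```latex
\begin{align*}
\hat{\theta}(S)  &= \hat{\theta}^{m-1} + \hat{\omega}^{m-1} S + \alpha S^2 + \beta S^3
                  = \hat{\theta}^m + 2\pi M, \\
\hat{\theta}'(S) &= \hat{\omega}^{m-1} + 2\alpha S + 3\beta S^2
                  = \hat{\omega}^m. \\
\intertext{Writing $\Delta_\theta = \hat{\theta}^m - \hat{\theta}^{m-1} - \hat{\omega}^{m-1} S + 2\pi M$
and $\Delta_\omega = \hat{\omega}^m - \hat{\omega}^{m-1}$, this is the linear system}
\alpha S^2 + \beta S^3 &= \Delta_\theta, \\
2\alpha S + 3\beta S^2 &= \Delta_\omega, \\
\intertext{whose solution is}
\alpha &= \frac{3}{S^2}\,\Delta_\theta - \frac{1}{S}\,\Delta_\omega, \qquad
\beta   = \frac{-2}{S^3}\,\Delta_\theta + \frac{1}{S^2}\,\Delta_\omega.
\end{align*}
```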

This will give a set of interpolating functions depending on the value of $M$, among which we have to select the ``maximally smooth'' one. This can be done by choosing $M$ to be the integer closest to $x$, where

$\displaystyle x= \frac{1}{2\pi} \left[(\hat{\theta}^{m-1} + \hat{\omega}^{m-1} S - \hat{\theta}^m) + (\hat{\omega}^m - \hat{\omega}^{m-1}) \frac{S}{2}\right]$

and finally, the synthesis equation turns into

$\displaystyle s^m(n) = \sum_{r=1}^{R^m} \hat{A}_{r}^m(n) \cos [\hat{\theta}_{r}^m(n)]$

which smoothly goes from frame to frame and where each sinusoid accounts for both the rapid phase changes (frequency) and the slowly varying phase changes.
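The interpolation steps for a single track can be put together in a short Python sketch. This is only an illustration under stated assumptions, not the PARSHL implementation: the function names `cubic_phase_coeffs` and `synth_track_frame`, the tuple layout, and the example values are hypothetical, frequencies are taken in radians per sample, and $x$ is computed with the term $+\hat{\omega}^{m-1} S$ and the frequency difference $\hat{\omega}^m - \hat{\omega}^{m-1}$, so that $2\pi M$ cancels the bracketed phase term of (15) as closely as possible.

```python
import math

def cubic_phase_coeffs(theta_prev, omega_prev, theta_cur, omega_cur, S):
    """Return (alpha, beta, M) for the cubic phase between two frames."""
    # M is the integer closest to x (phase unwrapping; sign convention assumed).
    x = ((theta_prev + omega_prev * S - theta_cur)
         + (omega_cur - omega_prev) * S / 2) / (2 * math.pi)
    M = round(x)
    d_theta = theta_cur - theta_prev - omega_prev * S + 2 * math.pi * M
    d_omega = omega_cur - omega_prev
    alpha = 3 / S**2 * d_theta - d_omega / S        # equation (15)
    beta = -2 / S**3 * d_theta + d_omega / S**2     # equation (16)
    return alpha, beta, M

def synth_track_frame(prev, cur, S):
    """Synthesize S samples of one track.

    prev, cur: (A, omega, theta) at the left endpoints of frames m-1 and m.
    """
    A0, w0, th0 = prev
    A1, w1, th1 = cur
    alpha, beta, _ = cubic_phase_coeffs(th0, w0, th1, w1, S)
    out = []
    for n in range(S):
        A = A0 + (A1 - A0) * n / S                          # linear amplitude
        theta = th0 + w0 * n + alpha * n**2 + beta * n**3   # cubic phase
        out.append(A * math.cos(theta))
    return out

# Made-up parameters for two consecutive frames of one track.
S = 64
prev = (1.0, 0.30, 0.5)
cur = (0.8, 0.35, 1.0)
alpha, beta, M = cubic_phase_coeffs(prev[2], prev[1], cur[2], cur[1], S)
samples = synth_track_frame(prev, cur, S)
```

Summing the output of `synth_track_frame` over all tracks present in a frame would give one buffer of the interpolated synthesis equation. How tracks that start or end within a frame are handled is not covered by this sketch.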

Figure 7 shows the result of the analysis/synthesis process using phase information and applied to a piano tone.

**Figure 7:** (a) Original piano tone, (b) synthesis with phase information, (c) synthesis without phase information.

``PARSHL: An Analysis/Synthesis Program for Non-Harmonic Sounds Based on a Sinusoidal Representation'', by Julius O. Smith III and Xavier Serra, Proceedings of the International Computer Music Conference (ICMC-87, Tokyo), Computer Music Association, 1987.
Copyright © 2005-12-28 by Julius O. Smith III and Xavier Serra
Center for Computer Research in Music and Acoustics (CCRMA), Stanford University