

Synthesis (Step 7)

The analysis portion of PARSHL returns a set of amplitudes $ \hat{A}^m$, frequencies $ \hat{\omega}^m$, and phases $ \hat{\theta}^m$, for each frame index $ m$, with a ``triad'' ( $ \hat{A}_r^m, \hat{\omega}_r^m,
\hat{\theta}_r^m$) for each track $ r$. From this analysis data the program has the option of generating a synthetic sound.

The synthesis is done one frame at a time. The frame at hop $ m$ specifies the synthesis buffer

$\displaystyle s^m(n) = \sum_{r=1}^{R^m} \hat{A}_{r}^m \cos [n\hat{\omega}_{r}^m +
\hat{\theta}_{r}^m]
$

where $ R^m$ is the number of tracks present at frame $ m$; $ n=0,1,2,
\ldots ,S-1$; and $ S$ is the length of the synthesis buffer (without any time scaling, $ S=R$, the analysis hop size). To avoid ``clicks'' at the frame boundaries, the parameters ( $ \hat{A}_r^m, \hat{\omega}_r^m,
\hat{\theta}_r^m$) are smoothly interpolated from frame to frame.
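
As a point of reference, the buffer equation above can be sketched with the parameters held constant across the frame, which is the uninterpolated form that produces the boundary discontinuities. This is a minimal illustration in Python, not PARSHL's code; the function name `synth_frame_constant` and the representation of each track as an `(A, omega, theta)` tuple are assumptions made for the example.

```python
import math

def synth_frame_constant(triads, S):
    """Sum of sinusoids with parameters held constant over one frame.

    triads: list of (A, omega, theta) tuples, one per track
            (omega in radians per sample, theta in radians).
    S:      synthesis buffer length in samples.
    Hypothetical helper, not part of PARSHL.
    """
    return [
        sum(A * math.cos(n * omega + theta) for (A, omega, theta) in triads)
        for n in range(S)
    ]

# Two tracks: a constant (omega = 0) and a sinusoid at half the sampling rate.
frame = synth_frame_constant([(1.0, 0.0, 0.0), (0.5, math.pi, 0.0)], S=4)
```

Because nothing ties the last sample of one such buffer to the first sample of the next, amplitude or phase jumps appear at frame boundaries whenever the parameters change between frames.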

The parameter interpolation across time used in PARSHL is the same as that used by McAulay and Quatieri [12]. Let ( $ \hat{A}_r^{(m-1)}, \hat{\omega}_r^{(m-1)}, \hat{\theta}_r^{(m-1)}$) and ( $ \hat{A}_r^m, \hat{\omega}_r^m,
\hat{\theta}_r^m$) denote the sets of parameters at frames $ m-1$ and $ m$ for the $ r$th frequency track. Each set is taken to represent the state of the signal at time 0 (the left endpoint) of its frame.

The instantaneous amplitude $ \hat{A}(n)$ is easily obtained by linear interpolation,

$\displaystyle \hat{A}(n)= \hat{A}^{m-1} + \frac{(\hat{A}^m - \hat{A}^{m-1})}{S} n
$

where $ n= 0, 1, \ldots, S-1$ is the time sample into the $ m$th frame.
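
The linear amplitude ramp can be sketched in a few lines of Python. The function name `interp_amplitude` is hypothetical; the arithmetic follows the interpolation formula above directly.

```python
def interp_amplitude(A_prev, A_cur, S):
    """Linearly interpolated amplitude for n = 0, ..., S-1.

    A_prev, A_cur: amplitudes of one track at frames m-1 and m.
    S:             synthesis buffer length in samples.
    Hypothetical helper, not part of PARSHL.
    """
    step = (A_cur - A_prev) / S
    return [A_prev + step * n for n in range(S)]

ramp = interp_amplitude(0.0, 1.0, 4)
```

Note that the ramp starts at the previous amplitude at $ n=0$ and would reach the current amplitude only at $ n=S$, which is the first sample of the following frame, so consecutive frames join without a jump.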

Frequency and phase values are tied together (frequency is the phase derivative), and they both control the instantaneous phase $ \hat{\theta}(n)$. Because four values constrain the instantaneous phase, namely $ \hat{\omega}^{(m-1)}, \hat{\theta}^{(m-1)},
\hat{\omega}^m$, and $ \hat{\theta}^m$, the interpolating function needs four free coefficients, whereas linear interpolation provides only two. The lowest-order polynomial that suffices is therefore a cubic, of the form

$\displaystyle \hat{\theta}(n) = \zeta + \gamma n + \alpha n^2 + \beta n^3.
$

We will not go into the details of solving this equation since McAulay and Quatieri [12] go through every step. We will simply state the result:

$\displaystyle \hat{\theta}(n) = \hat{\theta}^{(m-1)} + \hat{\omega}^{(m-1)} n +
\alpha n^2 + \beta n^3
$

where $ \alpha$ and $ \beta$ can be calculated using the end conditions at the frame boundaries,

$\displaystyle \alpha = \frac{3}{S^2} (\hat{\theta}^m - \hat{\theta}^{m-1} - \hat{\omega}^{m-1} S + 2\pi M) - \frac{1}{S} (\hat{\omega}^m - \hat{\omega}^{m-1})
$ (15)

$\displaystyle \beta = \frac{-2}{S^3} (\hat{\theta}^m - \hat{\theta}^{m-1} - \hat{\omega}^{m-1} S + 2\pi M) + \frac{1}{S^2} (\hat{\omega}^m - \hat{\omega}^{m-1})
$ (16)
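
For readers who want to check (15) and (16), the end conditions can be written out explicitly. The following is a short verification sketch rather than the full derivation in [12]; the shorthand $ \Delta$ and $ \Delta\omega$ is introduced here only for compactness, and $ M$ is an integer, present because the measured phase is known only modulo $ 2\pi$.

```latex
% End conditions on the cubic phase function over one frame
\begin{align*}
\hat{\theta}(0) &= \hat{\theta}^{m-1}, & \hat{\theta}'(0) &= \hat{\omega}^{m-1},\\
\hat{\theta}(S) &= \hat{\theta}^{m} + 2\pi M, & \hat{\theta}'(S) &= \hat{\omega}^{m}.
\end{align*}
% Shorthand (introduced for this check only):
%   \Delta       = \hat{\theta}^m - \hat{\theta}^{m-1} - \hat{\omega}^{m-1} S + 2\pi M
%   \Delta\omega = \hat{\omega}^m - \hat{\omega}^{m-1}
% so that (15) and (16) read
%   \alpha S^2 = 3\Delta - S\,\Delta\omega, \qquad \beta S^3 = -2\Delta + S\,\Delta\omega .
\begin{align*}
\alpha S^2 + \beta S^3 &= (3\Delta - S\,\Delta\omega) + (-2\Delta + S\,\Delta\omega) = \Delta,\\
2\alpha S + 3\beta S^2 &= \left(\tfrac{6\Delta}{S} - 2\,\Delta\omega\right)
  + \left(-\tfrac{6\Delta}{S} + 3\,\Delta\omega\right) = \Delta\omega .
\end{align*}
```

The first line gives $ \hat{\theta}(S) = \hat{\theta}^{m-1} + \hat{\omega}^{m-1} S + \Delta = \hat{\theta}^m + 2\pi M$, and the second gives $ \hat{\theta}'(S) = \hat{\omega}^{m-1} + \Delta\omega = \hat{\omega}^m$, so both conditions at the right endpoint are met for any integer $ M$.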

This will give a set of interpolating functions depending on the value of $ M$, among which we have to select the ``maximally smooth'' one. This can be done by choosing $ M$ to be the integer closest to $ x$, where $ x$ is

$\displaystyle x= \frac{1}{2\pi} \left[(\hat{\theta}^{m-1} + \hat{\omega}^{m-1} S -
\hat{\theta}^m) + (\hat{\omega}^m - \hat{\omega}^{m-1}) \frac{S}{2}\right]
$

and finally, the synthesis equation turns into

$\displaystyle s^m(n) = \sum_{r=1}^{R^m} \hat{A}_{r}^m(n) \cos [\hat{\theta}_{r}^m(n)]
$

which passes smoothly from frame to frame, and in which each sinusoid accounts for both the rapid phase changes (frequency) and the slowly varying phase changes.
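
Putting the pieces together, one frame of interpolated synthesis might be sketched as below. This is an illustrative Python sketch under stated assumptions, not PARSHL's implementation: the function names are hypothetical, every track is assumed to be present (and matched by position) in both frames, so track births and deaths are not handled, and $ x$ is computed in the form for which $ \alpha = \beta = 0$ when the frequency is constant and the two phases are consistent with it.

```python
import math

def cubic_phase_coeffs(theta_prev, omega_prev, theta_cur, omega_cur, S):
    """Return (alpha, beta, M) for the cubic phase interpolation.

    Hypothetical helper. M is the integer closest to x; Python's
    round() resolves exact .5 ties to the even integer.
    """
    d_omega = omega_cur - omega_prev
    x = ((theta_prev + omega_prev * S - theta_cur)
         + d_omega * S / 2) / (2 * math.pi)
    M = round(x)
    # Unwrapped phase mismatch at the right end of the frame.
    d_theta = theta_cur - theta_prev - omega_prev * S + 2 * math.pi * M
    alpha = 3 * d_theta / S**2 - d_omega / S
    beta = -2 * d_theta / S**3 + d_omega / S**2
    return alpha, beta, M

def synth_frame(prev, cur, S):
    """Synthesize S samples between frames m-1 and m.

    prev, cur: lists of (A, omega, theta) tuples for the same tracks,
               in the same order, at frames m-1 and m.
    """
    out = [0.0] * S
    for (A0, w0, th0), (A1, w1, th1) in zip(prev, cur):
        alpha, beta, _ = cubic_phase_coeffs(th0, w0, th1, w1, S)
        for n in range(S):
            amp = A0 + (A1 - A0) * n / S                          # linear amplitude
            phase = th0 + w0 * n + alpha * n**2 + beta * n**3     # cubic phase
            out[n] += amp * math.cos(phase)
    return out

# A stationary sinusoid: the phase advances by exactly omega * S per frame.
samples = synth_frame([(1.0, 0.1, 0.0)], [(1.0, 0.1, 0.1 * 64)], S=64)
```

In the stationary case shown, the cubic terms vanish and the output reduces to a plain cosine at the given frequency, which is a convenient sanity check for any implementation of the interpolation.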

Figure 7 shows the result of the analysis/synthesis process using phase information and applied to a piano tone.

Figure 7: (a) Original piano tone, (b) synthesis with phase information, (c) synthesis without phase information.



``PARSHL: An Analysis/Synthesis Program for Non-Harmonic Sounds Based on a Sinusoidal Representation'', by Julius O. Smith III and Xavier Serra, Proceedings of the International Computer Music Conference (ICMC-87, Tokyo), Computer Music Association, 1987.
Copyright © 2005-12-28 by Julius O. Smith III and Xavier Serra
Center for Computer Research in Music and Acoustics (CCRMA),   Stanford University