Abstract-algorithm synthesis seems destined to diminish in importance due to the lack of analysis support. As many algorithmic synthesis attempts showed us long ago, it is difficult to find a wide variety of musically pleasing sounds by exploring the parameters of a mathematical expression. Apart from a musical context that might impart some meaning, most sounds are simply uninteresting. The most straightforward way to obtain interesting sounds is to draw on past instrument technology or natural sounds. Both spectral-modeling and physical-modeling synthesis techniques can model such sounds. In both cases the model is determined by an analysis procedure that computes optimal model parameters to approximate a particular input sound. The musician manipulates the parameters to create musical variations.

Obtaining better control of sampling synthesis will require more general sound transformations. To proceed toward this goal, transformations must be understood in terms of what we hear. The best way we know to understand a sonic transformation is to study its effect on the short-time spectrum, where the spectrum-analysis parameters are tuned to match the characteristics of hearing as closely as possible. Thus, it appears inevitable that sampling synthesis will migrate toward spectral modeling. If abstract methods disappear and sampling synthesis is absorbed into spectral modeling, this leaves only two categories: physical-modeling and spectral-modeling. This boils all synthesis techniques down to those which model either the source or the receiver of the sound.
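To make the idea of "studying a transformation's effect on the short-time spectrum" concrete, here is a minimal sketch of a short-time magnitude spectrum. It uses a plain Hann-windowed DFT in pure Python; the window length and hop size are arbitrary illustrative values, not the perceptually tuned analysis parameters the text describes.

```python
# Sketch: short-time magnitude spectrum of a signal.  Pure-Python DFT
# for illustration; win_len and hop are placeholder values, not the
# hearing-matched analysis parameters discussed in the text.
import math

def stft_magnitudes(x, win_len=64, hop=32):
    """Hann-windowed short-time DFT magnitudes, one list per frame."""
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / win_len) for n in range(win_len)]
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = [x[start + n] * win[n] for n in range(win_len)]
        mags = []
        for k in range(win_len // 2 + 1):  # real signal: keep bins 0..N/2
            re = sum(seg[n] * math.cos(2 * math.pi * k * n / win_len)
                     for n in range(win_len))
            im = -sum(seg[n] * math.sin(2 * math.pi * k * n / win_len)
                      for n in range(win_len))
            mags.append(math.hypot(re, im))
        frames.append(mags)
    return frames

# A pure tone at 8 cycles per window concentrates its energy in bin 8:
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
frames = stft_magnitudes(tone)
peak_bin = max(range(len(frames[0])), key=lambda k: frames[0][k])
```

Comparing such frame-by-frame magnitudes before and after a transformation is the basic way to see what the transformation does to what we hear.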

Some characteristics of each case are listed in the following table:

| Spectral Modeling | Physical Modeling |
|---|---|
| Fully general | Specialized case by case |
| Any basilar membrane skyline | Any instrument at some cost |
| Time and frequency domains | Time and space domains |
| Numerous time-freq envelopes | Several physical variables |
| Memory requirements large | More compact description |
| Large operation-count/sample | Small to large complexity |
| Stochastic part initially easy | Stochastic part usually tricky |
| Attacks difficult | Attacks natural |
| Articulations difficult | Articulations natural |
| Expressivity limited | Expressivity unlimited |
| Nonlinearities difficult | Nonlinearities natural |
| Delay/reverb hard | Delay/reverb natural |
| Can calibrate to nature | Can calibrate to nature |
| Can calibrate to any sound | May calibrate to own sound |
| Physics not too helpful | Physics very helpful |
| Cartoons from pictures | Working models from all clues |
| Evolution restricted | Evolution unbounded |
| Represents sound receiver | Represents sound source |

Since spectral modeling constructs directly the spectrum received along the
basilar membrane of the ear, its scope is inherently broader than that of
physical modeling. However, physical models provide more compact
algorithms for generating familiar classes of sounds, such as strings and
woodwinds. Also, they are generally more efficient at producing effects in
the spectrum arising from attack articulations, long delays, pulsed noise,
or nonlinearity in the physical instrument. It is also interesting to
pause and consider how invariably performing musicians have *interacted
with resonators* since the dawn of time in music. When a resonator has an
impulse-response duration greater than that of a spectral frame (nominally
the ``integration time'' of the ear), as happens with any string, then
implementation of the resonator directly in the short-time spectrum becomes
inconvenient. A resonator is much easier to implement as a recursion
than as a super-thin formant in a short-time spectrum. Of course, as Orion
Larson says^{1}: ``Anything is
possible in software.''
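The recursion the paragraph alludes to can be sketched as a standard two-pole digital resonator. The pole radius and center frequency below are illustrative values, not taken from the text; the point is that a few multiply-adds per sample sustain a ring far longer than any spectral frame.

```python
# Sketch: a resonator as a two-pole recursion, versus building the same
# narrow peak into a short-time spectrum.  R and freq are illustrative.
import math

def resonator(x, freq, fs, R=0.999):
    """Two-pole filter: y[n] = x[n] + 2R*cos(theta)*y[n-1] - R^2*y[n-2]."""
    theta = 2 * math.pi * freq / fs
    a1, a2 = 2 * R * math.cos(theta), -R * R
    y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = xn + a1 * y1 + a2 * y2
        y.append(yn)
        y1, y2 = yn, y1
    return y

# The impulse response rings at the resonant frequency and decays as R^n,
# i.e. over many more samples than one "integration time" of the ear.
impulse = [1.0] + [0.0] * 9999
h = resonator(impulse, freq=440.0, fs=44100.0)
```

With R near 1 the impulse-response duration dwarfs a spectral frame, which is exactly the case where the time-domain recursion is the convenient implementation.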

Spectral modeling has unsolved problems in the time domain: it is not yet known how to best modify a short-time Fourier analysis in the vicinity of an attack or other phase-sensitive transient. Phase is important during transients and not during steady-state intervals; a proper time-varying spectrum model should retain phase only where needed for accurate synthesis. The general question of timbre perception of non-stationary sounds becomes important. Wavelet transforms support more general signal building blocks that could conceivably help solve the transient modeling problem. Most activity with wavelet transforms to date has been confined to basic constant-Q spectrum analysis, where the analysis filters are aligned to a logarithmic frequency grid and have a constant ratio of bandwidth to center frequency or Q. Spectral models are also not yet sophisticated; sinusoids and filtered noise with piecewise-linear envelopes are a good start, but surely there are other good primitives. Finally, tools for spectral modeling and transformation, such as spectral envelope and formant estimators, peak-finders, pitch-detectors, polyphonic peak associators, time compression/expansion transforms, and so on, should be developed in a more general-purpose and sharable way.
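As a concrete reading of "sinusoids and filtered noise with piecewise-linear envelopes," here is a minimal sketch of one such partial-plus-noise voice. The breakpoint lists and the one-pole noise filter coefficient are invented for illustration and are not part of any particular analysis system named in the text.

```python
# Sketch: one sinusoid plus lowpass-filtered noise, each shaped by a
# piecewise-linear envelope.  All breakpoints and the 0.95 filter
# coefficient are illustrative assumptions.
import math, random

def envelope(breakpoints, n_samples):
    """Piecewise-linear interpolation of (time, value) breakpoints on [0,1]."""
    out = []
    for n in range(n_samples):
        t = n / (n_samples - 1)
        for (t0, v0), (t1, v1) in zip(breakpoints, breakpoints[1:]):
            if t0 <= t <= t1:
                out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
                break
    return out

def sine_plus_noise(freq, fs, n_samples, amp_bp, noise_bp, seed=0):
    rng = random.Random(seed)
    amp = envelope(amp_bp, n_samples)
    nz = envelope(noise_bp, n_samples)
    lp, out = 0.0, []
    for n in range(n_samples):
        lp = 0.95 * lp + 0.05 * rng.uniform(-1, 1)  # one-pole lowpassed noise
        out.append(amp[n] * math.sin(2 * math.pi * freq * n / fs) + nz[n] * lp)
    return out

env = envelope([(0.0, 0.0), (1.0, 1.0)], 11)
y = sine_plus_noise(440.0, 44100.0, 1000,
                    amp_bp=[(0.0, 0.0), (0.1, 1.0), (1.0, 0.0)],
                    noise_bp=[(0.0, 0.2), (1.0, 0.0)])
```

A full spectral model would run many such partials with analysis-derived envelopes; the sketch only shows the primitive the paragraph calls "a good start."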

The use of granular synthesis to create swarms of ``grains'' of sound using wavelet kernels of some kind (Roads 1978; Roads 1989) appears promising as a basis for a future statistical time-domain modeling technique. It would be very interesting if a kind of wavelet transform could be developed that would determine the optimum grain waveform, and provide the counterpart of a short-time power spectral density that would indicate the statistical frequency of each grain scale at a given time. Such a tool could provide a compact, transformable description of sounds such as explosions, rain, breaking glass, and the crushing of rocks, to name a few.
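For concreteness, a granular "cloud" in the above sense can be sketched by overlap-adding short windowed-sinusoid grains at random times. The grain duration, density, and frequency range below are arbitrary illustrative choices; the hypothetical statistical analysis the paragraph describes would be what chooses them from a target sound.

```python
# Sketch: granular synthesis as a statistical scattering of short
# Hann-windowed sinusoid grains.  All parameters are illustrative.
import math, random

def grain(freq, fs, dur):
    """One Hann-windowed sinusoid grain."""
    n = int(dur * fs)
    return [math.sin(2 * math.pi * freq * i / fs) *
            (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i in range(n)]

def granular_cloud(n_grains, total_dur, fs=8000, seed=1):
    rng = random.Random(seed)
    out = [0.0] * int(total_dur * fs)
    for _ in range(n_grains):
        g = grain(freq=rng.uniform(200, 2000), fs=fs, dur=0.02)
        start = rng.randrange(len(out) - len(g))
        for i, v in enumerate(g):  # overlap-add the grain into the cloud
            out[start + i] += v
    return out

cloud = granular_cloud(n_grains=50, total_dur=0.5)
```

The statistics here are just uniform random draws; the tool imagined in the text would instead estimate the grain waveform and the time-varying density of each grain scale from analysis of a sound.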


Computer Music Association, October 1991.

Revised with Curtis Roads for publication in Cahiers de l'IRCAM, September 1992, Institut de Recherche et Coordination Acoustique / Musique.

Copyright © Center for Computer Research in Music and Acoustics (CCRMA), Stanford University
