Abstract-algorithm synthesis seems destined to diminish in importance due to the lack of analysis support. As many algorithmic synthesis attempts showed us long ago, it is difficult to find a wide variety of musically pleasing sounds by exploring the parameters of a mathematical expression. Apart from a musical context that might impart some meaning, most sounds are simply uninteresting. The most straightforward way to obtain interesting sounds is to draw on past instrument technology or natural sounds. Both spectral-modeling and physical-modeling synthesis techniques can model such sounds. In both cases the model is determined by an analysis procedure that computes optimal model parameters to approximate a particular input sound. The musician manipulates the parameters to create musical variations.

Obtaining better control of sampling synthesis will require more general sound transformations. To proceed toward this goal, transformations must be understood in terms of what we hear. The best way we know to understand a sonic transformation is to study its effect on the short-time spectrum, where the spectrum-analysis parameters are tuned to match the characteristics of hearing as closely as possible. Thus, it appears inevitable that sampling synthesis will migrate toward spectral modeling. If abstract methods disappear and sampling synthesis is absorbed into spectral modeling, this leaves only two categories: physical-modeling and spectral-modeling. This boils all synthesis techniques down to those which model either the source or the receiver of the sound.
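To make the idea of "studying a transformation's effect on the short-time spectrum" concrete, here is a minimal sketch of a short-time magnitude spectrum. It uses a plain Hann-windowed DFT in pure Python; the window length and hop size are arbitrary illustrative values, not the perceptually tuned analysis parameters the text describes.

```python
# Sketch: short-time magnitude spectrum of a signal.  Pure-Python DFT
# for illustration; win_len and hop are placeholder values, not the
# hearing-matched analysis parameters discussed in the text.
import math

def stft_magnitudes(x, win_len=64, hop=32):
    """Hann-windowed short-time DFT magnitudes, one list per frame."""
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / win_len) for n in range(win_len)]
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = [x[start + n] * win[n] for n in range(win_len)]
        mags = []
        for k in range(win_len // 2 + 1):  # real signal: keep bins 0..N/2
            re = sum(seg[n] * math.cos(2 * math.pi * k * n / win_len)
                     for n in range(win_len))
            im = -sum(seg[n] * math.sin(2 * math.pi * k * n / win_len)
                      for n in range(win_len))
            mags.append(math.hypot(re, im))
        frames.append(mags)
    return frames

# A pure tone at 8 cycles per window concentrates its energy in bin 8:
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
frames = stft_magnitudes(tone)
peak_bin = max(range(len(frames[0])), key=lambda k: frames[0][k])
```

Comparing such frame-by-frame magnitudes before and after a transformation is the basic way to see what the transformation does to what we hear.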

Some characteristics of each case are listed in the following table:

| Spectral Modeling | Physical Modeling |
|---|---|
| Fully general | Specialized case by case |
| Any basilar membrane skyline | Any instrument at some cost |
| Time and frequency domains | Time and space domains |
| Numerous time-freq envelopes | Several physical variables |
| Memory requirements large | More compact description |
| Large operation-count/sample | Small to large complexity |
| Stochastic part initially easy | Stochastic part usually tricky |
| Attacks difficult | Attacks natural |
| Articulations difficult | Articulations natural |
| Expressivity limited | Expressivity unlimited |
| Nonlinearities difficult | Nonlinearities natural |
| Delay/reverb hard | Delay/reverb natural |
| Can calibrate to nature | Can calibrate to nature |
| Can calibrate to any sound | May calibrate to own sound |
| Physics not too helpful | Physics very helpful |
| Cartoons from pictures | Working models from all clues |
| Evolution restricted | Evolution unbounded |
| Represents sound receiver | Represents sound source |

Since spectral modeling constructs directly the spectrum received along the
basilar membrane of the ear, its scope is inherently broader than that of
physical modeling. However, physical models provide more compact
algorithms for generating familiar classes of sounds, such as strings and
woodwinds. Also, they are generally more efficient at producing effects in
the spectrum arising from attack articulations, long delays, pulsed noise,
or nonlinearity in the physical instrument. It is also interesting to
pause and consider how invariably performing musicians have *interacted
with resonators* since the dawn of time in music. When a resonator has an
impulse-response duration greater than that of a spectral frame (nominally
the ``integration time'' of the ear), as happens with any string, then
implementation of the resonator directly in the short-time spectrum becomes
inconvenient. A resonator is much easier to implement as a recursion
than as a super-thin formant in a short-time spectrum. Of course, as Orion
Larson says^{1}: ``Anything is
possible in software.''
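The recursion the paragraph alludes to can be sketched as a standard two-pole digital resonator. The pole radius and center frequency below are illustrative values, not taken from the text; the point is that a few multiply-adds per sample sustain a ring far longer than any spectral frame.

```python
# Sketch: a resonator as a two-pole recursion, versus building the same
# narrow peak into a short-time spectrum.  R and freq are illustrative.
import math

def resonator(x, freq, fs, R=0.999):
    """Two-pole filter: y[n] = x[n] + 2R*cos(theta)*y[n-1] - R^2*y[n-2]."""
    theta = 2 * math.pi * freq / fs
    a1, a2 = 2 * R * math.cos(theta), -R * R
    y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = xn + a1 * y1 + a2 * y2
        y.append(yn)
        y1, y2 = yn, y1
    return y

# The impulse response rings at the resonant frequency and decays as R^n,
# i.e. over many more samples than one "integration time" of the ear.
impulse = [1.0] + [0.0] * 9999
h = resonator(impulse, freq=440.0, fs=44100.0)
```

With R near 1 the impulse-response duration dwarfs a spectral frame, which is exactly the case where the time-domain recursion is the convenient implementation.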

Spectral modeling has unsolved problems in the time domain: it is not yet known how to best modify a short-time Fourier analysis in the vicinity of an attack or other phase-sensitive transient. Phase is important during transients and not during steady-state intervals; a proper time-varying spectrum model should retain phase only where needed for accurate synthesis. The general question of timbre perception of non-stationary sounds becomes important. Wavelet transforms support more general signal building blocks that could conceivably help solve the transient modeling problem. Most activity with wavelet transforms to date has been confined to basic constant-Q spectrum analysis, where the analysis filters are aligned to a logarithmic frequency grid and have a constant ratio of bandwidth to center frequency or Q. Spectral models are also not yet sophisticated; sinusoids and filtered noise with piecewise-linear envelopes are a good start, but surely there are other good primitives. Finally, tools for spectral modeling and transformation, such as spectral envelope and formant estimators, peak-finders, pitch-detectors, polyphonic peak associators, time compression/expansion transforms, and so on, should be developed in a more general-purpose and sharable way.
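As a concrete reading of "sinusoids and filtered noise with piecewise-linear envelopes," here is a minimal sketch of one such partial-plus-noise voice. The breakpoint lists and the one-pole noise filter coefficient are invented for illustration and are not part of any particular analysis system named in the text.

```python
# Sketch: one sinusoid plus lowpass-filtered noise, each shaped by a
# piecewise-linear envelope.  All breakpoints and the 0.95 filter
# coefficient are illustrative assumptions.
import math, random

def envelope(breakpoints, n_samples):
    """Piecewise-linear interpolation of (time, value) breakpoints on [0,1]."""
    out = []
    for n in range(n_samples):
        t = n / (n_samples - 1)
        for (t0, v0), (t1, v1) in zip(breakpoints, breakpoints[1:]):
            if t0 <= t <= t1:
                out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
                break
    return out

def sine_plus_noise(freq, fs, n_samples, amp_bp, noise_bp, seed=0):
    rng = random.Random(seed)
    amp = envelope(amp_bp, n_samples)
    nz = envelope(noise_bp, n_samples)
    lp, out = 0.0, []
    for n in range(n_samples):
        lp = 0.95 * lp + 0.05 * rng.uniform(-1, 1)  # one-pole lowpassed noise
        out.append(amp[n] * math.sin(2 * math.pi * freq * n / fs) + nz[n] * lp)
    return out

env = envelope([(0.0, 0.0), (1.0, 1.0)], 11)
y = sine_plus_noise(440.0, 44100.0, 1000,
                    amp_bp=[(0.0, 0.0), (0.1, 1.0), (1.0, 0.0)],
                    noise_bp=[(0.0, 0.2), (1.0, 0.0)])
```

A full spectral model would run many such partials with analysis-derived envelopes; the sketch only shows the primitive the paragraph calls "a good start."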

The use of granular synthesis to create swarms of ``grains'' of sound using wavelet kernels of some kind (Roads 1978; Roads 1989) appears promising as a basis for a future statistical time-domain modeling technique. It would be very interesting if a kind of wavelet transform could be developed that would determine the optimum grain waveform, and provide the counterpart of a short-time power spectral density that would indicate the statistical frequency of each grain scale at a given time. Such a tool could provide a compact, transformable description of sounds such as explosions, rain, breaking glass, and the crushing of rocks, to name a few.
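For concreteness, a granular "cloud" in the above sense can be sketched by overlap-adding short windowed-sinusoid grains at random times. The grain duration, density, and frequency range below are arbitrary illustrative choices; the hypothetical statistical analysis the paragraph describes would be what chooses them from a target sound.

```python
# Sketch: granular synthesis as a statistical scattering of short
# Hann-windowed sinusoid grains.  All parameters are illustrative.
import math, random

def grain(freq, fs, dur):
    """One Hann-windowed sinusoid grain."""
    n = int(dur * fs)
    return [math.sin(2 * math.pi * freq * i / fs) *
            (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i in range(n)]

def granular_cloud(n_grains, total_dur, fs=8000, seed=1):
    rng = random.Random(seed)
    out = [0.0] * int(total_dur * fs)
    for _ in range(n_grains):
        g = grain(freq=rng.uniform(200, 2000), fs=fs, dur=0.02)
        start = rng.randrange(len(out) - len(g))
        for i, v in enumerate(g):  # overlap-add the grain into the cloud
            out[start + i] += v
    return out

cloud = granular_cloud(n_grains=50, total_dur=0.5)
```

The statistics here are just uniform random draws; the tool imagined in the text would instead estimate the grain waveform and the time-varying density of each grain scale from analysis of a sound.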


Computer Music Association, October 1991.

Revised with Curtis Roads for publication in Cahiers de l'IRCAM, September 1992, Institut de Recherche et Coordination Acoustique / Musique.

Copyright © Center for Computer Research in Music and Acoustics (CCRMA), Stanford University
