
Physical Modeling and Digital Signal Processing




Synthesis of Transients in Classical Guitar Sounds

Cem Duruoz

Synthesis of acoustic musical instrument sounds by computer has long been a fundamental problem in acoustics. It is well known that the transients heard just before, during, and just after the attack portion of an instrumental sound give the instrument most of its individual character. In a synthesis model it is therefore crucial to implement them carefully in order to obtain sounds similar to those produced by acoustic instruments. The transients in classical guitar sounds were studied through studio recordings, digital editing, and Fourier analysis. The sounds heard in the vicinity of the attack were classified according to their origin, spectral content, and duration. Next, a hybrid FM/physical modeling synthesis model was developed to produce these transients sequentially. Parameters such as duration, amplitude, and pitch were extracted from further recordings and incorporated into the model to synthesize realistic classical guitar sounds.
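The transient-splicing idea can be illustrated with a toy sketch: a short FM burst cross-faded into a Karplus-Strong style plucked-string loop, a simple stand-in for the physical-modeling half of the hybrid. All constants here are illustrative guesses, not the parameters extracted from the recordings.

  #include <cmath>
  #include <cstdlib>
  #include <vector>

  // Toy hybrid: 20 ms FM "attack chirp" cross-faded into a plucked string.
  std::vector<float> pluckWithTransient(float f0, float sr, int nSamples) {
      const float PI = 3.14159265f;
      std::vector<float> out(nSamples, 0.0f);
      int nAttack = (int)(0.020f * sr);                // transient length
      for (int n = 0; n < nAttack && n < nSamples; ++n) {
          float t = n / sr;
          float index = 3.0f * (1.0f - (float)n / nAttack);  // index fades out
          out[n] = 0.5f * std::sin(2*PI*f0*t + index*std::sin(2*PI*2.0f*f0*t));
      }
      int N = (int)(sr / f0);                          // string loop length
      std::vector<float> loop(N);
      for (int i = 0; i < N; ++i)                      // noise burst "pluck"
          loop[i] = 2.0f * std::rand() / RAND_MAX - 1.0f;
      int ptr = 0;
      for (int n = 0; n < nSamples; ++n) {
          float cur = loop[ptr], nxt = loop[(ptr + 1) % N];
          loop[ptr] = 0.996f * 0.5f * (cur + nxt);     // lowpass string loss
          float fade = (n < nAttack) ? (float)n / nAttack : 1.0f;
          out[n] += fade * cur;                        // cross-fade in string
          ptr = (ptr + 1) % N;
      }
      return out;
  }

In the actual model the attack stage would be sequenced from the classified transient types rather than a single FM burst.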

Estimation of Multiple Fundamental Frequencies in Audio Signals Using a Genetic Algorithm

Guillermo Garcia
guille@ccrma.stanford.edu

A method for estimating multiple, simultaneous fundamental frequencies in a polyphonic audio spectrum is presented. The method takes advantage of the power of genetic algorithms to explore a large search space and find a globally optimal combination of fundamental frequencies that best models the polyphonic signal spectrum. A genetic algorithm with variable chromosome length, a special crossover operator, and other features is proposed. No a priori knowledge about the number of fundamental frequencies present in the spectrum is assumed. Assessment of the first version of this method has shown correct detection (in number and value) of up to five fundamental frequencies. Planned refinements to the genetic algorithm operators should further improve this performance.
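A minimal sketch of the core idea follows; selection and the special crossover operator are omitted, and the fitness and mutation details are our own illustrative choices, not the authors'. A chromosome is simply a variable-length set of candidate fundamentals, and fitness rewards the fraction of spectral energy explained by the corresponding harmonic combs while penalizing extra fundamentals.

  #include <cmath>
  #include <cstdlib>
  #include <vector>

  typedef std::vector<double> Chromosome;   // variable-length set of F0s (Hz)

  // Fraction of spectral magnitude "claimed" by the harmonic combs of all
  // candidate F0s, minus a parsimony penalty per fundamental.
  double fitness(const Chromosome& c, const std::vector<double>& mag,
                 double binHz) {
      std::vector<double> claimed(mag.size(), 0.0);
      for (size_t i = 0; i < c.size(); ++i)
          for (int h = 1; ; ++h) {
              int k = (int)(h * c[i] / binHz + 0.5);  // nearest bin, harmonic h
              if (k >= (int)mag.size()) break;
              claimed[k] = mag[k];
          }
      double explained = 0.0, total = 1e-12;
      for (size_t k = 0; k < mag.size(); ++k) {
          explained += claimed[k];
          total += mag[k];
      }
      return explained / total - 0.02 * c.size();
  }

  // Variable-length mutation: delete, insert, or perturb one fundamental.
  Chromosome mutate(Chromosome c) {
      int r = std::rand() % 3;
      if (r == 0 && !c.empty()) c.erase(c.begin() + std::rand() % c.size());
      else if (r == 1)          c.push_back(50.0 + std::rand() % 1000);
      else if (!c.empty())
          c[std::rand() % c.size()] *= 0.99 + 0.02 * std::rand() / RAND_MAX;
      return c;
  }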

Computation of Reflection Coefficients for an Axisymmetrical Horn by Boundary Element Method

Shyh-Kang Jeng

There appears to be no literature on applying the boundary-element method (BEM) to a musical horn, though several authors have applied the BEM to horn-loudspeaker problems (for example, Kristiansen and Johansen [1], Henwood [2], and Johansen [3]). The BEM approach starts from the Helmholtz equation of linear acoustics and makes no approximations except those required for numerical calculation. It is therefore expected to capture the effect of diffraction from edges and the contribution of higher-order modes.

In this research, an integral equation is first derived, with special care taken to handle the singularities. The formulation takes advantage of the axisymmetry and expresses the pressure field inside the cylindrical section as a summation of modal fields. The boundary-element method is then applied to approximate the integral equation by a matrix equation, which is solved to obtain the reflection coefficient directly. Next, the reflection coefficients for a sequence of sampled frequencies in the desired frequency band are computed, and an inverse Fourier transform is performed to obtain the impulse response of an equivalent filter. Finally, an approximate FIR or IIR filter is deduced from the equivalent filter, and a physical model of a brass instrument can be obtained by connecting the approximate filter to a digital waveguide system.
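The last two steps can be sketched as follows, assuming the BEM solver has already produced complex reflection coefficients R(f_k) on a uniform grid from DC to Nyquist. A naive inverse real DFT stands in for the FFT, and plain truncation stands in for a proper FIR design.

  #include <cmath>
  #include <complex>
  #include <vector>

  // Reflection coefficients sampled at K bins (DC..Nyquist) -> FIR taps.
  std::vector<double> reflectionFIR(const std::vector<std::complex<double> >& R,
                                    int firLength) {
      const double PI = 3.14159265358979;
      int K = (int)R.size();
      int N = 2 * (K - 1);                // full conjugate-symmetric DFT size
      std::vector<double> h(N, 0.0);
      for (int n = 0; n < N; ++n) {       // naive inverse real DFT
          double s = R[0].real();
          for (int k = 1; k < K - 1; ++k)
              s += 2.0 * (R[k].real() * std::cos(2*PI*k*n/N)
                        - R[k].imag() * std::sin(2*PI*k*n/N));
          s += R[K-1].real() * std::cos(PI * n);       // Nyquist term
          h[n] = s / N;
      }
      h.resize(firLength);                // truncate; a taper window would help
      return h;                           // connect into the waveguide loop
  }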

With simple extensions, this approach can be used to model bores and openings of wind instruments.

References

1
U. R. Kristiansen and T. F. Johansen, ``The horn loudspeaker as a screen-diffraction problem,'' Journal of Sound and Vibration, vol. 133, no. 3, pp. 449-456, 1989.

2
D. J. Henwood, ``The boundary-element method and horn design,'' Journal of the Audio Engineering Society, vol. 41, no. 6, pp. 485-496, 1993.

3
T. F. Johansen, ``On the directivity of horn loudspeakers,'' Journal of the Audio Engineering Society, vol. 42, no. 12, pp. 1008-1019, 1994.

Synthesis of Ecologically-Based Sound Events

Damián Keller and Chris Rolfe

We present techniques for the efficient synthesis of everyday sounds: rain, fire, breaking glass, scraping and bouncing objects, and the like. These sounds exhibit dynamic temporal and spectral states that cannot be described by either deterministic or stochastic models alone (Cook, 1997; Roads, 1997). We propose a conceptually simple method for resynthesizing decorrelated, unique sound events using constrained parametric control of stochastic processes.

Granular synthesis has proven to be a convenient and efficient method for stochastic, time-based synthesis (Truax, 1988). To gain better control over spectral details, we extend asynchronous granular synthesis to include phase correlation, time-dependent overlap, amplitude scaling, and synchronicity between granular streams. We propose a representation of ecologically-based sound events comprising three control levels: micro, meso, and macro. By maintaining a control structure across all three time resolutions, we can better manage time-frequency boundary phenomena, taking into account windowing and overlap effects, spectral evolution, and emergent perceptual properties.
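The following toy granulator shows the asynchronous baseline that these extensions build on: Hann-windowed grains drawn from a source buffer and overlap-added at jittered onset times. Grain size, density, and jitter are the only controls here; the phase correlation and inter-stream synchronization described above are not shown.

  #include <cmath>
  #include <cstdlib>
  #include <vector>

  // Overlap-add Hann-windowed grains from src into out (both at rate sr).
  void granulate(const std::vector<float>& src, std::vector<float>& out,
                 float sr, float grainMs, float densityHz) {
      const float PI = 3.14159265f;
      int gLen = (int)(grainMs * 0.001f * sr);
      float meanHop = sr / densityHz;            // mean spacing between onsets
      for (float onset = 0; onset + gLen < (float)out.size();) {
          int start = std::rand() % (int)(src.size() - gLen); // random position
          for (int n = 0; n < gLen; ++n) {
              float w = 0.5f * (1.0f - std::cos(2*PI*n/(gLen - 1)));  // Hann
              out[(int)onset + n] += w * src[start + n];
          }
          // Jittered (asynchronous) onset spacing: 0.5x to 1.5x the mean hop.
          onset += meanHop * (0.5f + (float)std::rand() / RAND_MAX);
      }
  }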


Modeling the Sound Event at the Micro Level

Damián Keller, Jonathan Berger and Conrado Silva

Environmental sounds present a difficult problem for sound modeling because their spectral and temporal cues are tightly correlated. These cues interact to produce sound events with complex dynamics. In turn, these complex sounds form large classes that can be defined by statistical measures. Environmental sounds therefore cannot be handled by traditional deterministic synthesis methods. The objective of this project is to implement algorithmic tools that allow sound events to be defined through multilevel parameter manipulation.

Micro-level representations of sounds provide a way to control spectral and spatial cues in sound synthesis. Meso-level representations determine the temporal structure of sound events. By integrating these approaches into a coherent data structure, we expect to be able to model sound events with complex dynamic evolutions at both the micro and meso levels. These tools will consequently extend the parameter space of ecological models to include spectral and spatial cues.

Toward a High-Quality Singing Synthesizer

Hui-Ling Lu

Naturalness of sound quality is essential for singing synthesis. Since 95% of singing is voiced sound, the focus of this research is to improve the naturalness of vowel tone quality; we consider only non-nasal voiced sounds. To trade off modeling complexity against the difficulty of the analysis procedure that acquires the model parameters, we propose a source-filter synthesis model based on a simplified human voice production system. The source-filter model decomposes the human voice production system into three linear systems: glottal source, vocal tract, and radiation. The radiation is simplified as a differencing filter, and the vocal tract filter is assumed to be all-pole for non-nasal sounds. The glottal source and the radiation are then combined into the derivative glottal wave, which we call the glottal excitation.
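As a hedged illustration of this decomposition (not of the estimation procedure itself), the sketch below drives two all-pole formant resonators with a crude derivative-glottal pulse. The Rosenberg-style pulse shape and the /a/-like formant values are textbook placeholders, not parameters obtained from analysis.

  #include <cmath>
  #include <vector>

  const double PI = 3.14159265358979;

  struct TwoPole {                       // one section of the all-pole tract
      double a1, a2, y1, y2;
      TwoPole(double fc, double bw, double sr) : y1(0), y2(0) {
          double r = std::exp(-PI * bw / sr);
          a1 = 2 * r * std::cos(2 * PI * fc / sr);
          a2 = -r * r;
      }
      double tick(double x) {
          double y = x + a1 * y1 + a2 * y2;
          y2 = y1; y1 = y;
          return y;
      }
  };

  std::vector<double> vowel(double f0, double sr, int nSamples) {
      TwoPole F1(700, 90, sr), F2(1200, 100, sr);    // rough /a/ formants
      std::vector<double> out(nSamples);
      double T = sr / f0, prev = 0.0;
      for (int n = 0; n < nSamples; ++n) {
          double t = std::fmod((double)n, T) / T;    // phase within one period
          double g = (t < 0.4) ? 0.5 * (1 - std::cos(PI * t / 0.4))  // opening
                   : (t < 0.6) ? std::cos(PI * (t - 0.4) / 0.4)      // closing
                   : 0.0;                                            // closed
          double dg = g - prev;  prev = g;           // derivative glottal wave
          out[n] = F2.tick(F1.tick(dg));             // all-pole vocal tract
      }
      return out;
  }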

The task is then to estimate the vocal tract filter parameters and the glottal excitation that mimic the desired sung vowels. A deconvolution of the vocal tract filter and glottal excitation was developed via convex optimization [1]. Through this deconvolution, one obtains both the vocal tract filter parameters and the glottal excitation waveform.

Since glottal source modeling has been shown to be an important factor in improving the naturalness of speech synthesis, we are investigating alternative glottal source models for the singing voice. In addition to flexible pitch and volume control, the desired source model should be capable of controlling voice quality, here restricted to voice-source modifications ranging from laryngealized (pressed) through normal to breathy phonation. Evaluation will be based on the flexibility of the control and on the ability to mimic original recordings of sustained vowels.


Scanned Synthesis: A New Synthesis Technique

Max V. Mathews, Bill Verplank, and Rob Shaw

Developed at the Interval Research Corporation in 1998 and 1999, Scanned Synthesis is a new technique for the synthesis of musical sounds. We believe it will become as important as existing methods such as wave table synthesis, additive synthesis, FM synthesis, and physical modeling. Scanned Synthesis is based on the psychoacoustics of how we hear and appreciate timbres and on our motor control (haptic) abilities to manipulate timbres during live performance. A unique feature of scanned synthesis is its emphasis on the performer's control of timbre.

Scanned synthesis involves a slow dynamic system whose frequencies of vibration are below about 15 Hz. The system is directly manipulated by the motions of the performer. The vibrations of the system are a function of the initial conditions, the forces applied by the performer, and the dynamics of the system. Examples include slowly vibrating strings, two-dimensional surfaces obeying the wave equation, and a waterbed. We have simulated the string and surface models on a computer; our waterbed model is purely conceptual.

The ear cannot hear the low frequencies of the dynamic system. To make audible frequencies, the ``shape'' of the dynamic system, along a closed path, is scanned periodically. The ``shape'' is converted to a sound wave whose pitch is determined by the speed of the scanning function. Pitch control is completely separate from the dynamic system control. Thus timbre and pitch are independent. This system can be looked upon as a dynamic wave table controlled by the performer.

The psychophysical basis for Scanned Synthesis comes from our knowledge of human auditory perception and human motor control. In the 1960s, Risset showed that the spectra of interesting timbres must change with time. We observe that musically interesting rates of change lie below about 15 Hz, which is also the rate at which humans can move their bodies. We have named these rates haptic rates.

We have studied Scanned Synthesis chiefly with a finite-element model of a generalized string. Cadoz showed the musical importance of finite-element models in the 1970s; our models differ from Cadoz's in their focus on slow (haptic) vibration frequencies. Our finite-element models are collections of masses connected by springs and dampers, which can be analyzed with Newton's laws. We have generalized a traditional string by adding dampers and springs to each mass. All parameters (mass, damping, earth-spring strength, and string tension) can vary along the string. The performer manipulates the model by pushing or hitting different masses and by varying parameters.
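A minimal version of such a model is sketched below, with illustrative constants: a mass-spring-damper string stepped at haptic rates by Newton's laws, whose instantaneous shape is scanned by an independent phase accumulator that sets the audio pitch.

  #include <vector>

  struct ScannedString {
      std::vector<double> pos, vel;
      double tension, damping, earthSpring;
      ScannedString(int nMasses)
          : pos(nMasses, 0.0), vel(nMasses, 0.0),
            tension(0.1), damping(0.02), earthSpring(0.001) {}

      void step(double dt) {                 // slow (haptic-rate) dynamics
          int N = (int)pos.size();
          std::vector<double> acc(N);
          for (int i = 0; i < N; ++i) {
              double left  = (i > 0)     ? pos[i-1] : 0.0;   // fixed ends
              double right = (i < N - 1) ? pos[i+1] : 0.0;
              acc[i] = tension * (left + right - 2 * pos[i])
                     - earthSpring * pos[i] - damping * vel[i];
          }
          for (int i = 0; i < N; ++i) {
              vel[i] += dt * acc[i];
              pos[i] += dt * vel[i];
          }
      }

      // Scan the current shape as a wavetable; phase in [0,1) advances at
      // the desired audio pitch, independently of the string's dynamics.
      double scan(double phase) const {
          double x = phase * (pos.size() - 1);
          int i = (int)x;
          return pos[i] + (x - i) * (pos[i + 1] - pos[i]);   // linear interp
      }
  };

Hitting a mass is just pos[i] += amount between calls to step(); pitch comes entirely from how fast the phase is advanced in scan().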

We have already synthesized rich and interesting timbres and we have barely started to explore the range of possibilities in our present models. Many other different models can be conceived. We find the prospects exciting.

Perceptual Audio Coding Based on the Sinusoidal Transform

Juan Pampin and Guillermo Garcia

In this work, we have explored the possibilities of the sinusoidal model as a frequency-domain representation for perceptual audio coding of various types of audio signals. We have designed a set of techniques for data rate reduction and developed a codec software prototype consisting of three basic blocks:

  1. Partial pruning based upon psychoacoustic masking (see the sketch after this list).
  2. Smart sinusoidal frame decimation based upon transient detection.
  3. Bit allocation and quantization based upon psychoacoustic masking.
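A toy version of block 1 follows. The masking model is deliberately crude (a fixed level offset within a fixed relative bandwidth) and stands in for a real psychoacoustic spreading function; the 15% and 12 dB figures are illustrative only.

  #include <cmath>
  #include <vector>

  struct Partial { double freqHz, ampDb; };

  // Drop any partial lying within ~15% of a neighbor's frequency while
  // sitting 12 dB or more below it; keep everything else.
  std::vector<Partial> prunePartials(const std::vector<Partial>& in) {
      std::vector<Partial> kept;
      for (size_t i = 0; i < in.size(); ++i) {
          bool masked = false;
          for (size_t j = 0; j < in.size() && !masked; ++j) {
              if (i == j) continue;
              double df = std::fabs(in[j].freqHz - in[i].freqHz);
              masked = (df < 0.15 * in[i].freqHz &&
                        in[j].ampDb - in[i].ampDb >= 12.0);
          }
          if (!masked) kept.push_back(in[i]);
      }
      return kept;
  }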

We have evaluated the codec on monophonic musical instruments (harmonic and inharmonic), polyphonic orchestral music, singing voice, and speech. Results have been quite satisfying, showing that the sinusoidal model can achieve substantial compression ratios at high quality for a wide variety of audio signals. In particular, we believe this work shows that the sinusoidal model is by no means limited to monophonic, harmonic signals when high-quality audio compression is the goal.

Sig++: Musical Signal Processing in the C++ language

Craig Stuart Sapp

Sig++ is a set of C++ classes for writing sound generation and filtering programs by directly coding flowgraph schematics of signal-processing filters, as well as traditional computer-music unit-generator flowgraphs. The paradigm for generating sound is similar to that of other Music V-style synthesis programs, such as Csound.

A central goal of sig++ is portability. Example programs using the sig++ library have accordingly been compiled on several platforms: Linux, Windows 95/NT, OpenStep, NeXTStep, Sun SPARCstations, HP-UX, and SGI IRIX.

See the main sig++ webpage at http://www-ccrma.stanford.edu/~craig/sig for an overview, example binaries and sources, example sounds created by the example programs, documentation for the classes included in the sig++ library, and the source code for those classes.

Future additions to sig++ will include real-time sound input/output under Windows 95/NT and Linux, as well as linking control of sound generation to MIDI using Improv.

Acoustic Research and Synthesis Models of Woodwind Instruments

Gary P. Scavone

The modeling of musical instruments using digital waveguide techniques has proven both accurate and efficient for synthesis. Because such models are based on physical descriptions, they also provide a useful tool for acoustical exploration and research. Models of wind-instrument air columns have reached a high level of development, and an accurate and efficient means of modeling woodwind toneholes was described in [Scavone and Cook, 1998].

Recent work has focused on modeling the direction-dependent sound radiation of woodwind and brass instruments [Scavone, 1999]. Current acoustic theory regarding sound radiation from ducts and holes can be implemented in the digital waveguide context using properly designed digital filters. Each radiating sound source or hole requires a first- or second-order digital filter to account for angle- and frequency-dependent pressure distribution characteristics. Sound propagation delay from the source to the pickup is modeled with a delay line and, if needed, a fractional-delay interpolation filter. An additional digital filter can be used to model attenuation in free space. The results of this model compare well with frequency-domain polar radiation calculations and with measurements performed by Antoine Rousseau and René Caussé (1996) at the Institut de Recherche et Coordination Acoustique/Musique (IRCAM). A simplified system appropriate for real-time synthesis was developed using the Synthesis ToolKit (STK); it allows continuous pickup movement within an anechoic 3D space.
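One source-to-pickup path of this kind reduces to a short signal chain, sketched below with placeholder coefficients; a fitted first-order radiation filter would supply b0, b1, a1 per pickup angle, and fractional-delay interpolation is omitted here.

  #include <vector>

  struct RadiationPath {
      std::vector<float> delay;          // propagation delay to the pickup
      int wr;
      float b0, b1, a1, x1, y1;          // first-order radiation filter state
      RadiationPath(int delaySamples)
          : delay(delaySamples, 0.0f), wr(0),
            b0(0.3f), b1(0.3f), a1(-0.4f), x1(0.0f), y1(0.0f) {}
      float tick(float x) {
          delay[wr] = x;                 // write newest sample
          wr = (wr + 1) % (int)delay.size();
          float d = delay[wr];           // read oldest = delayed sample
          float y = b0 * d + b1 * x1 - a1 * y1;  // first-order filter
          x1 = d; y1 = y;
          return y;
      }
  };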

Current efforts are being directed toward the development of improved models of woodwind instrument excitation mechanisms. This work is being performed in conjunction with the Categorical Perception of Sound Sources project, which is described in the Psychoacoustics and Cognitive Psychology research section of this document.


Physical Modeling of Bowed Strings: Analysis, Real-time Synthesis and Playability

Stefania Serafin and Julius Smith

Recent work in the field of bowed strings has produced a real-time bowed-string instrument which, despite its simplicity, reproduces most of the phenomena that appear in real instruments. Our current research consists of improving this model, including refinements made possible by advances in hardware and by the development of efficient digital signal processing algorithms. In particular, we are modeling string stiffness, whose main effect is to disperse the sharp corners that characterize ideal Helmholtz motion. This dispersion is modeled using allpass filters whose coefficients are obtained by minimizing the L-infinity norm of the error between the internal loop phase and its approximation by the filter cascade.

We are also analyzing the ``playability'' of the model, examining the zones of a multidimensional parameter space in which ``good tone'' is produced. This study focuses on the influence of torsional waves and of the shape of the friction curve. The aim is to determine which elements of bowed-string instruments are fundamental to a bowed-string synthesizer and which can be neglected to reduce computational cost. We extend this work by examining the attack portion of a virtual bowed string: since a player's aim is normally to reach Helmholtz motion as quickly as possible, we analyze the attack parameters to determine the parameter combinations that allow the oscillating string to achieve Helmholtz motion in the shortest time.
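The friction curve at the heart of such models can be sketched as a simple monotonically decaying function of relative bow-string velocity; the constants below are illustrative, not measured values.

  #include <cmath>

  // Friction force vs. bow-string relative velocity, scaled by bow force.
  // High static friction near zero velocity (sticking) decays toward a
  // lower dynamic value (slipping); the force opposes relative motion.
  double frictionForce(double vRel, double bowForce) {
      const double muS = 0.8, muD = 0.3, v0 = 0.2;
      if (std::fabs(vRel) < 1e-9) return muS * bowForce;    // sticking limit
      double mu = muD + (muS - muD) * v0 / (v0 + std::fabs(vRel));
      return (vRel > 0 ? -1.0 : 1.0) * mu * bowForce;
  }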

This research is part of the work of the Strad (Sistema Tempo Reale Archetto Digitale) group, made up of CCRMA researchers working on different aspects of bowed-string instruments.

The waveguide physical model we have built runs in real time on several sound-synthesis platforms, e.g., Max/MSP, the Synthesis ToolKit, and Common Lisp Music.

Another part of this research consists of building controllers that allow composers and performers to play the model. In particular, we are interested in controllers that incorporate force feedback, because they let us couple the sound and the feel of a given bowing gesture. We are currently constructing an actuated bow and running experiments to discover the role played by tactile and kinesthetic feedback in stringed-instrument playing.

FFT-Based DSP and Spectral Modeling Synthesis

Julius Smith

The Fast Fourier Transform (FFT) revolutionized signal processing practice in the 1960s. Today, it continues to spread as a practical basis for digital systems implementation. Only very recently has it become cost-effective to use the short-time FFT in real-time digital audio systems, thanks to the availability of sufficiently powerful, low-cost, single-chip solutions.

In the digital audio field, FFT-based techniques are ripe to appear in digital mixing consoles, post-production editing facilities, and top-quality digital audio gear. Many music and digital audio ``effects'' can be conveniently implemented in a unified way using a short-time Fourier analysis, modification, and resynthesis facility.

In the music synthesis field, obtaining better control of sampling synthesis will require more general sound transformations. To proceed toward this goal, transformations must be understood in terms of what we hear. The best way we know to understand a sonic transformation is to study its effect on the short-time spectrum, where the spectrum-analysis parameters are tuned to match the characteristics of hearing as closely as possible. Thus, it appears inevitable that sampling synthesis will migrate toward spectral modeling. Recent developments in constant-Q filterbanks, such as in the wavelet literature, have created new alternatives for consideration. Advanced time-frequency representations, such as the Wigner Distribution, are yielding new insights into time-varying audio spectra.

In contrast with physical modeling synthesis, which models the source of a sound, spectral modeling techniques model sound at the receiver: the human ear. Spectral modeling is more immediately general than physical modeling, since it is capable of constructing an arbitrary stimulus along the basilar membrane of the ear, while new physical models must be developed for each new class of musical instrument. While complex coarticulation effects are more naturally provided by physical models, the short-time Fourier transform can be applied to any sound demonstrating any desired effect to determine what must happen in a spectral sequence to produce that effect.

FFT-based techniques play an important role in (1) the practical implementation of general signal processing systems (fast convolution), (2) advanced effects such as ``cross synthesis,'' time compression/expansion, duration-invariant frequency shifting, and other ``phase vocoder'' type techniques, and (3) novel synthesis systems based on the direct creation and transformation of spectral events and envelopes.
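The common skeleton behind categories (2) and (3) is the short-time analysis/modification/resynthesis loop. The sketch below uses a naive O(N^2) DFT in place of the FFT so it stays self-contained, and a Hann window at a hop of N/4 so the windowed overlap-add sums to a constant.

  #include <cmath>
  #include <complex>
  #include <vector>

  typedef std::complex<double> cpx;

  void stftProcess(const std::vector<double>& in, std::vector<double>& out,
                   int N /* frame size */, int hop /* e.g. N/4 */) {
      const double PI = 3.14159265358979;
      std::vector<double> win(N);
      for (int n = 0; n < N; ++n)
          win[n] = 0.5 * (1 - std::cos(2*PI*n/N));          // Hann window
      out.assign(in.size() + N, 0.0);
      for (size_t start = 0; start + (size_t)N <= in.size(); start += hop) {
          std::vector<cpx> X(N);
          for (int k = 0; k < N; ++k)                       // analysis DFT
              for (int n = 0; n < N; ++n)
                  X[k] += win[n] * in[start+n] * std::polar(1.0, -2*PI*k*n/N);
          // --- spectral modification goes here, e.g. X[k] *= gain(k) ---
          for (int n = 0; n < N; ++n) {                     // inverse DFT
              cpx s(0.0, 0.0);
              for (int k = 0; k < N; ++k)
                  s += X[k] * std::polar(1.0, 2*PI*k*n/N);
              // Windowed overlap-add; Hann-squared at hop N/4 sums to 1.5.
              out[start+n] += win[n] * s.real() / (double)N / 1.5;
          }
      }
  }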


Digital Waveguide Modeling of Acoustic Systems

Julius Smith

Digital Waveguide Filters (DWF) have proven useful for building computational models of acoustic systems which are both physically meaningful and efficient for applications such as digital synthesis. The physical interpretation opens the way to capturing valued aspects of real instruments which have been difficult to obtain by more abstract synthesis techniques. Waveguide filters were initially derived to construct digital reverberators out of energy-preserving building blocks, but any linear acoustic system can be approximated using waveguide networks. For example, the bore of a wind instrument can be modeled very inexpensively as a digital waveguide. Similarly, a violin string can be modeled as a digital waveguide with a nonlinear coupling to the bow. When the computational form is physically meaningful, it is often obvious how to introduce nonlinearities correctly, thus leading to realistic behaviors far beyond the reach of purely analytical methods.

In this context, a waveguide can be defined as any medium in which wave motion can be characterized by the one-dimensional wave equation. In the lossless case, all solutions can be expressed in terms of left-going and right-going traveling waves in the medium. The traveling waves propagate unchanged as long as the wave impedance of the medium is constant. At changes in the wave impedance, a traveling wave partially transmits and partially reflects in an energy-conserving manner, a process known as ``scattering.'' The wave impedance is the square root of the ``massiness'' times the ``stiffness'' of the medium; that is, it is the geometric mean of the two sources of resistance to motion: the inertial resistance of the medium due to its mass, and the spring force on the displaced medium due to its elasticity.
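For pressure waves, the scattering just described takes a particularly compact form, the one-multiply Kelly-Lochbaum junction:

  // Scattering of pressure waves at a junction between wave impedances
  // Z1 (side 1) and Z2 (side 2).  k is the reflection coefficient seen
  // by a wave arriving from side 1; note the single multiply.
  struct ScatteringJunction {
      double k;
      ScatteringJunction(double Z1, double Z2) : k((Z2 - Z1) / (Z2 + Z1)) {}
      void scatter(double in1, double in2,      // waves arriving at junction
                   double& out1, double& out2)  // waves leaving the junction
      {
          double w = k * (in1 - in2);
          out1 = in2 + w;   // back into side 1:  k*in1 + (1-k)*in2
          out2 = in1 + w;   // into side 2:      (1+k)*in1 - k*in2
      }
  };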

Digital waveguide filters are obtained (conceptually) by sampling the unidirectional traveling waves that occur in a system of ideal, lossless waveguides. Sampling is across both time and space. Thus, variables in a DWF structure are exactly equal (at the sampling times and positions, to within numerical precision) to variables propagating in the corresponding physical system. Signal power is defined instantaneously with respect to time and space (just square and sum the wave variables). This instantaneous handle on signal power yields a simple picture of the effects of round-off error on the growth or decay of signal energy within the DWF system. Because waveguide filters can be specialized to well-studied lattice/ladder digital filters, it is straightforward to realize any digital filter transfer function as a DWF. Waveguide filters are also related to ``wave digital filters'' (WDF), developed primarily by Fettweis. Using a ``mesh'' of one-dimensional waveguides, modeling can be carried out in two and higher dimensions. In other applications, propagation in the waveguide is extended to include frequency-dependent losses and dispersion. In still more advanced applications, nonlinear effects are introduced as a function of instantaneous signal level.

Digital waveguide filters can be viewed as an efficient discrete-time ``building material'' for acoustic models incorporating aspects of one-dimensional waveguide acoustics, lattice and ladder digital filters, wave digital filters, and classical network theory.


Signal Processing Algorithm Design Stressing Efficiency and Simplicity of Control

Timothy Stilson

This project deals with the design of digital filters, oscillators, and other structures whose parameters can be varied efficiently and intuitively. The main criteria for these algorithms are efficiency of the core computation, efficiency of parameter updates, and intuitive behavior of the parameters themselves.

Often, a certain amount of inefficiency is livable, and in cases where a parameter changes only rarely, large amounts of inefficiency can be tolerated. But when a parameter must change very often, as in a smooth sweep or a modulation, inefficiency is intolerable.

The main application of this project is the field referred to as ``virtual analog synthesis,'' which implements analog synthesis algorithms (in particular, subtractive synthesis) in digital systems. Characteristic of many analog patches were the blurring of the distinction between control signals and audio signals, as in modulation schemes, and the ability to dynamically (smoothly) control any parameter. Both of these abilities require parameters to change at very high rates, even as fast as the sampling rate; hence the need for efficiently controllable algorithms.

Two subprojects are currently being researched. The first is the design and implementation of an efficient signal generator for bandlimited pulse trains, square waves, and sawtooth waves. The algorithm is being designed for basic efficiency, with particular attention to efficient variation of its main parameters: frequency and duty cycle.
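One standard route to such a generator, sketched below, is a bandlimited impulse train (BLIT) from the discrete summation formula, leakily integrated into a sawtooth; the leak constant and the closed-form evaluation are illustrative choices, not necessarily those of the project.

  #include <cmath>
  #include <vector>

  // Bandlimited sawtooth: DC-removed BLIT through a leaky integrator.
  std::vector<double> blitSaw(double f0, double sr, int nSamples) {
      const double PI = 3.14159265358979;
      double P = sr / f0;                       // period in samples
      int M = 2 * (int)std::floor(0.5 * (P - 1)) + 1;  // odd harmonic count
      std::vector<double> out(nSamples);
      double phase = 0.0, saw = 0.0;
      for (int n = 0; n < nSamples; ++n) {
          double den = std::sin(PI * phase / P);
          double blit = (std::fabs(den) < 1e-9)
                      ? M / P                   // limit of the ratio at peaks
                      : std::sin(PI * M * phase / P) / (P * den);
          saw = 0.999 * saw + (blit - 1.0 / P); // integrate, removing DC
          out[n] = saw;
          phase = std::fmod(phase + 1.0, P);
      }
      return out;
  }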

The second explores the connections between control-system theory and filter theory. One particular avenue of research is the application of root-locus design techniques to audio filter design. A root locus traces the movement of system (filter) poles as a single parameter changes. Certain patterns in root loci appear repeatedly and can be exploited in audio filter design to obtain various effects. A good example is the Moog VCF, which uses one of the most basic root-locus patterns to produce a filter with trivial controls for both corner frequency and Q. Several other families of sweepable digital filters based on root loci have already been found. A particular goal is a filter family that efficiently implements constant-Q sweepable digital filters, a problem that turns out to be particularly simple in continuous time (the Moog VCF) but quite difficult in discrete time.
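The structure below mimics the Moog VCF's pole pattern in naive digital form (the unit delay in the feedback path, which slightly detunes the resonance, is glossed over); its two controls map directly to corner frequency and Q, which is exactly the kind of trivial controllability the root-locus viewpoint seeks.

  // Four identical one-pole lowpass sections inside a feedback loop.
  // g sets the corner frequency; k sets the resonance (Q), with
  // self-oscillation as k approaches 4.
  struct MoogLikeFilter {
      double s[4];
      double g, k;
      MoogLikeFilter(double g_, double k_) : g(g_), k(k_) {
          s[0] = s[1] = s[2] = s[3] = 0.0;
      }
      double tick(double x) {
          double u = x - k * s[3];         // resonance feedback (1-sample old)
          for (int i = 0; i < 4; ++i) {    // one-pole cascade
              s[i] += g * (u - s[i]);
              u = s[i];
          }
          return s[3];
      }
  };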

Synthesis and Algorithmic Composition Techniques Derived from the Sonification of Particle Systems; and a Resultant Meta-Philosophy for Music and Physics

Bob L. Sturm

De Broglie's hypothesis from quantum mechanics (QM) states that a particle can behave as either a particle or a wave; a system of particles can thus become a complex superposition of dynamic waves. Motivated by this, the author develops a method for the sonification of particle systems in a logical manner. Thinking of sound in terms of an evolving system of particles, potentials, and initial conditions yields a unique perspective. A direct correspondence between sound composition and many-body physics allows ideas from each field to enrich the other, such as using sound to gain a deeper comprehension of a phenomenon, or using radioactivity as a compositional device. One application explored so far is algorithmic composition using a simulated particle system. It has become clear that the composer must also become a physicist to make effective musical use of these techniques. Paradoxically, the audience need not be versed in physics to visualize and appreciate what they hear, a sign of a successful analogue. The very act of uniting physics and music raises several interesting questions, encouraging a possible meta-philosophy of the two: the traditional purposes, meanings, and practices of each are challenged, and the results are very pertinent to our current techno-culture. Several sound examples will be presented, along with (if accepted for programming) the first composition made with these techniques: 50 Particles in a Three-Dimensional Harmonic Potential: An Experiment in 5 Movements.

A Flexible Analysis/Synthesis Method for Transient Phenomena

Harvey Thornburg

Sinusoidal models provide an intuitive, parametric representation for time-varying spectral transformations. However, resynthesis artifacts arise to the degree that the signal violates assumptions of local stationarity. Common types of transients (or locally non-stationary regions) are abrupt changes in spectrum, rapidly decaying exponential modes, and rapid spectral variations (e.g., fast vibrato, chirps, etc.). These phenomena cover a considerably wider class than the onset regions of monophonic contexts. Our extended sinusoidal model proceeds with a presegmentation phase followed by region-dependent modeling and resynthesis. In presegmentation, information-theoretic criteria are used to localize abrupt-change boundaries; windows are aligned with segment boundaries; and segments are then classified as locally stationary or transient. Locally stationary regions are handled by a sinusoids+noise model. For transients, we adapt parametric models that naturally extend the sinusoids+noise model, such as the time-varying Prony/Kalman model, to mode decay/variation problems. Besides reducing artifacts, extended sinusoids+noise models permit different kinds of processing to be applied to transients, which has been shown to offer the composer considerable flexibility in time-stretching applications. Finally, we show applications to the single-channel source separation problem and to rhythm following, using a Bayesian framework to handle side information concerning the change boundaries.

Antialiasing for Nonlinearities: Acoustic Modeling and Synthesis Applications

Harvey Thornburg

Nonlinear elements have manifold uses in acoustic modeling, audio synthesis, and effects design. Of particular importance are their capacity to control oscillation dynamics in feedback models and their ability to give digital systems a natural overdrive response. Unfortunately, nonlinearities are a major source of aliasing in a digital system. In this work, alias-suppression techniques are introduced that are particularly tailored to preserve response dynamics in acoustic models. To this end, a multirate framework for alias suppression is developed along with the concept of an aliasing signal-to-noise ratio (ASNR). Analysis of this framework proceeds as follows: first, relations are established between ASNR and computational cost/delay, given an estimate of the reconstructed output spectrum; second, techniques are given to estimate this spectrum in the worst case given only a few statistics of the input (amplitude, bandwidth, and DC offset). These tools are used to show that ``hard'' circuit elements (i.e., the saturator, rectifier, and other piecewise-linear systems found in bowed-string and single-reed instrument models) generate significant aliasing under reasonable computational constraints. To address this, a parameterizable, general-purpose method for constructing monotonic ``softening approximations'' is developed and shown to greatly suppress aliasing without additional computational expense. The monotonicity requirement is sufficient to preserve response dynamics in a variety of practical cases. Applications to bowed-string modeling and virtual analog filter emulation are discussed.
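Two of these ideas can be shown in toy form: a one-parameter monotonic softening of the hard clipper, and the nonlinearity run at 4x rate with deliberately crude linear-interpolation resampling. A real implementation would use proper polyphase anti-imaging and anti-aliasing filters, and the softener shown is one convenient family, not the paper's specific construction.

  #include <cmath>
  #include <vector>

  // Monotonic softening approximation of hard clipping: as p grows, the
  // curve approaches clamp(x, -1, 1); smaller p gives gentler knees,
  // weaker high-order harmonics, and therefore less aliasing.
  double softClip(double x, double p) {
      return x / std::pow(1.0 + std::pow(std::fabs(x), p), 1.0 / p);
  }

  // Run the nonlinearity at 4x the sample rate with linear up-sampling
  // and a simple average as the decimator.
  std::vector<double> saturateOversampled(const std::vector<double>& in,
                                          double p) {
      std::vector<double> out(in.size());
      double prev = 0.0;
      for (size_t n = 0; n < in.size(); ++n) {
          double acc = 0.0;
          for (int i = 1; i <= 4; ++i) {
              double x = prev + (in[n] - prev) * i / 4.0;  // interpolated point
              acc += softClip(x, p);
          }
          out[n] = acc / 4.0;
          prev = in[n];
      }
      return out;
  }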

