
Physical Modeling (Past)



From Physics of Piano Strings to Digital Waveguides

Julien Bensa, Stefan Bilbao, Richard Kronland-Martinet, and Julius Smith

Several models of transverse wave propagation on a piano string, of varying degrees of complexity, have appeared in the literature. These models are always framed in terms of a partial differential equation (PDE) or a system of PDEs; usually the crude starting point for such a model is the one-dimensional wave equation, and the more realistic features, such as dispersion, frequency-dependent loss, and nonlinear hammer excitation, are incorporated through perturbation terms. Chaigne and Askenfelt have proposed the most advanced such model and used it as the basis for a sound synthesis technique based on finite differences; the time waveform of a struck piano string is simulated in this way to a remarkable degree of fidelity.

Digital waveguides, on the other hand, are filter-like structures which model one-dimensional wave propagation as purely lossless along the length of the string, with loss and dispersion summarized in lumped terminating filters. They are thus simulations of slightly modified physical systems, but they are highly efficient structures in the context of musical sound synthesis. The aim of this paper is to bridge the gap between PDE models and digital waveguides, and to show explicitly the relationship between the lumped filters used to model loss and dispersion and the parameters defining the model PDE, which in this case is a carefully chosen variant of Chaigne and Askenfelt's system. The calibration of the filters to experimentally measured data is also discussed.
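
For concreteness, the following minimal Python sketch shows the waveguide structure under discussion: two lossless delay lines with all loss lumped into a one-pole lowpass reflection filter at the bridge. All parameter values are illustrative, not calibrated piano data.

    import numpy as np

    # Minimal digital waveguide string: lossless bi-directional delay lines
    # with all loss lumped into a one-pole lowpass reflection filter at the
    # bridge.  Parameter values are illustrative, not measured piano data.
    fs = 44100                 # sample rate (Hz)
    f0 = 220.0                 # fundamental frequency (Hz)
    N = int(fs / (2 * f0))     # length of each one-way delay line (samples)
    g, a = 0.995, 0.4          # DC gain and smoothing coefficient of the lumped filter

    right = np.random.uniform(-1, 1, N)   # right-going wave, excited by a "strike"
    left = np.zeros(N)                    # left-going wave
    lp = 0.0                              # lowpass filter state
    out = np.empty(fs)                    # one second of output

    for n in range(fs):
        out[n] = right[-1] + left[-1]             # string motion near the bridge
        lp = (1 - a) * g * right[-1] + a * lp     # lumped loss filter (one pole)
        new_left, new_right = -lp, -left[0]       # inverting reflections at both ends
        right = np.roll(right, 1); right[0] = new_right
        left = np.roll(left, -1); left[-1] = new_left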


Physical Modeling of Brasses (February 1999)

David Berners

One of the difficulties in building waveguide models of brasses and winds is that we do not know how to find the round-trip filtering in a flaring horn without actually making an acoustic measurement. Ideally, we would like to be able to compute the loop filter directly from the physical dimensions of the horn. While significant work has been done along these lines (Causse et al. [1], Plitnik and Strong [2], Benade [3]), a complete and accurate theory is not yet available.

To provide computationally tractable models, the flaring horn is modeled assuming that Webster's horn equation is satisfied, i.e., that a one-parameter solution to the wave equation exists within the boundaries of the horn. Any shape, such as planar or spherical, can be assumed for the wavefront within the horn.

In an ongoing research project at CCRMA, Webster's horn equation is solved as follows: First, the wave equation is converted to the form of the celebrated Schrödinger wave equation through a coordinate transformation outlined by Benade in [3]. Once in Schrödinger form, the wave equation becomes equivalent to the one-dimensional scattering problem in particle physics, for which efficient and numerically stable solution methods exist (Kalotas and Lee [4]). In the new (transformed) coordinate system, the horn boundary function is replaced by the ``horn potential function,'' which, in addition to providing the frequency-dependent reflection, transmission, and impedance functions for the waveguide, can be used to gain an intuitive understanding of how these characteristics are related to bell flare. The quantities obtained from the solution to Webster's equation are all that is necessary for the design of lumped filters to be used in a digital waveguide model. Advantages over conventional modeling techniques include the ability to specify an arbitrary wavefront shape and possible numerical advantages.
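
As a rough numerical illustration of the transformation (a sketch only; Benade's formulation also rescales the axis for non-planar wavefronts), the horn potential U = (sqrt(S))''/sqrt(S) can be computed directly from a bore profile S(x). For an exponential horn it comes out constant, recovering the familiar cutoff frequency:

    import numpy as np

    # Sketch: with psi = p * sqrt(S), Webster's equation becomes the
    # Schrodinger-form equation  psi'' + (k^2 - U(x)) psi = 0,
    # where U = (sqrt(S))'' / sqrt(S) is the ``horn potential function.''
    c = 343.0                             # speed of sound (m/s)
    x = np.linspace(0.0, 0.5, 2001)       # axial coordinate (m)
    dx = x[1] - x[0]
    m = 8.0                               # flare constant of an exponential horn
    S = 1e-4 * np.exp(m * x)              # cross-sectional area S(x) (m^2)

    r = np.sqrt(S)
    U = np.gradient(np.gradient(r, dx), dx) / r   # horn potential function

    # Sanity check: for an exponential horn U = (m/2)^2 everywhere, so waves
    # with k^2 < U are evanescent; the cutoff frequency is c*m/(4*pi) ~ 218 Hz.
    assert abs(U[1000] - (m / 2) ** 2) < 0.01
    print("cutoff (Hz):", c * m / (4 * np.pi))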


Adding Pulsed Noise to Wind Instrument Physical Models (May 1996)

Chris Chafe

Pulsed noise has been detected in the residual of steady flute tones after elimination of the purely periodic components. LMS adaptive linear periodic prediction was used to track the waveform through its slight period-to-period fluctuations; the predicted signal was removed from the original, leaving a breathy-sounding residual to examine. Noise pulses in musical oscillators result from period-synchronous gating of the driving mechanism. Bowed string instruments exhibit noise pulses arising from alternating stick-slip motion, where noise is introduced only when the string is slipping. Distinct pulses are also exhibited by the saxophone, in which the reed modulates air friction. Flute noise is more continuous than in string or reed tones: short-time Fourier transformation of the residual signal reveals that pulses are present, but spectrally weighted toward higher frequencies. A physical model of the flute incorporating a corresponding noise synthesis method is being developed. Results of the simulation are compared for quality of pitch-synchronous spectral modulation and effect on frequency jitter.
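
The following Python sketch illustrates the kind of period-synchronous LMS prediction described above; it is an illustrative reconstruction, not the exact analysis code, and the test tone it is exercised on is synthetic.

    import numpy as np

    # Sketch of LMS adaptive periodic prediction: predict each sample from a
    # few samples one period earlier, adapt the predictor by normalized LMS,
    # and keep the unpredicted (breath noise) residual.
    def periodic_lms_residual(x, period, taps=5, mu=0.01):
        w = np.zeros(taps)                 # predictor weights around lag = period
        half = taps // 2
        resid = np.zeros_like(x)
        for n in range(period + half, len(x) - half):
            u = x[n - period - half : n - period + half + 1]  # one period back
            xhat = w @ u                   # periodic prediction
            e = x[n] - xhat                # prediction error = noise residual
            w += (mu / (u @ u + 1e-9)) * e * u                # normalized LMS
            resid[n] = e
        return resid

    # Example: a flute-like tone = harmonic part + breath noise.
    fs, f0 = 44100, 440.0
    t = np.arange(fs) / fs
    tone = np.sin(2*np.pi*f0*t) + 0.3*np.sin(4*np.pi*f0*t) \
           + 0.05*np.random.randn(fs)
    noise_residual = periodic_lms_residual(tone, period=round(fs / f0))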

The method uses a vortex-like noise generator coupled to the nonlinear excitation mechanism. These components simulate the flute's frictional noise generation and switching air jet, respectively. The vortex is generated by a separate short-cycle nonlinear oscillator. Its output is used to modulate the nonlinearity of the main instrument (for example, a cubic polynomial in Perry Cook's SlideFlute). The vortex's signal input is a flow variable controlled by the signal circulating in the main instrument loop.

The resulting oscillation contains noise injected by the vortex and exhibits the desired pitch-synchronous spectral changes. The possible classes of instruments to which this might apply include air jet, glottis, and single, double, and lip reed instruments.


Vicarious Synthesizers: Listening for Timbre (February 1999)

Chris Chafe

The timbre of a digitally synthesized musical sound is usually determined by a group of controls, as for example in physical models of bowed strings (bow force / velocity / position) or sung vowels (complex vocal tract shape / glottal source), and in models using frequency modulation (modulation index / oscillator tuning ratios). In this work, which concentrates on the bowed string physical model, possible tone qualities are arrayed in a two-dimensional matrix whose axes (bow force / velocity) represent two of the principal timbral determinants of the synthesis method. Expressive real-time control of timbre is achieved by navigating the space with a force-feedback pointing device, allowing the musician to feel as well as hear timbral change. Timbral features are displayed kinesthetically as variations in the graph surface. Locations of particular bowed timbres, and the nearby qualities in their environs, are easily learned along with musically important trajectories. The representation also provides a window on bowing technique in digitized performances by tracking spectral matches between the recording and the matrix.

Synthesis of the Singing Voice Using a Physically Parameterized Model of the Human Vocal Tract (May 1996)

Perry Cook

Two voice synthesis systems have been constructed using a physically parameterized model of the human vocal tract. One system is a real-time digital signal processor (DSP) interface which allows graphical, interactive experimentation with the various control parameters; the other is a text-driven software synthesis program. By making available both real-time control and software synthesis, both rapid experimentation and repeatable results are possible. The vocal tract filter is modeled by a waveguide filter network. Glottal pulses are stored in and retrieved from multiple wavetables. To this periodic glottal source is added a filtered, pulsed noise component, simulating the turbulence generated as air flows through the oscillating vocal folds. To simulate the turbulence of fricatives and other consonants, a filtered noise source can be made arbitrarily resonant at two frequencies and placed at any point within the vocal tract.

In the real-time DSP program, called SPASM, all parameters are graphically displayed and can be manipulated with a computer mouse. Various two-dimensional maps relating vowels and vocal tract shapes are provided, and a user can smoothly vary the control parameters by moving the mouse within a map region. Additional controls include arbitrary mapping of MIDI (Musical Instrument Digital Interface) controls onto the voice instrument parameters. The software synthesis system takes as input a text file specifying the events to be synthesized. An event specification includes a transition time, shape and glottal files as written out by the SPASM system, noise and glottal volumes, glottal frequency (either in Hz or as a musical note name), and vibrato amount. Other available control strategies include text-to-speech/singing and a graphical common music notation program. Support for multiple languages, musical modes, and vocal ornamentation is provided, including Latin and modern Greek.

Use of Physical Model Synthesis for Developing Experimental Techniques in Ethnomusicology: The Case of the Ouldémé Flute (April 2002)

Patricio de la Cuadra

As part of a collaborative project with ethnomusicologists and the Acoustique Instrumentale group at IRCAM, we study and implement a physical model of a flue instrument, exploring the model's flexibility with respect to changes in instrument geometry, its possibilities for timbre control, and its playability. Two implementations have been developed: a C++ object using the STK library and an external object for Max. Both implementations are designed as laboratories, allowing the user to adjust many parameters in real time and evaluate the response of the model. A real-time MIDI controller for the flute physical model is also being designed.

The goal here is to study the musical scales of the flutes played by the Ouldémé, a people of northern Cameroon. In their practice we encounter vocal-instrumental polyphony at two levels: individual, where each player plays two flutes, and group, normally formed by five players. The flutes have no toneholes; they are made of bamboo cane, with a blowing end at one side and a closed end at the other. Placing the tongue outside the mouth, the player shapes the air stream, which then strikes a sharp edge of the cane. The mouthpiece is made simply by cutting the cane, with no additional work on it. Adding water prevents air leakage and, to a lesser degree, adjusts the tuning of the instrument.

These flutes work with a turbulent jet flow. The behavior of this type of jet is less well understood than that of the laminar jet, and it is currently being studied in a parallel project. A one-dimensional representation of the dynamics of the jet (formation, velocity fluctuation, oscillation) is used, as described in [1]. The bore is modeled using one-dimensional waveguides, and viscothermal losses and sound radiation are implemented as linear filters.

Reference:

  1. Verge, M.P., Aeroacoustics of Confined Jets, PhD thesis, Eindhoven University of Technology, 1995.

Synthesis of a Neolithic Chinese Flute (April 2002)

Patricio de la Cuadra and Chris Chafe

As part of an ongoing project to model the 9000-year-old bone flutes unearthed at the Jiahu archeological site in China, we have implemented a real-time one-dimensional model of a flute, incorporating previously described techniques for air-jet and wind instrument turbulence. The model is implemented in several systems, including STK, and as an external object in Max/MSP and Pd (Pure Data). It allows real-time control of fingering, embouchure, and breath pressure.

One application of the research has been as the instrument (a quartet, actually) in a six-month installation at the San Jose Museum of Art entitled ``Oxygen Flute.'' Another is to recreate the sound of several unplayable archeological specimens, with performance practice inferred from traditional playing technique.

The ``Flutar'': A New Instrument for Live Performance (May 1996)

Cem Duruoz

``Flutar'' is a cross-synthesis instrument consisting of a physical simulation of the flute combined with a live instrument, in this case the classical guitar. It is implemented using the ``SynthBuilder'' software on a NeXT computer. During a live performance, a second computer modifies the model's parameters in real time, in other words ``plays'' the flutar, while the performer plays the guitar. The physical model combines an excitation section and a resonator, corresponding to the embouchure and the bore of a real flute, respectively. The two instruments interact during the performance: the sound the computer generates depends on the guitar sound it receives through a microphone. The amplitude of the guitar sound modifies the input noise that simulates the wind blowing into a flute, and at the same time the captured guitar sound passes through the resonator to produce the impression of a ``plucked flute.'' Depending on the pitches played by the guitar and on the pitch to which the flutar is tuned, resonances may arise that emphasize the guitar sound.

Synthesis of Transients in Classical Guitar Sounds (April 2000)

Cem Duruoz

Synthesis of acoustic musical instrument sounds using computers has been a fundamental problem in acoustics. It is well known that the transients heard just before, during, and just after the attack portion of an instrumental sound give the instrument most of its individual character. In a synthesis model it is therefore crucial to implement them carefully, in order to obtain sounds similar to those produced by acoustic instruments. The transients in classical guitar sounds were studied through studio recordings, digital editing, and Fourier analysis. The sounds heard in the vicinity of the attack were classified according to their origin, spectral content, and duration. Next, a hybrid FM/physical-modeling synthesis model was developed to produce these transients sequentially. Parameters such as duration, amplitude, and pitch were extracted from further recordings and incorporated into the model to synthesize realistic classical guitar sounds.

Modeling High Frequency Modes of Complex Resonators Using a Waveguide Mesh (July 2001)

Patty Huang, Stefania Serafin, and Julius O. Smith III

This project was motivated by the need for a high-quality model of a violin body, at reasonable computational cost, in the case where the nonlinear interaction between the bow and string prevents the use of commuted waveguide synthesis. In the current model, a biquad filter bank simulates the important low-frequency resonances. For the complex high-frequency resonances, we use a waveguide mesh embedded with absorption filters to tune the decay times and relative amplitudes of the modes. The goal of the mesh design is to match properties of the mesh response to the violin body response in each critical band, such as mode spacing and average bandwidth, to yield an accurate-sounding result above some frequency.

The core of this study is to use the waveguide mesh structure not as a physical modeling tool but as a psychoacoustic modeling tool. The violin body generalizes to any complex resonator, and the waveguide mesh is a computational structure that shows promise for simulating the complex, dense high-frequency modes of an instrument's body in a perceptually accurate way.
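
As an illustration of the low-frequency half of such a hybrid model, the Python sketch below sums a parallel bank of two-pole (biquad) resonators; the mode frequencies, bandwidths, and gains are placeholders rather than measured violin data.

    import numpy as np
    from scipy.signal import lfilter

    # Parallel bank of two-pole resonators standing in for the low-frequency
    # body modes.  (center Hz, bandwidth Hz, gain) triples are illustrative.
    fs = 44100
    modes = [(280.0, 30.0, 1.0),
             (460.0, 40.0, 0.7),
             (550.0, 50.0, 0.5)]

    def resonator_coeffs(f, bw, gain):
        r = np.exp(-np.pi * bw / fs)              # pole radius from bandwidth
        theta = 2 * np.pi * f / fs
        b = [gain * (1 - r)]                      # rough peak normalization
        a = [1.0, -2 * r * np.cos(theta), r * r]
        return b, a

    x = np.zeros(fs); x[0] = 1.0                  # impulse -> body impulse response
    y = sum(lfilter(*resonator_coeffs(f, bw, g), x) for f, bw, g in modes)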

Computation of Reflection Coefficients for an Axisymmetric Horn by the Boundary Element Method (April 2000)

Shyh-Kang Jeng

There appears to be no literature on using the boundary-element method (BEM) to analyze a musical horn, though some authors have applied the BEM to horn loudspeaker problems (for example, Kristiansen and Johansen [1], Henwood [2], and Johansen [3]). The BEM approach starts from the Helmholtz equation of linear acoustics and makes no approximations except those required for numerical calculation; it is therefore expected to capture the effect of diffraction from edges and the contribution of higher-order modes.

In this research, an integral equation is first derived; special care must be taken with the singularities. The formulation takes advantage of the axisymmetry and expresses the pressure field inside the cylindrical section as a summation of modal fields. The boundary-element method is then applied to approximate the integral equation by a matrix equation, and by solving the matrix equation we obtain the reflection coefficient directly. Next, the reflection coefficients for a sequence of sampled frequencies in the desired band are computed, and an inverse Fourier transform is performed to obtain the impulse response of an equivalent filter. Finally, an approximate FIR or IIR filter is deduced from the equivalent filter, and a physical model of a brass instrument can be obtained by connecting the approximate filter to a digital waveguide system.
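
The final conversion step can be illustrated compactly: given reflection coefficients sampled on a uniform frequency grid (a made-up smooth curve below, standing in for the BEM output), a small linear-phase delay keeps the inverse transform causal, and windowing yields an FIR reflection filter for the waveguide. The numbers are illustrative.

    import numpy as np

    # Turn sampled reflection coefficients R(f_k) into an FIR reflection
    # filter by inverse Fourier transform (sketch; R here is a placeholder).
    fs = 44100
    nbins = 513                               # samples of R from 0 to fs/2
    f = np.linspace(0, fs / 2, nbins)
    R = -0.95 * np.exp(-(f / 1500.0) ** 2)    # placeholder: strong low-f reflection

    delay = 32                                # linear phase keeps h causal
    H = R * np.exp(-2j * np.pi * f * delay / fs)
    h = np.fft.irfft(H)                       # impulse response of equivalent filter
    h_fir = h[:2 * delay] * np.hanning(2 * delay)   # windowed FIR reflection filter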

With simple extensions, this approach can be used to model the bores and openings of wind instruments.

References

1. U. R. Kristiansen and T. F. Johansen, ``The horn loudspeaker as a screen-diffraction problem,'' Journal of Sound and Vibration, vol. 133, no. 3, pp. 449-456, 1989.

2. D. J. Henwood, ``The boundary-element method and horn design,'' Journal of the Audio Engineering Society, vol. 41, no. 6, pp. 485-496, 1993.

3. T. F. Johansen, ``On the directivity of horn loudspeakers,'' Journal of the Audio Engineering Society, vol. 42, no. 12, pp. 1008-1019, 1994.

Toward High-Quality Singing Synthesis with Varying Sound Qualities (April 2002)

Hui-Ling Lu

To achieve high-quality singing synthesis, both spectral modeling and physical modeling have been used in the past. However, spectral models make articulation difficult and limit expressivity, while it is not straightforward to adjust physical-model parameters to reproduce a specific recording. In this thesis, a high-quality singing synthesizer is proposed, together with an analysis procedure that retrieves the model parameters automatically from the desired voices. Since roughly 95% of singing is voiced sound, the focus of this research is to improve the naturalness of the vowel tone quality. In addition, an intuitive parametric model is developed to control the vocal texture of the synthetic voice, ranging from ``pressed'' to ``normal'' to ``breathy'' phonation.

To trade off between the complexity of the model and that of the corresponding analysis procedure, a source-filter synthesis model is proposed. Based on a simplified view of human voice production, the source-filter model describes the voice as the output of a vocal tract filter excited by a glottal excitation. The vocal tract is modeled as an all-pole filter, since only non-nasal voiced sounds are considered. To accommodate variations in vocal texture, the glottal excitation model consists of two elements: the derivative glottal wave, modeled by the transformed Liljencrants-Fant (LF) model, and the aspiration noise, represented as pitch-synchronous, amplitude-modulated Gaussian noise.
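
A heavily simplified sketch of this source-filter model follows. The pulse shape below merely imitates the LF waveform (the true LF model fixes its constants implicitly through an area-balance condition), and the formant values are generic placeholders rather than the baritone data used in the thesis.

    import numpy as np
    from scipy.signal import lfilter

    # LF-style derivative glottal pulse + pitch-synchronously amplitude-
    # modulated Gaussian noise, driving an all-pole vocal tract (sketch).
    fs, f0 = 44100, 220.0
    T = round(fs / f0)                    # samples per period
    te, ta = 0.6, 0.03                    # open-phase and return-phase fractions
    n = np.arange(T) / T

    wg = np.pi / (0.75 * te)              # "glottal frequency" of the open phase
    open_phase = np.exp(4.0 * n) * np.sin(wg * n)
    Ee = np.exp(4.0 * te) * np.sin(wg * te)        # negative peak at closure
    dglot = np.where(n < te, open_phase, Ee * np.exp(-(n - te) / ta))

    periods = 60
    src = np.tile(dglot, periods)
    gate = np.tile(np.where(n < te, 1.0, 0.2), periods)   # pitch-synchronous envelope
    src += 0.05 * abs(Ee) * gate * np.random.randn(len(src))  # aspiration noise

    a = np.array([1.0])                   # build the all-pole vocal tract A(z)
    for fc, bw in [(700, 80), (1100, 90), (2600, 120)]:   # rough /a/-like formants
        r = np.exp(-np.pi * bw / fs); th = 2 * np.pi * fc / fs
        a = np.convolve(a, [1.0, -2 * r * np.cos(th), r * r])
    voice = lfilter([1.0], a, src)        # synthesized vowel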

The major contribution of this thesis is an analysis procedure that estimates the parameters of the proposed synthesis model so as to mimic the desired voices. First, a source-filter deconvolution algorithm based on convex optimization is proposed to estimate the vocal tract filter from sound recordings. Second, the inverse-filtered glottal excitation is decomposed into a smoothed derivative glottal wave and a noise residual via wavelet packet analysis, from which proper parameterizations of the glottal excitation can be found. By analyzing baritone recordings, a parametric model is constructed for controlling vocal texture in synthesized singing.


Toward a High-Quality Singing Synthesizer (April 2000)

Hui-Ling Lu

Naturalness of sound quality is essential for singing synthesis. Since roughly 95% of singing is voiced sound, the focus of this research is to improve the naturalness of the vowel tone quality; we consider only non-nasal voiced sounds. To trade off between the complexity of the model and that of the analysis procedure used to acquire its parameters, we propose a source-filter synthesis model based on a simplified view of human voice production. The source-filter model decomposes the voice production system into three linear systems: glottal source, vocal tract, and radiation. The radiation is simplified as a differencing filter, and the vocal tract filter is assumed to be all-pole for non-nasal sounds. The glottal source and the radiation are then combined into the derivative glottal wave, which we call the glottal excitation.

The effort is then to estimate the vocal tract filter parameters and the glottal excitation so as to mimic the desired singing vowels. A deconvolution of the vocal tract filter and glottal excitation was developed using convex optimization [1]; through this deconvolution, one obtains the vocal tract filter parameters and the glottal excitation waveform.
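
As a rough stand-in for that deconvolution (the work above uses a convex-optimization method, not shown here), classical autocorrelation-method linear prediction conveys the idea: estimate an all-pole vocal tract, then inverse-filter to expose the glottal excitation.

    import numpy as np
    from scipy.linalg import solve_toeplitz
    from scipy.signal import lfilter

    # Autocorrelation-method LPC followed by inverse filtering (sketch).
    def lpc(x, order):
        r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
        a = solve_toeplitz(r[:order], -r[1 : order + 1])   # Yule-Walker equations
        return np.concatenate(([1.0], a))                  # A(z) coefficients

    def inverse_filter(x, order=18):
        A = lpc(x, order)
        # Residual approximates the glottal excitation (the radiation
        # differencing is folded into it, as described above).
        return lfilter(A, [1.0], x)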

Since glottal source modeling has been shown to be an important factor in improving the naturalness of speech synthesis, we are investigating glottal source modeling alternatives for the singing voice. Besides flexible pitch and volume control, the desired source model should be capable of controlling voice quality, here restricted to source modifications ranging from laryngealized (pressed) to normal to breathy phonation. The evaluation will be based on the flexibility of the control and on the ability to mimic original sound recordings of sustained vowels.


Articulatory Singing Voice Synthesis (February 1999)

Hui-Ling Lu

The goal of this research is to convert score files to synthesized singing voice. The framework is based on a library of control parameters for synthesizing basic phonemes, together with interpolation techniques for synthesizing natural phoneme transitions, tempo, and pitch.

The starting point for this work is the Singing Physical Articulatory Synthesis Model (SPASM), originally developed at CCRMA by Perry Cook. The SPASM software system is based on the ``source-filter'' paradigm: The glottal source (source part) is modeled by a parametric mathematical equation, and the vocal tract (filter part, which shapes the spectrum of the source) is simulated by a digital waveguide filter (DWF).

In this research, the interaction between the source and filter is extended by exploring more complicated glottal source models from the articulatory speech synthesis literature.

It turns out that constructing the control parameter library is nontrivial. It includes the ``inversion problem,'' which attempts to retrieve the model parameters from the voice output signal alone; this problem is non-unique and nonlinear. Various existing methods from articulatory speech synthesis, as well as other general optimization methods, are under evaluation.

A Passive Nonlinear Filter for Physical Models (May 1996)

John Pierce and Scott Van Duyne

Nonlinearities, small or large, favorably affect the sounds of many musical instruments. In gongs and cymbals, a gradual welling-up of energy into the high frequencies has been observed: nonlinearities transfer energy from lower modes to higher modes after the instrument has been struck. These nonlinearities do not generate new energy; they only transfer it. While memoryless square-law and look-up-table nonlinearities may be incorporated into computer sound generation, such means often cause system energy loss or gain, and they are difficult to control when a range of large and small effects is desired.

Our approach to the injection of nonlinearity into resonant systems was to identify a simple passive nonlinear electrical circuit, and then to apply physical modeling techniques to bring it into the digital signal processing domain. The result was an efficient digital nonlinear mode coupler which can be attached to any waveguide termination, or inserted into any resonant digital system where traveling waves are being computed. The mode coupler can be tuned to set the rate of energy spreading as well as the region of the spectrum to be affected. Excellent results have been obtained creating gong and cymbal crash sounds by connecting these passive nonlinear filters to 2-D Digital Waveguide Mesh boundary terminations.

This work was presented by Scott Van Duyne at the 1994 and 1995 International Computer Music Conferences and at the meeting of the Acoustical Society of America in Washington, D.C., 30 May - 3 June 1995.

Related matters remain under investigation.

Feedback Delay Networks (May 1996)

Davide Rocchesso and Julius Smith

Recursive comb filters are widely used in signal processing, particularly in audio applications such as digital reverberation and sound synthesis. In the recent past, some authors [Stautner-Puckette '82, Jot '92] have considered a generalization of the comb filter known as the feedback delay network (FDN). The main purpose of this research is to investigate the algebraic properties of FDNs as well as to propose some efficient implementations and interesting applications.

The FDN is built from N delay lines connected in a feedback loop through a set of scattering coefficients, which may be organized into a ``feedback matrix.'' If this matrix is unitary, the system poles have magnitude one and the FDN has only constant-amplitude eigenmodes. For the structure to be practically useful, an attenuation coefficient must be applied at the output of each delay line to adjust the length of the impulse response.

D. Rocchesso has proposed restricting the feedback matrix to a circulant structure. The resulting Circulant Feedback Delay Network (CFDN) can be implemented efficiently and allows easy control of the time- and frequency-domain behavior. The structure is also well suited to VLSI implementation because it can be efficiently parallelized.
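
A compact Python sketch of a CFDN: because a circulant matrix is diagonalized by the DFT, the feedback matrix-vector product costs only two FFTs, and choosing unit-magnitude, conjugate-symmetric eigenvalues makes the matrix real and lossless before the per-line attenuation g is applied. Delay lengths and gains below are illustrative.

    import numpy as np

    # Circulant Feedback Delay Network (sketch).  The circulant feedback
    # matrix is applied in the DFT domain: C @ v = ifft(lam * fft(v)).
    N = 8
    rng = np.random.default_rng(0)
    lam = np.exp(2j * np.pi * rng.random(N)).astype(complex)  # |lam| = 1
    lam[0] = 1.0                                  # conjugate symmetry makes
    lam[N // 2] = -1.0                            # the feedback matrix real
    lam[N // 2 + 1 :] = np.conj(lam[1 : N // 2][::-1])

    delays = [149, 211, 263, 293, 337, 373, 433, 479]  # mutually prime lengths
    g = 0.997                                          # loop attenuation per line
    lines = [np.zeros(d) for d in delays]
    ptr = [0] * N

    def cfdn_tick(x_in):
        v = np.array([ln[p] for ln, p in zip(lines, ptr)])  # delay-line outputs
        y = v.sum()                                         # listener tap
        fb = np.fft.ifft(lam * np.fft.fft(v)).real          # circulant feedback
        for i, (ln, d) in enumerate(zip(lines, delays)):
            ln[ptr[i]] = x_in + g * fb[i]                   # write back into lines
            ptr[i] = (ptr[i] + 1) % d
        return y

    impulse_response = [cfdn_tick(1.0 if n == 0 else 0.0) for n in range(44100)]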

A compact sound processing model including early reflections and diffuse reverberation by FDN has been proposed under the name BaBo (the Ball within the Box) [Rocchesso '95].

It has been shown how to use CFDNs for many purposes in sound processing and synthesis: for simulating radiating structures such as instrument bodies, for simulating feedback resonators, and even in live electronics performance. These possibilities extend the range of applicability of FDNs beyond reverberation.

CFDNs with short delay lines may be used to produce resonances irregularly distributed over frequency. A possible application is the simulation of resonances in the body of a violin, where the exact positions and heights of the resonances are not important. By changing delay lengths it is possible to move poles in frequency, while by changing the network coefficients we can reshape the frequency response; the loop gain determines the maximum peak-to-valley distance. Such a structure with short delay lines has been used in live electronic sound processing, where dynamic filtering is achieved by changing the FDN parameters in real time.

CFDNs are also very effective as resonators in Karplus-Strong-like algorithms, especially for simulating membranes or bars.

Connections between FDNs and digital waveguide networks (Smith '85) have also been revealed. Julius O. Smith and D. Rocchesso have shown that the FDN is isomorphic to a (normalized) waveguide network consisting of a single (parallel) scattering junction and N branches, each connected to the junction at one end and reflectively terminated at the other. This correspondence gives rise to new generalizations in both domains. Theoretical developments in this area have recently been reported [Rocchesso-Smith '97, Smith-Rocchesso '97].

Acoustic Research and Synthesis Models of Woodwind Instruments

Gary P. Scavone

The modeling of musical instruments using digital waveguide methods has proven to be both an accurate and efficient technique for synthesis. Because such models are based on physical descriptions, they further provide a useful tool for acoustical explorations and research.

Current efforts are directed toward modeling vocal tract influence in wind instruments [Scavone, 2003]. Several real-time models have been developed using digital waveguide techniques to investigate this element of the performer-instrument system. The simplest such system models the oral cavity with a single resonance peak, which can easily be controlled to test coupling and reed entrainment, as well as upstream-downstream interactions. The model confirms upstream influence and demonstrates real-time behavior very similar to that experienced in reed wind instrument playing.

Models of wind instrument air columns have reached a high level of development. An accurate and efficient means for modeling woodwind toneholes was described in [Scavone and Cook, 1998]. Another model of the tonehole was developed with Maarten van Walstijn [van Walstijn and Scavone, 2000]. It uses wave digital filter techniques to avoid a delay-free path in the model, thus addressing a limitation of the distributed model with regard to minimum tonehole heights. Recently, a study comparing tonehole radiation measurements to digital waveguide and frequency-domain model results was conducted [Scavone and Karjalainen, 2002]. Robust, simplified models of conical woodwind instrument air columns have been developed as well [Scavone, 2002].

Previous work focused on modeling the direction-dependent sound radiation from woodwind and brass instruments [Scavone, 1999]. The current acoustic theory regarding sound radiation from ducts and holes can be implemented in the digital waveguide context using properly designed digital filters. Each radiating sound source or hole requires a first- or second-order digital filter to account for angular- and frequency-dependent pressure distribution characteristics. Sound propagation delay from the source to the pickup is modeled with a delay line and possibly a fractional-delay interpolation filter. An additional digital filter to model attenuation in free space can also be used. The results of this model compare well with frequency-domain polar radiation calculations and measurements performed by Antoine Rousseau and René Caussé (1996) at the Institut de Recherche et Coordination Acoustique/Musique (IRCAM). A simplified system appropriate for real-time synthesis was developed using The Synthesis ToolKit (STK) that allows continuous pickup movement within an anechoic 3D space.
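
The sketch below gives the flavor of such a radiation stage (it is illustrative only, not the calibrated filters of the cited work): each radiating hole or bell gets a propagation delay with linear interpolation, 1/r spreading loss, and a first-order filter whose brightness falls off with the angle from the axis.

    import numpy as np
    from scipy.signal import lfilter

    # Direction-dependent radiation sketch: delay + 1/r loss + one-pole
    # lowpass per radiating element.  Angle/brightness mapping is made up.
    fs, c = 44100, 343.0

    def radiate(src, r, angle):
        delay = r / c * fs                       # propagation delay in samples
        d = int(delay); frac = delay - d
        y0 = np.concatenate([np.zeros(d), src])[: len(src)]
        y1 = np.concatenate([np.zeros(d + 1), src])[: len(src)]
        y = (1 - frac) * y0 + frac * y1          # linear fractional delay
        b = 0.3 + 0.7 * np.cos(angle / 2) ** 2   # on-axis -> brighter
        y = lfilter([b], [1.0, -(1 - b)], y)     # one-pole radiation filter
        return y / max(r, 1e-3)                  # spherical spreading loss

    # Pickup = sum over radiating holes/bell at different ranges and angles.
    x = np.random.randn(fs)                      # stand-in source signal
    pickup = radiate(x, 1.0, 0.0) + radiate(x, 1.02, 0.6) + radiate(x, 1.1, 1.2)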


The Sound of Friction

Stefania Serafin

This research investigates the use of different friction models, with applications to real-time sound synthesis of musical instruments and sound effects in general. In collaboration with Federico Avanzini and Davide Rocchesso, state-of-the-art friction models from robotics and haptics, such as the elasto-plastic model, have been implemented in real time and connected to audio simulations of different rubbed systems such as squeaking doors and rubbed wineglasses.

The behavior of the different friction models has been investigated both in terms of sound quality and in terms of playability. In particular, the elasto-plastic model has been compared to the plastic model developed by Jim Woodhouse for bowed-string simulation; the more general elasto-plastic model behaves like the plastic model when applied to the simulation of a string excited by a bow.


Analysis and Synthesis of Unusual Friction-Driven Musical Instruments (April 2002)

Stefania Serafin, Patty Huang, Solvi Ystad, Julius Smith, and Chris Chafe

The focus of this research is to model musical instruments whose main excitation mechanism is driven by friction. Friction is the excitation source of a well-known family of musical instruments, the bowed strings, but it also drives less common instruments such as the glass harmonica and the musical saw.

In this work we use the idea that musical instruments can be decomposed into an exciter and a resonator. The exciter in all these cases is the driving friction mechanism, while the resonator is either a wineglass or a saw blade.

All these models are implemented in real-time using Pd.


Inversion of a Bowed String Physical Model: A Pattern Recognition Approach (July 2001)

Stefania Serafin, Harvey Thornburg, and Julius Smith

Modeling the physics of musical instruments is a powerful synthesis technique whose quality is strongly influenced by the input parameters of the model itself. Our research focuses on techniques for estimating the input parameters of a bowed string physical model using pattern recognition. Our aim is to invert the model, i.e., to obtain the input parameters corresponding to the performer's right hand (bow force, bow velocity, and bow position) from the output of a synthetic model and from recordings made on real instruments.

We fit a mixture of Gaussians whose means and variances are calculated using the expectation-maximization (EM) algorithm; this achieves the desired flexibility while also reducing the number of parameters to estimate. The approach assigns an a-priori probability to each set of data, with the likelihood of each target spectrum calculated using the EM algorithm. We applied this technique both to synthetic data obtained from a bowed string physical model and to measurements made on real instruments, using sensors to measure the input parameters for the training data. The results show that these methods work well for steady-state tones, while a method based on the combination of a Bayesian network and independent component analysis (ICA) is robust enough for inclusion in a dynamic model capable of tracking variations of the performance parameters.
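
The estimation step can be sketched as follows (illustrative only; the features and data below are synthetic placeholders, and scikit-learn's EM-based GaussianMixture stands in for the custom estimator): fit a Gaussian mixture to joint vectors of spectral features and bowing parameters, then invert by scoring candidate parameter settings against an observed spectrum.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Joint-density sketch: GMM over (spectral features, bowing parameters),
    # inverted by maximizing likelihood over candidate parameter settings.
    rng = np.random.default_rng(1)
    n = 2000
    params = rng.uniform([0.1, 0.05, 0.02], [1.0, 0.5, 0.2], size=(n, 3))
    features = np.column_stack([                 # fake spectral features
        params[:, 0] * 2 + 0.1 * rng.standard_normal(n),        # "brightness"
        params[:, 1] - params[:, 2] + 0.1 * rng.standard_normal(n),
    ])
    gmm = GaussianMixture(n_components=8, covariance_type="full").fit(
        np.hstack([features, params])            # EM runs inside fit()
    )

    def invert(observed_features, candidates=params):
        joint = np.hstack([np.tile(observed_features, (len(candidates), 1)),
                           candidates])
        return candidates[np.argmax(gmm.score_samples(joint))]

    print(invert(np.array([1.2, 0.1])))          # estimated (force, vel, pos)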

Realistic and Extended Physical Models of Bowed String Instruments (July 2001)

Stefania Serafin

The focus of this research is to obtain a high-quality bowed string synthesizer that is able to reproduce most of the phenomena appearing in real instruments and that can also be extended to provide interesting compositional tools. We built a waveguide bowed string physical model which contains the main physical ingredients of real instruments, i.e., transverse and torsional waves, a model of string stiffness, a model of the bow-string interaction, and a body model.

Our current research consists of finding the parameters that drive the model to expressive sound quality. This is done by estimating the input parameters from recordings of real instruments using pattern recognition techniques. Our model runs in real time in Max/MSP and STK and was used by Sile O'Modhrain in her dissertation on haptic feedback interfaces, by Charles Nichols for his vBow, and by Matthew Burtner with the Metasaxophone.

Physical Modeling of Bowed Strings: Analysis, Real-time Synthesis and Playability (April 2000)

Stefania Serafin and Julius Smith

Recent work in the field of bowed strings has produced a real-time bowed string instrument which, despite its simplicity, is able to reproduce most of the phenomena that appear in real instruments. Our current research consists of improving this model, incorporating refinements made possible by improved hardware and by efficient digital signal processing algorithms. In particular, we are modeling string stiffness, whose main effect is to disperse the sharp corners that characterize ideal Helmholtz motion. This dispersion is modeled using allpass filters whose coefficients are obtained by minimizing the L-infinity norm of the error between the phase of the internal loop and its approximation by the filter cascade.
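
The sketch below shows where such dispersion filters sit in the loop: a short cascade of identical first-order allpass sections delays different frequencies by different amounts, smearing the Helmholtz corner. The coefficient is hand-picked here, not the L-infinity-optimal design described above, and the excitation is a crude stand-in for the bow.

    import numpy as np

    # Waveguide string loop with an allpass cascade for stiffness dispersion.
    fs, f0 = 44100, 196.0
    N = int(fs / f0) - 3                 # delay line shortened to make room
    loop = np.random.uniform(-1, 1, N)   # crude excitation
    ap_state = np.zeros(4)               # states of 4 cascaded allpass sections
    a = -0.15                            # allpass coefficient (illustrative)

    def allpass_cascade(x):
        # First-order allpass H(z) = (a + z^-1) / (1 + a z^-1), cascaded.
        for i in range(len(ap_state)):
            y = a * x + ap_state[i]
            ap_state[i] = x - a * y
            x = y
        return x

    out = np.empty(fs)
    for n in range(fs):
        y = 0.996 * allpass_cascade(loop[-1])    # loss + dispersion in the loop
        loop = np.roll(loop, 1); loop[0] = y
        out[n] = y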

We are also analyzing the ``playability'' of the model, examining in which regions of a multidimensional parameter space ``good tone'' is produced. This study focuses on the influence of torsional waves and of the shape of the friction curve. The aim is to determine which elements of bowed string instruments are fundamental to a bowed string synthesizer and which can be neglected to reduce computational cost. This work is extended by examining the attack portion of a virtual bowed string: since a player's aim is normally to reach Helmholtz motion as quickly as possible, we analyze the attack parameters to determine the combination that allows the oscillating string to achieve Helmholtz motion in the shortest time.

This research is part of the work of the Strad (Sistema Tempo Reale Archetto Digitale) group, made up of CCRMA researchers working on different aspects of bowed string instruments.

The waveguide physical model we have built runs in real time on several sound synthesis platforms, e.g., Max/MSP, the Synthesis Toolkit, and Common Lisp Music.

Another part of this research consists of building controllers that allow composers and performers to play the model. In particular, we are interested in controllers that incorporate force feedback, because they allow us to couple the sound and the feel of a given bowing gesture. We are currently constructing an actuated bow and running experiments to discover the role played by tactile and kinesthetic feedback in stringed instrument playing.

Applications of Bioacoustics to the Creation of New Musical Instruments

Tamara Smyth

Animal sound production mechanisms are remarkably similar to those of many musical instruments. Vibrating membranes, plates and shells, and acoustic tubes and cavities are important components of both; in musical and bioacoustic systems alike, these components enable the production of sound that is undeniably captivating to human listeners.

There are, however, intriguing and potentially musical sounds produced by bioacoustic mechanisms that do not exist in traditional musical instruments. This research concentrates on these particular bioacoustic systems, and through the development of mechanical and computational models, aims to determine whether any aspect of the system is suitable for musical instrument design (either in sound production, acoustic output, or user control).

Two aspects of instrument design are being addressed: sound production, which involves the use of physical modeling techniques to develop quality sound synthesis models of animal sound production mechanisms, and sound control, which involves building haptic human interfaces to the computer-synthesized instrument based on these same mechanisms.

A mechanical model of the cicada's unique and efficient sound excitation mechanism was built to determine whether such a mechanism could also be used by a human, who has more limited muscular control. In addition to providing the physical model with an accurate input signal (one that represents the buckling ribs of the cicada), it serves as a mechanical haptic interface, facilitating the user's ability to control the instrument's sound. The mechanical controller is being used both for scientific understanding of the cicada's buckling mechanism and for musical experimentation.

Another bioacoustic system currently being researched is the bird's syrinx. This system is of particular interest because, in addition to being a musical inspiration to many composers, it has a unique structure that allows rapid shifts from low to high registers in the bird's often virtuosic song.

Far too often, parameter-rich physical models are developed with no means of controlling them. Likewise, musical controllers are often built with nothing to control. In both cases, the music is lost. This research intends to bridge the separation between the development of parametric sounds and the development of the devices used for controlling them, while offering new and intriguing musical instruments to contemporary musicians.


The Digital Waveguide Mesh (May 1996)

Scott Van Duyne and Julius O. Smith

The traveling wave solution to the wave equation for an ideal string or acoustical tube has been modeled efficiently with bi-directional delay-line waveguides. Two arbitrary traveling waves propagate independently in their respective left and right directions, while the actual pressure at any point may be obtained by summing the theoretical pressures in the left- and right-going waves.

Excellent results have been obtained modeling strings and acoustic tubes using one-dimensional waveguide resonant filter structures. However, there is a large class of musical instruments and reverberant structures which cannot be reduced to a one-dimensional traveling wave model: drums, plates, gongs, cymbals, wood blocks, sound boards, boxes, rooms--in general, percussion instruments and reverberant solids and spaces.

In the two-dimensional case of wave propagation in an ideal membrane, the traveling-wave solution involves the integral sum of an infinite number of arbitrary plane waves traveling in all directions; we therefore cannot simply allocate a delay line for every traveling plane wave. Finite element and difference equation methods are known which can help with the numerical solution to this problem; however, these methods have had two drawbacks: (1) their heavy computational cost is orders of magnitude beyond the reach of real time, and (2) traditional problem formulations fit only awkwardly into the physical-modeling arena of linear systems, filters, and network interactions.

Our solution is a formulation of the N-dimensional wave equation in terms of a network of bi-directional delay elements and multi-port scattering junctions. The essential structure in the two-dimensional case is a layer of parallel vertical waveguides superimposed on a layer of parallel horizontal waveguides, intersecting at 4-port scattering junctions between the bi-directional delay units. The 4-port junctions may be implemented with no multiplies in the equal-impedance case. Plane waves, circular waves, and elliptical waves all propagate as desired in the waveguide mesh, and band-limited accuracy can be enforced. The three-dimensional extension of the waveguide mesh is obtained by layering two-dimensional meshes and making all the 4-port junctions into 6-ports, or through a tetrahedral, four-port, no-multiply structure.

The two-dimensional waveguide mesh is mathematically equivalent to the standard second-order-accurate finite difference formulation of the wave equation, and it therefore exhibits the desirable stability and convergence properties of that formulation. However, numerical solution methods for initial-value problems involving second-order hyperbolic partial difference equations usually require a multi-step time scheme which retains values for at least two previous time frames. The waveguide mesh reduces this to a one-step time scheme with two passes: (1) in the computation pass, the scattering junction computations are performed in any order (a feature well suited to parallel computation architectures); then (2) in a delay pass, the junction outputs are moved to the inputs of adjacent junctions.
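
A minimal rectilinear mesh written exactly as this two-pass scheme (Python; equal impedances, sign-inverting boundaries; grid size and excitation are arbitrary):

    import numpy as np

    # 2-D waveguide mesh.  inc[d][i, j] is the wave arriving at junction
    # (i, j) from its neighbor in direction d; boundaries reflect with -1.
    H = W = 40
    inc = {d: np.zeros((H, W)) for d in "NSEW"}
    inc["N"][H // 2, W // 2] = 1.0                   # impulse excitation

    def mesh_tick(inc):
        # Pass 1: 4-port scattering at every junction (equal impedances;
        # the 0.5 can be a binary shift, so no true multiplies are needed).
        vj = 0.5 * (inc["N"] + inc["S"] + inc["E"] + inc["W"])
        out = {d: vj - inc[d] for d in "NSEW"}       # out[d]: wave leaving toward d
        # Pass 2: delay pass, handing each output to the adjacent junction.
        new = {d: np.empty((H, W)) for d in "NSEW"}
        new["S"][:-1, :] = out["N"][1:, :];  new["S"][-1, :] = -out["S"][-1, :]
        new["N"][1:, :] = out["S"][:-1, :];  new["N"][0, :] = -out["N"][0, :]
        new["E"][:, :-1] = out["W"][:, 1:];  new["E"][:, -1] = -out["E"][:, -1]
        new["W"][:, 1:] = out["E"][:, :-1];  new["W"][:, 0] = -out["W"][:, 0]
        return new, vj

    for _ in range(200):                             # circular wavefront spreads
        inc, junction_velocity = mesh_tick(inc)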

Current work on the waveguide mesh is in (1) exploring alternative spatial sampling methods, (2) developing efficient hardware implementation structures, (3) introducing loss and dispersion into the mesh in a physically correct, yet efficient, manner, and (4) finding the right parameters to model specific musical instruments and spaces.

The Wave Digital Hammer (May 1996)

Scott Van Duyne and Julius O. Smith

Recent work has led to digital waveguide string models and physical models of membranes using a 2-D digital waveguide mesh. We are currently working on ways to excite these models in physically correct ways. One obvious need is a good model of the felt mallet for drums and gongs, and of the piano hammer for strings.

The attack transient of a struck string or membrane can be approximated by injecting an appropriate excitation signal into the resonant system. However, this excitation method is not sufficient to cope with the complexities of certain real musical situations. When a mallet strikes an ideal membrane or string, it sinks down into it, feeling a purely resistive impedance. In the membrane case, the depression induces a circular traveling wave outward. If the membrane were infinite, the waves would never return, and the mallet would come to rest, losing all its energy into the membrane. If the membrane is bounded, however, reflected waves return to the strike point and throw the mallet away from the membrane. The first reflected wave to reach the mallet may not be powerful enough to throw the mallet all the way clear, or may only slow its motion, and later reflected waves may finally provide the energy to finish the job. This complex mallet-membrane interaction can have very different and difficult-to-predict acoustical effects, particularly when a second or third strike occurs while the membrane is still in motion.

In our model, we view the felt mallet as a nonlinear mass-spring system, the spring representing the felt portion. Since the felt is very compliant when the mallet is just barely touching the membrane, yet very stiff when fully compressed, we must use a nonlinear spring whose stiffness constant varies with its compression. Our essential requirements are that the model be digitally efficient, that it interconnect easily with waveguide structures, and that it be able to compute arbitrarily accurate strike transients from data measured on real hammers, strings, mallets, and drums.
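
The underlying physics can be sketched with a naive explicit integration (the project itself formulates the interaction as a wave digital filter precisely so it can be attached to waveguide strings and meshes without such crude time-stepping). The felt follows the commonly used power-law force; all constants below are merely plausible, not measured.

    import numpy as np

    # Hammer (mass) + nonlinear felt (spring) against a resistive string
    # driving-point impedance: one strike on an effectively infinite string.
    fs = 44100; dt = 1.0 / fs
    m = 0.009                    # hammer mass (kg)
    K, p = 4e8, 2.5              # felt stiffness coefficient and exponent
    R = 4.0                      # resistive impedance seen by the hammer (kg/s)

    vh, xh, ys = 1.0, 0.0, 0.0   # hammer velocity (m/s), position; string point
    contact_force = []
    for _ in range(400):         # ~9 ms, longer than the contact itself
        compression = xh - ys
        F = K * max(compression, 0.0) ** p   # nonlinear felt spring force
        vh -= (F / m) * dt                   # hammer decelerates, then rebounds
        xh += vh * dt
        ys += (F / R) * dt                   # resistive string carries energy away
        contact_force.append(F)
    # contact_force rises and falls over a millisecond or two; on a bounded
    # string, returning reflections would re-enter this interaction, which is
    # exactly the regime the wave digital formulation handles robustly.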

The Commuted Waveguide Piano (May 1996)

Scott Van Duyne and Julius O. Smith

Making a good piano synthesis algorithm has traditionally not been easy. The best results thus far have come from direct sampling of piano tones. This approach is memory intensive, as piano timbre varies widely from low notes to high and from soft to loud. In addition, sampling techniques have no good answer to the problem of multiple strikes of the same string while it is still sounding, nor to the coupling of undamped strings while other strings are sounding. We believe the solution will be found through waveguide modeling and DSP synthesis techniques. An even more intrinsic problem of synthesizers is that they feel nothing like a piano when you play them. Currently at CCRMA, we have the good fortune of having a variety of people working separately on solutions to different parts of the piano problem, although their individual work may have broader applications.

The piano problem may be broken down into five basic parts: (1) the string, (2) the soundboard, (3) the piano hammer and damper, (4) the key mechanism itself, and (5) the implementation hardware and software.

The primary difficulties in modeling the string are that the piano string partials are not exactly harmonic and that there is significant coupling between horizontal, vertical, and longitudinal modes of the string. In addition, there may be important nonlinear effects. Work being done by Julius Smith on string coupling and on fitting loop filters to measured data will solve some of these problems. Other work by Scott Van Duyne and Julius Smith will lead to simplifications in modeling the stretched harmonics of the piano string. Work relating to passive nonlinearities by John Pierce and Scott Van Duyne may prove helpful for the finest quality tone generation.

The soundboard can now be modeled in a fully physical way using the 2-D Digital Waveguide Mesh, a recent development by Scott Van Duyne and Julius Smith which extends waveguide modeling techniques into two or more dimensions. Julius Smith is working on applying results from his work on bowed strings to piano synthesis; extremely efficient new algorithms are possible using this approach.

The excitation of waveguide string models has until now been left primarily to loading the string with an initial condition and letting it go, or to driving the waveguide loop with an excitation signal tailored to the desired spectral response of the attack transient. While almost any attack transient may be achieved by driving the model with an excitation signal, the variety of interactions a piano hammer may have with a string is immense, considering the very different high and low strings and the wide range of strike forces. Further, it would be virtually impossible to catalog the possible attack transients due to a hammer hitting a string which is already in motion from a previous strike; the hammer-string interaction is very complex. Fortunately, recent work initiated by Scott Van Duyne and continued with Julius Smith on modeling piano hammers as wave digital nonlinear mass-spring systems allows all these complex interactions to fall directly out of the model.

Brent Gillespie's work on the touchback keyboard project will provide a realistic controlling mechanism for the piano synthesis algorithm. The touchback keyboard is a haptic control mechanism driven by a computer-controlled motor. It looks like a piano key and feels like a piano key: it senses the force applied by the player and computes, in real time, the correct key motion response based on the equations of motion of the internal key mechanism. The touchback keyboard can easily provide the felt-hammer element of the tone synthesis algorithm with a hammer strike velocity, which is used to drive the synthesis algorithm. In return, the piano hammer element can provide the touchback keyboard with a return hammer velocity at the right time, so that the player feels the appropriate haptic response.

The hardware and software to implement this complete piano model are available now. The touchback keyboard is controlled by a PC with an add-on card dedicated to real-time computation of the equations of motion. The NeXTStep operating system running on NeXT or PC hardware provides a suitable environment for the synthesis algorithm. Specifically, the SynthBuilder application being developed by Nick Porcaro and Julius Smith provides a cut-and-paste prototyping environment for real-time DSP-based audio synthesis, and the Music Kit, maintained and improved by David Jaffe, provides higher-level access to the DSP56000 card.

There is additional research interest in vibrotactile feedback in the piano keys, as suggested by the current work of Chris Chafe. While this effect may be less important in the modern piano, it is certainly more important in early keyboard instruments, and critical in the clavichord, where the tangent may remain in contact with the vibrating string after striking it. Further, we shall want to make the piano sound as if it were in a particular room or concert hall. Work by John Chowning on localization, by Jan Chomyszyn on loudness perception, by Steven Trautmann on speaker arrays, and by R. J. Fleck on efficient reverberation models can round out the final auditory experience.


