
Physical Modeling and Digital Signal Processing (Past)




Physical Modeling of Brasses (February 1999)

David Berners

One of the difficulties in building waveguide models of brasses and winds is that we do not know how to find the round-trip filtering in a flaring horn without actually making an acoustic measurement. Ideally, we would like to be able to compute the loop filter directly from the physical dimensions of the horn. While significant work has been done along these lines (Causse et al. [1], Plitnik and Strong [2], Benade [3]), a complete and accurate theory is not yet available.

To provide computationally tractable models, the flaring horn is modeled assuming that Webster's horn equation is satisfied, i.e., that a one-parameter solution to the wave equation exists within the boundaries of the horn. Any shape, such as planar or spherical, can be assumed for the wavefront within the horn.

In an ongoing research project at CCRMA, Webster's horn equation is solved as follows: First, the wave equation is converted to the form of the celebrated Schrödinger wave equation through a coordinate transformation outlined by Benade in [3]. Once in Schrödinger's form, the wave equation becomes equivalent to the one-dimensional scattering problem in particle physics, for which efficient and numerically stable solution methods exist (Kalotas and Lee [4]). In the new (transformed) coordinate system, the horn boundary function is replaced by the ``horn potential function,'' which, in addition to providing the frequency-dependent reflection, transmission, and impedance functions for the waveguide, can be used to gain an intuitive understanding of how these characteristics are related to bell flare. The quantities obtained from the solution to Webster's equation are all that is needed to design the lumped filters used in a digital waveguide model. Advantages over conventional modeling techniques include the ability to specify an arbitrary wavefront shape, as well as possible numerical advantages.
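
To make the transformation concrete, the following sketch computes the horn potential function for an assumed exponential flare (the profile, dimensions, and variable names are illustrative; this is not the CCRMA solver itself). With psi = sqrt(S) p, Webster's equation becomes psi'' + (k^2 - U) psi = 0 with U = (sqrt(S))''/sqrt(S), so the bell flare plays the role of a scattering potential.

    import numpy as np

    # Toy illustration: recast Webster's horn equation
    #   (1/S) d/dx ( S dp/dx ) + k^2 p = 0
    # via psi = sqrt(S) * p, which yields the Schrodinger form
    #   psi'' + (k^2 - U(x)) psi = 0,   U(x) = (sqrt(S))'' / sqrt(S).
    # U is the "horn potential"; its shape shows how bell flare creates a
    # frequency-dependent reflection region, as in 1-D quantum scattering.

    x = np.linspace(0.0, 0.5, 2001)        # axial coordinate (m), assumed profile
    m = 4.0                                # flare constant (1/m), illustrative
    S = np.pi * (0.01 * np.exp(m * x))**2  # exponential horn, 1 cm throat radius

    root_S = np.sqrt(S)
    dx = x[1] - x[0]
    U = np.gradient(np.gradient(root_S, dx), dx) / root_S  # horn potential (1/m^2)

    # For an exponential horn U is the constant m^2: waves with k < m, i.e.
    # below the cutoff f_c = m*c/(2*pi), are reflected, just as sub-barrier
    # particles are in the quantum analogy.
    c = 343.0
    print("horn potential ~", U[len(U)//2], " cutoff ~", m * c / (2*np.pi), "Hz")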


Numerical Integration of Partial Differential Equations (February 1999)

Stefan Bilbao

This work focuses on a numerical integration method for partial differential equations (PDEs) which is an outgrowth of Wave Digital Filtering (WDF), a well-known digital filter design technique. The idea is, as in the lumped case, to map a Kirchhoff circuit to a signal flow diagram in such a way that the energetic properties of the various circuit elements are preserved; the method extends to distributed systems as well. The chief benefit of this method, which has been around for some ten years now, is its guaranteed stability under very general conditions. Applications include modeling distributed acoustic, electromagnetic, and even highly nonlinear fluid-dynamic phenomena. Current work addresses, in particular, a WDF version of the perfectly matched layer (PML) used for unbounded-domain problems in electromagnetics, flux splitting in the WDF framework, incorporation of the entropy-variable formulation of fluid dynamics, higher-order-accurate discretization formulae that preserve passivity, and other projects.
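
The lumped building block underlying this approach can be shown in a few lines. The sketch below (an assumed RC example with invented component values, not Bilbao's distributed formulation) discretizes a capacitor with the trapezoidal rule, which turns it into a unit delay in the wave variables, and connects it to a resistive source through a two-port adaptor whose scattering preserves the pseudo-power bookkeeping that gives WDFs their robust stability.

    import numpy as np

    # Minimal wave-digital sketch (lumped RC low-pass, illustrative only):
    # a capacitor C discretized by the trapezoidal rule becomes a pure unit
    # delay in wave variables, with port resistance Rc = T/(2C).  A resistive
    # voltage source (E, Rs) meets the capacitor in a 2-port adaptor.

    fs = 48000.0; T = 1.0 / fs
    Rs, C = 1e3, 1e-6                 # source resistance, capacitance (assumed)
    Rc = T / (2.0 * C)                # capacitor port resistance
    rho = (Rs - Rc) / (Rs + Rc)       # 2-port adaptor reflection coefficient

    E = np.ones(2000)                 # unit-step input voltage
    a_c = 0.0                         # wave stored in the capacitor (its state)
    v_out = np.zeros_like(E)

    for i in range(len(E)):
        a1 = E[i]                          # wave emitted by the resistive source
        a2 = a_c                           # wave reflected by the capacitor
        b2 = (1.0 - rho) * a1 + rho * a2   # adaptor output toward the capacitor
        a_c = b2                           # capacitor: b[n] = a[n-1] (unit delay)
        v_out[i] = 0.5 * (b2 + a2)         # port voltage v = (a + b)/2

    # v_out rises to E with the expected Rs*C time constant (~1 ms here).
    print(v_out[::400])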

Adding Pulsed Noise to Wind Instrument Physical Models (May 1996)

Chris Chafe

Pulsed noise has been detected in the residual of steady flute tones after elimination of the purely periodic components. LMS adaptive linear periodic prediction was used to track the waveform through its slight period-to-period fluctuations. The predicted signal was removed from the original, leaving a breathy-sounding residual to examine. Noise pulses in musical oscillators result from period-synchronous gating of the driving mechanism. Bowed string instruments exhibit noise pulses arising from alternating stick-slip motion, where noise is introduced only when the string is slipping. Distinct pulses are also exhibited by the saxophone, in which the reed modulates air friction. Flute noise is more continuous than in string or reed tones. Short-time Fourier transforms of the residual signal reveal that pulses are present, but spectrally weighted toward higher frequencies. A physical model of the flute incorporating a corresponding noise synthesis method is being developed. Results of the simulation are compared for quality of pitch-synchronous spectral modulation and effect on frequency jitter.

The method uses a vortex-like noise generator coupled to the nonlinear excitation mechanism. These components simulate the flute's frictional noise generation and switching air jet, respectively. The vortex is generated by a separate short-cycle nonlinear oscillator. Its output is used to modulate the nonlinearity of the main instrument (for example, a cubic polynomial in Perry Cook's SlideFlute). The vortex's signal input is a flow variable which is controlled by the signal circulating in the main instrument loop.

The resulting oscillation contains noise injected by the vortex and exhibits the desired pitch-synchronous spectral changes. The classes of instruments to which this might apply include air-jet, glottal, and single-, double-, and lip-reed instruments.
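
A minimal sketch of the pitch-synchronous noise idea, loosely patterned after Cook's SlideFlute structure mentioned above, follows. All coefficients are invented and untuned, and the simple flow-gated noise is a crude stand-in for the vortex oscillator itself.

    import numpy as np

    rng = np.random.default_rng(0)
    fs = 44100
    f0 = 440.0
    bore_len = int(fs / f0)           # bore delay sets the pitch
    jet_len = bore_len // 2           # jet delay ~ half the bore

    bore = np.zeros(bore_len)         # delay lines as circular buffers
    jet = np.zeros(jet_len)
    bi = ji = 0
    lp = 0.0                          # one-pole loop filter state

    out = np.zeros(fs)                # one second of sound
    for n in range(len(out)):
        temp = bore[bi]
        lp = 0.7 * temp + 0.3 * lp    # crude loss filter (invented coefficients)
        temp = lp

        # Pitch-synchronous pulsed noise: the circulating flow variable gates
        # the breath noise, standing in for the vortex generator described above.
        gate = max(0.0, temp)
        breath = 0.9 + 0.05 * gate * rng.standard_normal()

        pd = breath - 0.5 * temp      # jet reflection coefficient 0.5 (assumed)
        jet_out = jet[ji]; jet[ji] = pd; ji = (ji + 1) % jet_len
        x = np.clip(jet_out, -1.0, 1.0)
        pd = (x * x * x - x) + 0.5 * temp   # cubic jet nonlinearity + end reflection

        bore[bi] = pd; bi = (bi + 1) % bore_len
        out[n] = 0.3 * temp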


Vicarious Synthesizers: Listening for Timbre (February 1999)

Chris Chafe

The timbre of a digitally synthesized musical sound is usually determined by a group of controls, as for example in physical models of bowed strings (bow force / velocity / position) or sung vowels (complex vocal tract shape / glottal source), and in models using frequency modulation (modulation index / oscillator tuning ratios). In this work, which concentrates on the bowed-string physical model, possible tone qualities are arrayed in a two-dimensional matrix whose axes (bow force / velocity) represent two of the principal timbral determinants of the synthesis method. Expressive control of timbre in real time is achieved by navigating the space with a force-feedback pointing device, allowing the musician to feel as well as hear timbral change. Timbral features are displayed kinesthetically as variations in the graph surface. Locations of particular bowed timbres, and the nearby qualities in their environs, are easily learned along with musically important trajectories. The representation also provides a window on bowing technique in digitized performances by tracking spectral matches between the recording and the matrix.
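
The underlying mapping can be sketched simply: synthesis parameters are stored on a coarse (bow force, bow velocity) grid and interpolated at the pointer position, so timbre varies smoothly along a trajectory through the space. The grid contents and names below are invented for illustration.

    import numpy as np

    rng = np.random.default_rng(0)
    force_axis = np.linspace(0.0, 1.0, 5)
    vel_axis = np.linspace(0.0, 1.0, 5)
    # One parameter vector per grid point, e.g. (loop gain, noise gain,
    # brightness) of a bowed-string model; random here as a placeholder.
    param_grid = rng.uniform(size=(5, 5, 3))

    def params_at(force, vel):
        """Bilinearly interpolate the parameter vectors at (force, vel)."""
        i = int(np.clip(np.searchsorted(force_axis, force) - 1, 0, 3))
        j = int(np.clip(np.searchsorted(vel_axis, vel) - 1, 0, 3))
        tf = (force - force_axis[i]) / (force_axis[i + 1] - force_axis[i])
        tv = (vel - vel_axis[j]) / (vel_axis[j + 1] - vel_axis[j])
        return ((1 - tf) * (1 - tv) * param_grid[i, j]
                + (1 - tf) * tv * param_grid[i, j + 1]
                + tf * (1 - tv) * param_grid[i + 1, j]
                + tf * tv * param_grid[i + 1, j + 1])

    print(params_at(0.33, 0.70))   # parameters somewhere inside the space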

Synthesis of the Singing Voice Using Physically Parameterized Model of the Human Vocal Tract (May 1996)

Perry Cook

Two voice synthesis systems have been constructed using a physically parameterized model of the human vocal tract. One system is a real-time Digital Signal Processor (DSP) interface, which allows graphical interactive experimentation with the various control parameters. The other system is a text-driven software synthesis program. By making available both real-time control and software synthesis, both rapid experimentation and repeatable results are possible. The vocal tract filter is modeled by a waveguide filter network. Glottal pulses are stored in and retrieved from multiple wavetables. To this periodic glottal source is added a filtered pulsed-noise component, simulating the turbulence generated as air flows through the oscillating vocal folds. To simulate the turbulence of fricatives and other consonants, a filtered noise source can be made arbitrarily resonant at two frequencies and can be placed at any point within the vocal tract. In the real-time DSP program, called SPASM, all parameters are graphically displayed and can be manipulated using a computer mouse. Various two-dimensional maps relating vowels and vocal tract shapes are provided, and a user can smoothly vary the control parameters by moving the mouse within a map region. Additional controls include arbitrary mapping of MIDI (Musical Instrument Digital Interface) controls onto the voice instrument parameters. The software synthesis system takes as input a text file which specifies the events to be synthesized. An event specification includes a transition time, shape and glottal files as written out by the SPASM system, noise and glottal volumes, glottal frequency (either in Hz or as a musical note name), and vibrato amount. Other available control strategies include text-to-speech/singing and a graphical common music notation program. Support for languages, musical modes, and vocal ornamentations is provided for Latin and modern Greek.
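
The waveguide vocal tract filter at the core of such systems can be sketched as a Kelly-Lochbaum-style chain of cylindrical sections (a generic illustration of the technique, not SPASM's actual implementation; the area values and reflection constants are invented).

    import numpy as np

    # Chain of cylindrical tube sections; scattering at each area discontinuity
    # uses k_i = (A_i - A_{i+1}) / (A_i + A_{i+1}) for pressure waves.
    A = np.array([2.6, 1.8, 1.0, 0.8, 1.2, 2.0, 3.2, 4.0])  # areas (cm^2), invented
    k = (A[:-1] - A[1:]) / (A[:-1] + A[1:])                  # reflection coefficients
    N = len(A)

    fwd = np.zeros(N)      # right-going wave entering each section
    bwd = np.zeros(N)      # left-going wave entering each section
    glottal_refl = 0.9     # reflection at the (nearly closed) glottis
    lip_refl = -0.9        # open-end reflection at the lips (sign inversion)

    out = np.zeros(4410)
    for n in range(len(out)):
        source = 1.0 if n % 100 == 0 else 0.0      # crude glottal impulse train
        out[n] = (1 + lip_refl) * fwd[-1]          # pressure transmitted at the lips
        new_fwd = np.empty(N); new_bwd = np.empty(N)
        new_fwd[0] = source + glottal_refl * bwd[0]
        new_bwd[-1] = lip_refl * fwd[-1]
        new_fwd[1:] = (1 + k) * fwd[:-1] - k * bwd[1:]   # junction scattering
        new_bwd[:-1] = k * fwd[:-1] + (1 - k) * bwd[1:]
        fwd, bwd = new_fwd, new_bwd

Driving the chain with a glottal pulse train, plus the filtered noise components described above, yields vowel-like output whose formants are determined by the area function.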

The ``Flutar'' a New Instrument for Live Performance (May 1996)

Cem Duruoz

``Flutar'' is a cross-synthesis instrument consisting of a physical simulation of the flute combined with a live instrument, in particular the classical guitar. It is implemented using the software ``SynthBuilder'' on a NeXT computer. During a live performance, a second computer modifies the model's parameters in real time, in other words ``plays'' the Flutar, while the performer plays the guitar. The physical model for the simulation combines an excitation section and a resonator, which correspond to the embouchure and the bore of a real flute, respectively. The two instruments interact with each other during the performance: the sound that the computer generates depends on the guitar sound it receives through a microphone. The amplitude of the guitar sound modifies the input noise that simulates the wind blowing into a flute, and at the same time the captured guitar sound passes through the resonator to produce the impression of a ``plucked flute.'' In this way, resonances may emphasize the guitar sound, depending on the pitches played by the guitar as well as the pitch to which the Flutar is tuned.
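
A conceptual sketch of this signal flow, with all constants invented and the flute reduced to its bore resonator, might look as follows.

    import numpy as np

    # The guitar's amplitude envelope scales the breath noise driving the
    # flute model, and the guitar signal itself is fed through the flute's
    # bore resonator (here reduced to a feedback comb filter), giving the
    # "plucked flute" resonances described above.

    fs = 44100
    rng = np.random.default_rng(0)
    guitar = rng.standard_normal(fs) * np.exp(-np.linspace(0, 8, fs))  # stand-in

    bore_len = int(fs / 440.0)        # flutar tuned to A440 (assumed)
    bore = np.zeros(bore_len)
    bi = 0
    env = 0.0
    out = np.zeros(len(guitar))

    for n in range(len(guitar)):
        env = max(abs(guitar[n]), 0.999 * env)        # simple peak envelope follower
        breath_noise = env * 0.2 * rng.standard_normal()
        excitation = guitar[n] + breath_noise          # guitar enters the bore too
        y = bore[bi]
        bore[bi] = excitation + 0.95 * y               # comb ~ bore resonances
        bi = (bi + 1) % bore_len
        out[n] = y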

Scalable Audio Models for Data Compression and Modifications (February 1999)

Scott Levine

The best current methods for high-quality, low-bitrate audio compression are based on filterbanks. While current algorithms, such as MPEG-AAC (Advanced Audio Coding), achieve very high coding efficiency, it is very difficult to perform modifications such as time stretching and pitch shifting on the compressed data.

In this study, we investigate a more flexible audio model that achieves competitive, scalable data compression rates while allowing simple modifications of the compressed data. Through a combination of multiresolution sinusoidal modeling, transient modeling, and noise modeling, we obtain a scalable, efficient audio data representation that is also easy to modify.

See Also: http://www-ccrma.stanford.edu/~scottl/thesis.html

Articulatory Singing Voice Synthesis (February 1999)

Hui-Ling Lu

The goal of this research is to convert score files to synthesized singing voice. The framework is based on a library of control parameters for synthesizing basic phonemes, together with interpolation techniques for synthesizing natural phoneme transitions, tempo, and pitch.

The starting point for this work is the Singing Physical Articulatory Synthesis Model (SPASM), originally developed at CCRMA by Perry Cook. The SPASM software system is based on the ``source-filter'' paradigm: The glottal source (source part) is modeled by a parametric mathematical equation, and the vocal tract (filter part, which shapes the spectrum of the source) is simulated by a digital waveguide filter (DWF).

In this research, the interaction between the source and filter is extended by exploring more complicated glottal source models from the articulatory speech synthesis literature.

It turns out that constructing the control parameter library is nontrivial. It includes the ``inversion problem'': retrieving the model parameters from the voice output signal alone. The inversion problem is non-unique and nonlinear. Various existing methods from articulatory speech synthesis, as well as more general optimization methods, are under evaluation.

A Passive Nonlinear Filter for Physical Models (May 1996)

John Pierce and Scott Van Duyne

Nonlinearities, small or large, favorably affect the sounds of many musical instruments. In gongs and cymbals, a gradual welling-up of energy into the high frequencies has been observed. Nonlinearities cause the transfer of energy from lower modes to higher modes after the instrument has been struck; they do not generate new energy, only transfer it. While memoryless square-law and look-up table nonlinearities may be incorporated in computer generation of sounds, these methods often cause system energy loss or gain, and are difficult to control when a range of large and small effects is desired.

Our approach to the injection of nonlinearity into resonant systems was to identify a simple passive nonlinear electrical circuit, and then to apply physical modeling techniques to bring it into the digital signal processing domain. The result was an efficient digital nonlinear mode coupler which can be attached to any waveguide termination, or inserted into any resonant digital system where traveling waves are being computed. The mode coupler can be tuned to set the rate of energy spreading as well as the region of the spectrum to be affected. Excellent results have been obtained creating gong and cymbal crash sounds by connecting these passive nonlinear filters to 2-D Digital Waveguide Mesh boundary terminations.

This work has been presented by Scott Van Duyne at the 1994 and 1995 ICMC and at the Washington D.C. meeting of the Acoustical Society of America, 30 May - 3 June, 1995.

Related matters remain under investigation.

Optimal Signal Processing for Acoustical Systems (January 1998)

Bill Putnam

Recent advances in optimization theory have made it feasible to solve a class of very large-scale optimization problems in an efficient manner. Specifically, if a problem can be shown to be convex, then one can make use of recent advances in interior-point optimization methods to achieve optimal solutions to problems whose scale is beyond the capabilities of more traditional optimization techniques.

Many interesting problems in audio and acoustical signal processing can be shown to belong to the class of convex optimization problems. My research has focused on several of these problems.

A real-time system is being developed for these applications. The system is capable of measurement and subsequent implementation of a parallel bank of filters. The `Frankenstein' hardware is used to allow for up to 16 separate channels of audio. A version of the software using commercially available DSP hardware will be available from http://www.ccrma.stanford.edu/~putnam.
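
As a small illustration of the kind of convex problem that arises in audio filter design (an assumed example, not necessarily one of the problems studied here), the sketch below fits a linear-phase FIR response to a target by least squares; replacing the 2-norm with the infinity-norm turns the fit into a linear program of exactly the sort interior-point methods handle at very large scale.

    import numpy as np

    M = 30                                  # half-order; filter length 2M+1
    w = np.linspace(0, np.pi, 512)          # frequency grid
    target = (w < np.pi / 4).astype(float)  # ideal low-pass, cutoff pi/4

    # Zero-phase response A(w) = h0 + 2*sum_{n=1..M} h_n cos(n w); the fit
    # is linear in the coefficients, hence a convex (quadratic) problem.
    basis = np.concatenate([np.ones((len(w), 1)),
                            2 * np.cos(np.outer(w, np.arange(1, M + 1)))], axis=1)
    coef, *_ = np.linalg.lstsq(basis, target, rcond=None)

    h = np.concatenate([coef[M:0:-1], coef])   # symmetric impulse response
    print("max ripple:", np.max(np.abs(basis @ coef - target)))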

Feedback Delay Networks (May 1996)

Davide Rocchesso and Julius Smith

Recursive comb filters are widely used in signal processing, particularly in audio applications such as digital reverberation and sound synthesis. In the recent past, some authors [Stautner-Puckette '82, Jot '92] have considered a generalization of the comb filter known as the feedback delay network (FDN). The main purpose of this research is to investigate the algebraic properties of FDNs as well as to propose some efficient implementations and interesting applications.

The FDN is built using N delay lines, connected in a feedback loop through a set of scattering coefficients. These coefficients may be organized into a ``feedback matrix.'' If such a matrix is unitary, the system poles have magnitude one and the FDN has only constant-amplitude eigenmodes. For the structure to be practically useful, an attenuation coefficient must be applied at the output of each delay line to adjust the length of the impulse response.

D. Rocchesso has proposed restricting the feedback matrix to a circulant structure. The resulting Circulant Feedback Delay Network (CFDN) can be implemented efficiently and allows easy control of the time- and frequency-domain behavior. The structure is also well suited to VLSI implementation because it parallelizes efficiently.
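
A sketch of a small lossy CFDN follows (delay lengths, eigenvalue angles, and gains are invented). A circulant matrix is unitary exactly when its DFT eigenvalues lie on the unit circle, which is how the feedback matrix is constructed here; for large N the feedback product itself can be computed with FFTs.

    import numpy as np
    from scipy.linalg import circulant

    N = 4
    delays = np.array([149, 211, 263, 293])      # mutually prime lengths (assumed)
    phases = np.array([0.0, 0.7, np.pi, -0.7])   # eigenvalue angles; conjugate
    eig = np.exp(1j * phases)                    # symmetry keeps the matrix real
    col = np.fft.ifft(eig).real                  # first column of the circulant
    A = circulant(col)                           # orthogonal feedback matrix

    g = 0.9995 ** delays          # attenuation proportional to delay length

    lines = [np.zeros(d) for d in delays]
    ptrs = [0] * N
    out = np.zeros(44100)
    x = np.zeros(len(out)); x[0] = 1.0           # impulse input -> impulse response

    for n in range(len(out)):
        taps = np.array([lines[i][ptrs[i]] for i in range(N)])
        out[n] = taps.sum()
        fb = g * (A @ taps)                      # circulant feedback
        for i in range(N):
            lines[i][ptrs[i]] = x[n] + fb[i]
            ptrs[i] = (ptrs[i] + 1) % delays[i]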

A compact sound processing model including early reflections and diffuse reverberation via an FDN has been proposed under the name BaBo (the Ball within the Box) [Rocchesso '95].

It has been shown how to use CFDNs for many purposes in sound processing and synthesis: for simulation of radiating structures such as instrument bodies, for simulation of feedback resonators, and even for live electronics performances. These possibilities extend the range of applicability of FDNs beyond reverberation.

CFDNs with short delay lines may be used to produce resonances irregularly distributed over frequency. A possible application is the simulation of resonances in the body of a violin, where the exact position and height of the resonances are not important. By changing the delay lengths, it is possible to move the poles in frequency, while by changing the network coefficients we can reshape the frequency response. The loop gain determines the maximum peak-to-valley distance. Such a structure using short delay lines has been used in live electronic sound processing, where dynamic filtering can be achieved by changing the FDN parameters in real time.

CFDNs are also very effective as resonators in Karplus-Strong-like algorithms, especially for simulating membranes or bars.

Connections between FDNs and Digital Waveguide Networks (Smith '85) have also been established. Julius O. Smith and D. Rocchesso have shown that the FDN is isomorphic to a (normalized) waveguide network consisting of one (parallel) scattering junction and N branches, each connecting to the scattering junction at one end and reflectively terminated at the other. This correspondence gives rise to new generalizations in both cases. Theoretical developments in this field have recently been reported [Rocchesso-Smith '97, Smith-Rocchesso '97].

Acoustic Research and Synthesis Models of Woodwind Instruments (January 1998)

Gary P. Scavone

The modeling of musical instruments using digital waveguide techniques has proven to be both accurate and efficient for synthesis. Because such models are based on physical descriptions, they also provide a useful tool for acoustical exploration and research.

Dissertation

Results of this work were recently published in An Acoustic Analysis of Single-Reed Woodwind Instruments with an Emphasis on Design and Performance Issues and Digital Waveguide Modeling Techniques, a Ph.D. thesis completed at CCRMA, Stanford University. In this study, current acoustic theory regarding single-reed woodwind instruments is reviewed and summarized, with special attention given to a complete analysis of conical air column issues. This theoretical acoustic foundation is combined with an empirical perspective gained through professional performance experience in a discussion of woodwind instrument design and performance issues. Early saxophone design specifications, as given by Adolphe Sax, are investigated to determine possible influences on instrument response and intonation. Issues regarding saxophone mouthpiece geometry are analyzed. Piecewise cylindrical and conical section approximations to narrow and wide mouthpiece chamber designs offer an acoustic basis for the largely subjective examinations of mouthpiece effects conducted in the past. The influence of vocal tract manipulations in the control and performance of woodwind instruments is investigated and compared with available theoretical analyses. Several extended performance techniques are discussed in terms of acoustic principles.

Discrete-time methods are presented for accurate time-domain implementation of single-reed woodwind instrument acoustic theory using digital waveguide techniques. Two methods for avoiding unstable digital waveguide scattering junction implementations, associated with taper rate discontinuities in conical air columns, are introduced. A digital waveguide woodwind tonehole model is presented which incorporates both shunt and series impedance parameters. Two-port and three-port scattering junction tonehole implementations are investigated and the results are compared with the acoustic literature. Several methods for modeling the single-reed excitation mechanism are discussed.

Expressive controls within the context of digital waveguide woodwind models are presented, as well as model extensions for the implementation of register holes and mouthpiece variations. Issues regarding the control and performance of real-time models are discussed. Techniques for verifying and calibrating the time-domain behavior of these models are investigated and a study is presented which seeks to identify an instrument's linear and nonlinear characteristics based on periodic prediction.

Current and Future Work

This area of research is ongoing, with current efforts aimed at developing a complete set of C++ routines within Perry Cook's real-time synthesis environment, the Synthesis Toolkit.

The performance flexibility offered by current real-time woodwind computer models generally cannot be exercised with existing MIDI wind controllers. A new wind controller which allows variable tonehole closure and non-traditional fingerings is in the design stages.


Spectral Modeling Synthesis (SMS) (January 1998)

Xavier Serra

Spectral Modeling Synthesis (SMS) is a set of techniques and software implementations for the analysis, transformation, and synthesis of musical sounds. SMS software implementations were first done by Xavier Serra and Julius Smith at Stanford University, and more recently by Serra and the music technology group of the Audiovisual Institute of Pompeu Fabra University in Barcelona. The aim of this work is to obtain general and musically meaningful sound representations based on analysis, from which musical parameters may be manipulated while maintaining high sound quality. These techniques can be used for synthesis, processing, and coding applications, while some of the intermediate results can also be applied to other music-related problems, such as sound source separation, musical acoustics, music perception, or performance analysis.

Our current focus is on the development of a general-purpose musical synthesizer. This application goes beyond the analysis and resynthesis of single sounds, and some of its specific requirements are:

  1. it should work for a wide range of sounds;
  2. it should have an efficient real time implementation for polyphonic instruments;
  3. the stored data should take little space;
  4. it should be expressive and have controls that are musically meaningful;
  5. a wide range of sound effects, such as reverberation, should be easily incorporated into the synthesis without much extra cost.

The implementation of these techniques has been done in C++ and Matlab, with the graphical interfaces in Visual C++ for Windows 95. Most of the software and the detailed specifications of the techniques and protocols used are publicly available via the SMS Web site.
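
The flavor of the sinusoidal-plus-residual decomposition at the heart of SMS can be conveyed by a toy, single-frame sketch. The real system tracks peaks frame by frame and interpolates their frequencies and amplitudes; here a quasi-stationary tone is analyzed in one window, the estimates are deliberately coarse, and all constants are invented.

    import numpy as np

    fs = 44100
    t = np.arange(fs) / fs
    sound = (np.sin(2*np.pi*440*t) + 0.4*np.sin(2*np.pi*882*t)
             + 0.02*np.random.default_rng(0).standard_normal(len(t)))

    win = np.hanning(4096)
    spec = np.fft.rfft(sound[:4096] * win)
    mag = np.abs(spec)

    # Pick local maxima above a threshold as sinusoidal peaks.
    peaks = [k for k in range(1, len(mag) - 1)
             if mag[k] > mag[k-1] and mag[k] > mag[k+1] and mag[k] > 0.05*mag.max()]

    sines = np.zeros_like(sound)
    for k in peaks:
        freq = k * fs / 4096                 # coarse; SMS interpolates the peak
        amp = 2 * mag[k] / win.sum()         # windowed-DFT amplitude estimate
        phase = np.angle(spec[k])
        sines += amp * np.cos(2*np.pi*freq*t + phase)

    residual = sound - sines   # the stochastic part, modeled as noise in SMS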

Applying Psychoacoustic Principles to Soundfield Reconstruction (January 1998)

Steven Trautmann

Simulations and simple experiments have indicated that a broad class of musical signals can benefit from some simple processes aimed at accurately reproducing the perception of a soundfield through loudspeakers. These processes attempt to recreate relative phase and amplitude information accurately at the listeners' ears, while allowing distortions elsewhere. The net effect should be a more accurate reproduction of important localization cues and other factors for the listeners. Current work is geared toward expanding these results by increasing the mathematical rigor and creating further generalizations, and by looking at how other psychoacoustic effects, such as masking, can be applied to further increase the perceived accuracy of the reproduced soundfield.

The Digital Waveguide Mesh (May 1996)

Scott Van Duyne and Julius O. Smith

The traveling wave solution to the wave equation for an ideal string or acoustical tube has been modeled efficiently with bi-directional delay-line waveguides. Two arbitrary traveling waves propagate independently in their respective left and right directions, while the actual pressure at any point may be obtained by summing the theoretical pressures in the left- and right-going waves.

Excellent results have been obtained modeling strings and acoustic tubes using one-dimensional waveguide resonant filter structures. However, there is a large class of musical instruments and reverberant structures which cannot be reduced to a one-dimensional traveling wave model: drums, plates, gongs, cymbals, wood blocks, sound boards, boxes, and rooms; in general, percussion instruments and reverberant solids and spaces.

In the two-dimensional case of wave propagation in an ideal membrane, the traveling wave solution involves the integral sum of an infinite number of arbitrary plane waves traveling in all directions. Therefore we cannot simply allocate a delay line for every traveling plane wave. Finite element and difference equation methods are known which can help with the numerical solution to this problem; however, these methods have had two drawbacks: (1) their computational cost is orders of magnitude beyond the reach of real time, and (2) traditional problem formulations fit only awkwardly into the physical-modeling framework of linear systems, filters, and network interactions.

Our solution is a formulation of the N-dimensional wave equation in terms of a network of bi-directional delay elements and multi-port scattering junctions. The essential structure of the two-dimensional case is a layer of parallel vertical waveguides superimposed on a layer of parallel horizontal waveguides, intersecting each other at 4-port scattering junctions between each bi-directional delay unit. The 4-port junctions may be implemented with no multiplies in the equal-impedance case. Plane waves, circular waves, and elliptical waves all propagate as desired in the waveguide mesh. Band-limited accuracy can be enforced. The three-dimensional extension of the waveguide mesh is obtained by layering two-dimensional meshes and making all the 4-port junctions into 6-ports, or through a tetrahedral, four-port, no-multiply structure.

The two-dimensional waveguide mesh is mathematically equivalent to the standard second-order-accurate finite difference formulation of the wave equation. It therefore exhibits the desirable stability and convergence properties of that formulation. However, numerical solution methods for initial value problems involving second-order hyperbolic partial differential equations usually require a multi-step time scheme which retains values for at least two previous time frames. The waveguide mesh reduces this structure to a one-step time scheme with two passes: (1) in the computation pass, the scattering junction computations are performed in any order (a feature well suited to parallel computation architectures); then (2) in a delay pass, their outputs are moved to the inputs of adjacent junctions.
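
A minimal rectilinear mesh with the two-pass scheme and the no-multiply (equal-impedance) scattering can be written directly; sizes, strike position, and the simple inverting boundaries below are illustrative choices.

    import numpy as np

    NX, NY, STEPS = 40, 40, 200
    # incoming waves at each junction, from the west, east, south, north
    w_in = np.zeros((NX, NY)); e_in = np.zeros((NX, NY))
    s_in = np.zeros((NX, NY)); n_in = np.zeros((NX, NY))

    # strike the mesh near the center
    w_in[20, 20] = e_in[20, 20] = s_in[20, 20] = n_in[20, 20] = 0.25

    for step in range(STEPS):
        # Pass 1: scattering.  Junction pressure is half the sum of the four
        # inputs; each output is junction pressure minus that side's input.
        p = 0.5 * (w_in + e_in + s_in + n_in)
        w_out = p - w_in; e_out = p - e_in
        s_out = p - s_in; n_out = p - n_in

        # Pass 2: delay/propagate.  An east-going output becomes the west
        # input of the junction one step east, and so on.  Edges reflect
        # with -1 (an ideally clamped boundary, chosen for simplicity).
        w_in[1:, :] = e_out[:-1, :]; w_in[0, :] = -w_out[0, :]
        e_in[:-1, :] = w_out[1:, :]; e_in[-1, :] = -e_out[-1, :]
        s_in[:, 1:] = n_out[:, :-1]; s_in[:, 0] = -s_out[:, 0]
        n_in[:, :-1] = s_out[:, 1:]; n_in[:, -1] = -n_out[:, -1]

    # p holds the junction pressures; read one point off as audio output.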

Current work on the waveguide mesh is in (1) exploring alternative spatial sampling methods, (2) developing efficient hardware implementation structures, (3) introducing loss and dispersion into the mesh in a physically correct, yet efficient, manner, and (4) finding the right parameters to model specific musical instruments and spaces.

The Wave Digital Hammer (May 1996)

Scott Van Duyne and Julius O. Smith

Recent work has led to digital waveguide string models and physical models of membranes using a 2-D digital waveguide mesh. We are currently working on ways to excite these models in physically correct ways. One obvious need is a good model of the felt mallet for drums and gongs, and of the piano hammer for strings.

The attack transient of a struck string or membrane can be approximated by the injection of an appropriate excitation signal into the resonant system. However, this excitation method is not sufficient to cope with the complexities of certain real musical situations. When a mallet strikes an ideal membrane or string, it sinks down into it, feeling a purely resistive impedance. In the membrane case, the depression launches an outward-traveling circular wave. If the membrane were infinite, the waves would never return, and the mallet would come to rest, losing all its energy into the membrane. If the membrane is bounded, however, reflected waves return to the strike point to throw the mallet away from the membrane. The first reflected wave to reach the mallet may not be sufficiently powerful to throw the mallet all the way clear, or may only slow down its motion, and later reflected waves may finally provide the energy to finish the job. This complex mallet-membrane interaction can have very different and difficult-to-predict acoustical effects, particularly when a second or third strike occurs while the membrane is still in motion.

In our model, we view the felt mallet as a nonlinear mass/spring system, the spring representing the felt portion. Since the felt is very compliant when the mallet is just barely touching the membrane, yet very stiff when fully compressed, we must use a nonlinear spring in the model, whose stiffness constant varies with its compression. Our essential requirements are that the model be digitally efficient, that it be easily interconnected to waveguide structures, and that it be able to compute arbitrarily accurate strike transients from measured data from real hammers, strings, mallets, and drums.
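
The flavor of the interaction can be sketched with a power-law felt spring striking an ideally resistive (infinite) string; the constants are invented, and plain explicit Euler integration stands in for the wave digital formulation under development.

    import numpy as np

    # Felt force F = k * compression^p, a form widely used in the
    # piano-hammer literature; an ideal infinite string presents a pure
    # resistance 2*Z0 to the applied force, so no reflections return.

    fs = 44100; dt = 1.0 / fs
    m = 0.009          # hammer mass (kg), illustrative
    k, p = 4.0e9, 2.5  # felt stiffness and nonlinearity exponent, illustrative
    Z0 = 8.0           # string wave impedance (kg/s), illustrative

    y = -0.001         # hammer position below the string (m)
    v = 3.0            # strike velocity (m/s)
    ys = 0.0           # string displacement at the contact point

    force = []
    for n in range(400):
        compression = max(0.0, y - ys)
        F = k * compression ** p           # felt stiffens as it compresses
        v += (-F / m) * dt                 # hammer decelerates (crude Euler)
        y += v * dt
        ys += (F / (2.0 * Z0)) * dt        # contact point of an infinite string
        force.append(F)

    # The force pulse rises and falls over a few ms; on a *bounded* string
    # the returning reflections described above would modify this interaction.
    print(f"peak force {max(force):.1f} N over "
          f"{sum(f > 0 for f in force) / fs * 1000:.2f} ms")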

The Commuted Waveguide Piano (May 1996)

Scott Van Duyne and Julius O. Smith

Making a good piano synthesis algorithm traditionally has not been easy. The best results thus far have been in the area of direct sampling of piano tones. This approach is memory intensive, as piano timbre varies widely from low notes to high and from soft to loud. In addition, sampling techniques have no good answer to the problem of multiple strikes of the same string while it is still sounding, nor to the coupling of strings which are undamped while other strings are sounding. We believe that the solution will be found through waveguide modeling and DSP synthesis techniques. An even more intrinsic problem of synthesizers is that they don't feel anything like a piano when you play them. Currently at CCRMA, we have the good fortune of having a variety of people working separately on solutions to different parts of the piano problem, although their individual work may have broader applications.

The piano problem may be broken down into five basic parts: (1) the string, (2) the soundboard, (3) the piano hammer and damper, (4) the key mechanism itself, and (5) the implementation hardware and software.

The primary difficulties in modeling the string are that the partials of a piano string are not exactly harmonic, and that there is significant coupling between the horizontal, vertical, and longitudinal modes of the string. In addition, there may be important nonlinear effects. Work being done by Julius Smith on string coupling and on fitting loop filters to measured data will solve some of these problems. Other work by Scott Van Duyne and Julius Smith will lead to simplifications in modeling the stretched harmonics of the piano string. It may be that work relating to passive nonlinearities by John Pierce and Scott Van Duyne will be helpful for the finest quality tone generation.

The soundboard can now be modeled in a fully physical way using the 2-D Digital Waveguide Mesh, a recent development by Scott Van Duyne and Julius Smith which extends waveguide modeling techniques into two or more dimensions. Julius Smith is working on applying results from his work on bowed strings to piano synthesis; extremely efficient new algorithms are possible using this approach.

The excitation of waveguide string models has until now been handled primarily by loading the string with an initial condition and letting it go, or by driving the waveguide loop with an excitation signal tailored to the desired spectral response of the attack transient. While almost any attack transient may be achieved by driving the model with an excitation signal, the variety of interactions that a piano hammer may have with a string is immense when one considers the possibilities ranging over the very different high and low strings, and over the wide range of strike forces. Further, it would be virtually impossible to catalog the possible attack transients due to a hammer hitting a string which is already in motion from a previous strike. The hammer/string interaction is very complex. Fortunately, recent work initiated by Scott Van Duyne and continued with Julius Smith on modeling piano hammers as wave digital nonlinear mass/spring systems will allow all these complex interactions to fall directly out of the model.
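
For reference, the excitation-signal approach being contrasted here is essentially a filtered delay loop summed with an arbitrary input; the sketch below uses an invented noise-burst excitation and loop filter.

    import numpy as np

    fs = 44100
    f0 = 220.0
    L = int(fs / f0)                  # loop length sets the pitch
    loop = np.zeros(L)
    idx = 0
    prev = 0.0

    rng = np.random.default_rng(0)
    excitation = rng.uniform(-1, 1, L) * np.hanning(L)   # stand-in excitation;
    # in commuted synthesis this signal would also carry the soundboard response

    out = np.zeros(2 * fs)
    for n in range(len(out)):
        x = excitation[n] if n < len(excitation) else 0.0
        y = loop[idx]
        filtered = 0.499 * (y + prev)  # one-zero loss filter, loop gain < 1
        prev = y
        loop[idx] = x + filtered       # sum the excitation into the loop
        idx = (idx + 1) % L
        out[n] = y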

Brent Gillespie's work on the touchback keyboard project will provide a realistic controlling mechanism for the piano synthesis algorithm. The touchback keyboard is a haptic control mechanism, driven by a computer controlled motor. It looks like a piano key, and feels like a piano key. That is, it senses the force applied by a person to the key, and computes, in real time, the correct key motion response based on the equations of motion of the internal key mechanism. It is easy for the touchback keyboard to provide the felt hammer element of the tone synthesis algorithm with a hammer strike velocity. This velocity will be used to drive the synthesis algorithm. In return, the piano hammer element can provide the touchback keyboard with a return hammer velocity at the right time, and the person playing the key will feel the appropriate haptic response.

The hardware and software to implement this complete piano model are available now. The touchback keyboard is controlled by a PC with an add-on card dedicated to real-time computation of the equations of motion. The NeXTStep operating system running on a NeXT or PC platform will provide a suitable environment for the synthesis algorithm. Specifically, the SynthBuilder application being developed by Nick Porcaro and Julius Smith provides a cut-and-paste prototyping environment for real-time DSP-based audio synthesis, and the Music Kit, being maintained and improved by David Jaffe, provides higher-level access to the DSP56000 card.

There is additional research interest in vibrotactile feedback in the piano keys as suggested in the current work of Chris Chafe. While this effect may be less important in the modern piano, it is certainly more important in early keyboard instruments, and critical in the clavichord, where the hammer may remain in contact with the vibrating string after striking it. Further, we shall want to make the piano sound as if it were somewhere in a particular room or concert hall. Work by John Chowning in localization, by Jan Chomyszyn in loudness perception, by Steven Trautmann in speaker arrays, and by R. J. Fleck in efficient reverberation models can round off the final auditory experience.

Voice Gender Transformation with a Modified Vocoder (May 1996)

Yoon Kim

The goal of this project is to develop a voice transformation system that makes the transformed voice close to a natural voice of the opposite sex. The transformation accounts for differences in fundamental frequency (pitch) contours and in spectral characteristics.

The transformation algorithm employs components of the well-known LPC-10 vocoder. By using the analyzer and the synthesizer of the LPC-10 vocoder and inserting a transformer between them, we can modify the LPC analysis parameters at the transformer stage, and thus change the acoustic character of the input speech by feeding the modified parameters into the synthesizer.

In converting the gender of a voice, two parameters, pitch and formants, are modified. Pitch is transformed by viewing it as a random variable and changing the mean and standard deviation of the original pitch values. Formant frequency is defined as the frequency corresponding to a peak of the speech spectrum, while formant bandwidth is defined as the 3-dB bandwidth of the peak. The first three formant frequencies are scaled separately by empirically derived factors, and the scale factors for the formant bandwidths are set equal to those for the formant frequencies.
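
The two modifications can be sketched on a single voiced frame as follows. Orders and scale factors are invented, the actual system operates inside the LPC-10 analysis/synthesis loop, and, unlike the scheme above, this sketch leaves pole radii (and hence bandwidths) approximately unchanged.

    import numpy as np
    from scipy.linalg import solve_toeplitz

    def lpc(frame, order=10):
        """Autocorrelation-method LPC: A(z) = 1 - sum a_k z^-k."""
        r = np.correlate(frame, frame, "full")[len(frame)-1:len(frame)+order]
        return solve_toeplitz(r[:order], r[1:order+1])

    def scale_formants(a, factor):
        """Rotate complex LPC pole angles by `factor`, keeping radii fixed."""
        poles = np.roots(np.concatenate([[1.0], -a]))
        ang = np.angle(poles)
        cplx = np.abs(poles.imag) > 1e-8     # leave real poles alone
        ang[cplx] *= factor
        moved = np.abs(poles) * np.exp(1j * ang)
        return -np.poly(moved).real[1:]

    def transform_pitch(f0, src_mean, src_std, dst_mean, dst_std):
        """Treat pitch as a random variable; move its mean and spread."""
        return dst_mean + (f0 - src_mean) * (dst_std / src_std)

    # e.g. male -> female: raise pitch statistics, push formants up ~15%
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(200)         # stand-in voiced frame
    a_female = scale_formants(lpc(frame), 1.15)
    print(transform_pitch(np.array([110., 120.]), 115., 10., 210., 25.))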

Based on the above ideas, an algorithm for voice gender transformation has been implemented. Its performance depends greatly on the original speaker. Also, female-to-male conversion was found to produce more natural-sounding speech than male-to-female conversion, mainly because the LPC-10 vocoder is poor at synthesizing female voices.

A Speech Feature Based on Bark Frequency Warping - The Non-uniform Linear Prediction (NLP) Cepstrum (February 1999)

Yoon Kim

In statistically based speech recognition systems, choosing a feature that captures the essential linguistic properties of speech while suppressing other acoustic details is crucial. This is underscored by the fact that the performance of the recognition system is bounded by the amount of linguistically relevant information extracted from the raw speech waveform: information lost at the feature extraction stage can never be recovered during the recognition process.

Some researchers have tried to convey the perceptual importance in speech features by warping the spectrum to resemble the auditory spectrum. One example is the mel cepstrum (Davis, 1980), where a filterbank that has bandwidths resembling the critical bands of human hearing is used to obtain a warped spectrum. Another is the Perceptual Linear Prediction (PLP) method proposed by Hermansky (Hermansky, 1990), where a filterbank similar to the mel filterbank is used to warp the spectrum, followed by perceptually motivated scaling and compression of the spectrum. Low-order all-pole modeling is then performed to estimate the smooth envelope of the modified spectrum.

While PLP provides a good representation of the speech waveform, it has some disadvantages that should be pointed out. First, since the PLP method relies on obtaining the FFT spectrum before the warping, its ability to model the peaks of the speech spectrum (the formants) depends on the characteristics of the harmonic peaks for vowels. This can hinder the modeling of formants in female speech through filterbank analysis, since there are fewer harmonic peaks under a formant region than in the male case. Second, various processing steps (e.g., Bark-scale transformation, equal-loudness weighting, cubic-root compression) require memory, table-lookup procedures, and/or interpolation, which can be computationally inefficient.

We propose a new method of obtaining parameters from speech that is based on frequency warping of the vocal-tract spectrum, rather than the FFT spectrum. The Bark Bilinear Transform (BBT) (Smith, 1995) is first applied on a uniform frequency grid to generate a grid that incorporates the non-uniform resolution properties of the human ear. Frequency warping is performed by taking the non-uniform DFT (NDFT) of the impulse response related to the vocal-tract transfer function using the warped grid. The warped spectrum is then modeled by low-order Linear Prediction (LP), which provides a good estimate of the spectral envelope, especially near peaks. This results in features that effectively model the warped peaks of the vocal-tract spectrum, which are considered to be perceptually important. Results of vowel classification experiments show that the proposed feature effectively captures linguistic information while suppressing speaker-dependent information due to different acoustic characteristics across speakers.
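
A compact sketch of the pipeline follows: the uniform grid is mapped through the phase of the first-order allpass used in the Bark bilinear transform, the spectrum is sampled there by an NDFT, and low-order LP plus the standard LP-to-cepstrum recursion yields the feature. The warping amount rho below is a placeholder (Smith, 1995 gives optimal values per sampling rate), and the stand-in vocal-tract filter is a single invented resonator rather than a high-order LPC fit of real speech.

    import numpy as np
    from scipy.linalg import solve_toeplitz
    from scipy.signal import lfilter

    rho = 0.7                          # warping amount (placeholder)
    M = 256
    w = np.linspace(0, np.pi, M + 1)   # uniform grid, 0..pi inclusive
    # Phase of the first-order allpass (z^-1 - rho)/(1 - rho z^-1):
    w_warp = w + 2 * np.arctan2(rho * np.sin(w), 1 - rho * np.cos(w))

    imp = np.zeros(256); imp[0] = 1.0
    h = lfilter([1.0], [1.0, -1.2, 0.81], imp)   # stand-in vocal-tract response

    n = np.arange(len(h))
    H = np.exp(-1j * np.outer(w_warp, n)) @ h    # NDFT on the warped grid

    power = np.abs(H) ** 2
    r = np.fft.irfft(power)                      # autocorrelation of warped spectrum
    order = 12
    a = solve_toeplitz(r[:order], r[1:order+1])  # warped ("NLP") LP coefficients

    cep = np.zeros(order)                        # standard LP-to-cepstrum recursion
    for i in range(order):
        cep[i] = a[i] + sum((j + 1) / (i + 1) * cep[j] * a[i - 1 - j]
                            for j in range(i))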


