Artificial neural networks provide a flexible environment in which to model the mechanics, and the associated cognitive processes, involved in human prediction of time-ordered musical sequences. We model the cognition of an experientially trained listener of Western functional tonal music. By interpreting the distribution of the network's output activations as expectations for the next event in the sequence and comparing these expectations with the event that actually follows, we establish a quantifiable measure of the degree to which expectation is realized. The strength and distribution of output activations provide a method for modeling:
We propose to design and implement a series of experiments to investigate these implications and to refine and develop new connectionist architectures to build these models. Initial experiments with a compact representation of a limited number of musical dimensions will be followed by a more flexible representation incorporating the full multidimensionality, complexity, and intricacy of a complete musical work.
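The comparison of the network's output distribution with the event that actually follows can be sketched as below. The normalization and the surprisal measure are illustrative assumptions on my part, not the authors' exact formulation:

```python
import math

def realized_expectation(activations, actual_index):
    """Treat nonnegative output activations as a probability-like
    distribution over candidate next events, and score the event
    that actually occurred."""
    total = sum(activations)
    probs = [a / total for a in activations]
    p = probs[actual_index]       # probability assigned to the outcome
    surprisal = -math.log(p)      # low when expectation is realized
    return p, surprisal
```

For activations (1, 3, 6) and an actual next event at index 2, the realized expectation is 0.6 and the surprisal about 0.51.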
The creation of convincing auditory perspective is an important element of computer music; it makes the sound lively and expressive. Many factors contribute to the impression of space and the location of sound sources, including appropriate reverberation and the balance of loudness and timbre among the sounds used in the composition. Some of the parameters that provide cues to the distance of sound sources are correlated in a natural reverberant environment. A typical example is the direct-to-reverberant sound energy ratio and intensity, which change reciprocally with the physical distance between the sound source and the listener.
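The reciprocal relation between direct and reverberant energy can be sketched with a simple room model: direct energy falls as 1/r², while the diffuse reverberant level stays roughly constant, so the direct-to-reverberant ratio drops about 6 dB per doubling of distance. The critical-distance parameter below is an assumed room property, not a value from the text:

```python
import math

def direct_to_reverberant_db(distance_m, critical_distance_m=1.0):
    # At the critical distance the direct and reverberant energies are
    # equal (D/R = 0 dB); beyond it the ratio falls about 6 dB per
    # doubling of distance, since direct energy decays as 1/r^2.
    return 20.0 * math.log10(critical_distance_m / distance_m)
```

At twice the critical distance the ratio is about -6 dB; at four times, about -12 dB.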
However, the percepts arising from the physical cues do not always follow the same relationship. This is easy to show in the visual world, in the case of size constancy: in visual perspective, to preserve the impression of size constancy of an object, the physical size of the object must in fact be diminished in proportion to the perspective provided. Is this also the case in auditory perspective? Existing evidence seems to confirm this thesis.
Since the beginning of this century, researchers have been aware that changes in loudness and changes in distance may sometimes form equivalent percepts for listeners (Gamble 1909). As part of his ``physical correlate theory,'' Warren noticed that loudness judgements of his stimuli (speech) depended on the degree of reverberation (Warren 1973). More recently, Chowning (1990) observed that loudness constancy takes place in a room environment in a way analogous to size constancy in vision. An experiment currently being carried out investigates this postulate with regard to computer music.
Dry percussive sounds were produced using a physical model of a hammer (Van Duyne 1994), simulating varying effort on the part of the player. The sounds were then reverberated at levels corresponding to changing distance in a room. In the test, subjects match dry prototypes to each of the reverberated sounds; they are also given an auditory perspective of the room before each trial. Care has been taken to eliminate any influence of spectral bandwidth on the loudness match. The test will reveal whether distant sounds played with greater effort are perceived as louder, as if loudness were estimated at the sound source.
The lectures from CCRMA's Music 151 course, ``Psychophysics and Cognitive Psychology for Musicians'' are now published as:
This introductory text on psychoacoustics, specifically as it relates to music and computerized sound, emerged from a course that has been taught for many years at Stanford University's Center for Computer Research in Music and Acoustics (CCRMA). Organized as a series of 23 lectures for easy teaching, the book is also suitable for self-study by those interested in psychology and music. The lectures cover both basic concepts and more advanced concepts illuminated by recent research. Further aids for the student and instructor include sound examples on CD, appendixes of laboratory exercises, sample test questions, and thought problems. The contributors, leading researchers in music psychology and computer music, include John Chowning, Perry Cook, Brent Gillespie, Dan Levitin, Max Mathews, John Pierce, and Roger Shepard.
The goal of our research was to evaluate the performance of different psychoacoustic models for simultaneous masking by inserting them into an existing audio coding framework. Several approaches for creating individual maskers and for combining maskers were studied. The two models of individual maskers employed were derived from work by Patterson et al. and Zwicker. The models for additivity of individual maskers considered were maximum masker, intensity addition, and nonlinear addition (i.e., the modified power law proposed by Lutfi). To evaluate these subtle differences, we created a Tcl/Tk script that allowed us to easily run subjective ITU-R-style blind listening tests.
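The three additivity rules compared can be sketched as follows. The power-law exponent p = 0.3 is an assumed illustrative value, not necessarily the one used in the study:

```python
import math

def max_masking_db(maskers_db):
    # Combined threshold is simply the strongest individual masker.
    return max(maskers_db)

def intensity_addition_db(maskers_db):
    # Convert each masker to intensity, sum, and convert back to dB.
    return 10.0 * math.log10(sum(10.0 ** (m / 10.0) for m in maskers_db))

def power_law_addition_db(maskers_db, p=0.3):
    # Modified power-law addition (after Lutfi): compress intensities by
    # exponent p before summing, then expand; p < 1 predicts "excess
    # masking" beyond simple intensity addition.
    s = sum((10.0 ** (m / 10.0)) ** p for m in maskers_db)
    return 10.0 * math.log10(s ** (1.0 / p))
```

Two equal 60 dB maskers combine to 60 dB under the maximum rule, about 63 dB under intensity addition, and about 70 dB under the power law with p = 0.3, which illustrates why the choice of additivity model matters for a coder's bit allocation.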
Broadly speaking, my research is concerned with the psychology of structure and perceptual organization. How does the brain organize the world around us, create categories, and parse a dense perceptual field? To answer these questions, I have been examining principles of visual and auditory perception (how the brain groups basic elements into objects).
More specifically, my current research projects include work on:
For more information, please see http://www-ccrma.stanford.edu/~levitin/research.html.
Given a musical recording of an ensemble, for example a rock band with drums, guitar, and vocals, enthusiasts or engineers might want to obtain just the guitar, just the drums, or just the vocals. This goal, in which one obtains the resynthesis of the component sounds of a mixture signal when initially given only the combined one- or two-channel signal, is called Sound Source Separation. There are several methods for doing this, though they may be generally divided into "data driven" and "model driven." Methods of both types are often based on mimicking the response of the human auditory system. Though the goal and some general approaches are well-defined, current research has achieved impressive results only under highly constrained conditions. Systems recently presented by Ellis, by Klapuri and Virtanen, and by Kashino and Murase will be discussed, as will future directions being pursued at CCRMA and elsewhere.
Since 1960, computer control of fundamental pitch has opened the door to musical experiments with arbitrary intonation systems. Researchers like M. Mathews and J. R. Pierce have investigated the psychoacoustical and musical implications of some possible synthetic intonation systems. In my book, Expanded Tunings In Contemporary Music, the class of all equally-tempered intonation systems is examined from several points of view (mathematical, psychoacoustical, compositional, etc.).
It was found that among the infinitely many possible intonation systems there is one especially interesting subclass, designated ``expanded tunings.'' Expanded tunings are equal-tempered systems whose prime interval, corresponding to the exponential base of the tuning, is an interval of the harmonic series other than the unison or the octave and its multiples (intervals termed by J.R. Pierce ``Superconsonant Ratios'').
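As a sketch of how such a tuning is constructed, consider thirteen equal divisions of the 3:1 tritave, one of the superconsonant ratios; the example, function name, and defaults are mine for illustration, not the book's:

```python
def expanded_tuning(prime_ratio, divisions, base_freq=440.0, num_steps=14):
    # Equal-tempered scale over a non-octave prime interval: the k-th
    # degree has frequency base_freq * prime_ratio**(k / divisions).
    return [base_freq * prime_ratio ** (k / divisions)
            for k in range(num_steps)]

# Thirteen equal divisions of the tritave (3:1), starting from 440 Hz.
tritave_scale = expanded_tuning(3.0, 13)
```

After thirteen steps the scale reaches 1320 Hz, exactly three times the base frequency, just as twelve semitones reach the octave in 12-tone equal temperament.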
It was also found that the perceptual and cognitive coherence of a tuning system depends on the perception of octave similarity, or pitch-class abstraction, and that expanded tunings, being based not on octaves but on other superconsonant ratios, could not be perceptually coherent. They would remain purely theoretical entities unless the human perceptual/cognitive apparatus could perceive, or be trained to perceive, prime-interval similarity in a way similar to that obtained with octaves in octave-based tunings.
It was finally found that current explanations of octave similarity support the theoretical possibility that other superconsonant ratios could elicit similar responses of similarity perception under certain special structural and musical circumstances, given a meaningful musical context.
This conclusion pointed to the next step of the research: the need for carefully acquired experimental evidence that will help musicians and music researchers understand whether the human mind is capable of expanding its perceptual and cognitive abilities to find pitch-class similarity beyond the octave.
This research examines both theoretical and empirical backgrounds to the problem, and provides an original theoretical model (based on R. Shepard's cognitive models) in order to establish an appropriate framework for conclusions. A series of cognitive experiments, along with their methodology, assumptions, expected interpretations, and the techniques used to obtain results, is presently being developed.
This research will contribute towards a renewed awareness of the relation between tuning schemata and music cognition, and in particular of the cognitive coherence of theoretical tuning systems in a real musical context. The proposed new model will hopefully provide an improved way to understand these relationships. Finally, it is expected that the proposed set of experiments will provide answers that contribute to our understanding of the main issue of this research, i.e., whether human cognitive mechanisms can perceive, or be trained to perceive, under restricted musical circumstances, similarities other than the octave with the same cognitive effects as those elicited by the octave.
The development of multimedia works in recent years has gone hand in hand with the development of technology. It has become easier to create such works using not only computer workstations but PCs as well. As the number of TV channels increases, along with other media such as film, video, PC games, and CD-ROMs, more material is needed.
Multimedia works consist of pictures and sound/music, but sound has often been treated as secondary to the pictures: the pictures exist first, and composers then write music to fit them. Yet music and sound have a great effect on the expression of a multimedia work. They can add meaning beyond the pictures themselves and even change the perceived stream of the story relative to the same pictures presented without sound.
Two experiments explored audiovisual interactions in the perception of three patterns of matching. For the mismatched excerpts, the original relation between the audio and visual tracks was altered with respect to time or content. For the higher-level factor, comparison of the results for matched and mismatched conditions implied an intention of balancing audio and visual meaning. In addition, audio meaning had a direct influence on visual meaning, but only for matched stimuli. For the lower-level factor, the influence was independent of the degree of matching but depended on the feeling of time. Factor analysis revealed several kinds and levels of audiovisual interaction.
I focused on the progression of both pictures and music. Music used in an audiovisual context, when composed as background music, has an emotional curve related to the motion and colors of the pictures; the audio and visual streams proceed hand in hand to complete the whole work. Using the software ``Humdrum'' on UNIX to analyze melodic patterns, I arranged the music, sound, and actions of the pictures on the same time line, and from this alignment identified relations between the audio and visual materials.
The writer was invited to give a talk for "Fletcher Day" (2 June 1995) at the Washington D.C. meeting of the Acoustical Society of America, 30 May - 3 June, 1995. After examining Fletcher's publications, he chose to talk on Fletcher's discoveries concerning pitch. This topic finds no place in Fletcher's book Speech and Hearing in Communication (1953), recently republished by the Acoustical Society of America, and seems little known. The talk will be published in the Journal of the Acoustical Society of America.
Summarizing briefly, in two 1924 Physical Review papers, ``The Physical Criterion for Determining the Pitch of a Tone'' (Phys. Rev. 23(3), 427-437) and ``Some Further Experiments on the Pitch of Musical Tones'' (Phys. Rev. 23, 117-118), Fletcher showed that actual musical tones and synthesized tones can have the pitch of the fundamental in the absence of any component at the fundamental frequency. Further, Fletcher found that pitch in the absence of the fundamental occurs only when three successive harmonics are present.
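Fletcher's stimulus can be sketched numerically: a tone built from harmonics 3, 4, and 5 of 200 Hz contains no energy at 200 Hz, yet the waveform repeats every 1/200 s, the periodicity heard as the fundamental pitch. The sample rate and harmonic choice here are illustrative, not Fletcher's exact parameters:

```python
import numpy as np

fs = 16000                      # sample rate (Hz), illustrative
f0 = 200.0                      # "missing" fundamental (Hz)
t = np.arange(int(0.5 * fs)) / fs

# Sum three successive harmonics only; the fundamental itself is absent.
x = sum(np.sin(2 * np.pi * h * f0 * t) for h in (3, 4, 5))

# The waveform still repeats with period 1/f0 (80 samples at 16 kHz),
# which listeners hear as a 200 Hz pitch.
period_samples = int(fs / f0)
```

A spectrum of this signal shows peaks at 600, 800, and 1000 Hz and nothing at 200 Hz, which is exactly the condition under which Fletcher observed the fundamental pitch.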
The discovery of pitch in the absence of the fundamental is sometimes attributed to J.F. Schouten, ``The Residue, a New Component of Subjective Sound Analysis,'' K. Ned. Akad. Wet. Proc. 43, 356-465, 1940. Schouten's work, and that of those following him, treats pitch in the absence of the fundamental as a separate phenomenon rather than as a common characteristic of pitch perception. Fletcher's work on pitch was completed by the time Schouten published.
Fletcher's papers on pitch have led to changes in the treatment of pitch in a book being prepared for publication by Perry Cook, based on lectures given as part of Music 151, Psychophysics and Cognitive Psychology for Musicians.
Further consideration of Fletcher and pitch continues.
The human auditory system possesses a remarkable ability to differentiate acoustic signals according to the vibrational characteristics of their underlying sound sources. Understanding how listeners can detect, discriminate, classify, and remember acoustic source properties forms this project's long-range goal. The present project brings to bear on these topics techniques of psychophysical measurement, spectral analysis/synthesis techniques, and computer simulation of acoustic objects. Using such interdisciplinary approaches, studies will determine the validity of a three-stage model of auditory source perception:
Using methods of signal detection, preliminary studies will determine how listeners' sensitivity to auditory signals depends on whether attention is first directed to their acoustic features, and how sensitivity may improve as a function of the available source cues. Additional studies will use physical modeling and spectral simplification techniques to determine which acoustic features are critical to detection performance. A fundamental problem in auditory perception is to understand how listeners can perceive a sound source to be constant across wide variations in the range of sounds that the source can produce. Consequently, a separate set of studies will use adaptation techniques to determine how listeners categorize sounds by their source characteristics, and to assess whether computer-generated prototypical sources (sources, such as bars, tubes, and plates, that define broad classes of sound-producing objects) are classified more rapidly and accurately than non-prototypical sources. Our ability to recognize previously heard sounds suggests that we encode features of acoustic sources in memory. A related set of experiments will use recognition and recall tasks to determine what features of sounds are encoded in working and long-term memory, and whether memory representations encode a sound's surface spectral-temporal features or its underlying physical source characteristics.
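The sensitivity measure underlying such signal-detection studies can be illustrated with the standard index d'; this is textbook signal detection theory, not the project's specific analysis code:

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    # Sensitivity index from signal detection theory:
    # d' = z(hit rate) - z(false-alarm rate), where z is the inverse
    # standard normal CDF.  Larger d' means better detection,
    # independent of the listener's response bias.
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)
```

A listener with a 69% hit rate and a 31% false-alarm rate has a d' of about 1.0; equal hit and false-alarm rates give d' = 0, i.e., chance performance.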
In sum, this research program should shed important light on the representation of auditory source characteristics by determining the stages of processing that auditory information undergoes from its initial encoding at peripheral levels to its source-based representation at more central levels. Not only can this improve our basic understanding of auditory processing, but it can also suggest ways in which humans can optimize their performance in detecting and evaluating signals of interest within their acoustic environment.
This research explores the possible application of recent developments in the transcription and study of ``intonation'' in linguistics to music theory and analysis. In the field of linguistics, ``intonation'' refers to the ``melody'' of an utterance, including such characteristics as pitch, stress, accent, and phrasing.
In recent research by Pierrehumbert (1980) and Beckman and Elam (1997), among others, an intonation transcription method known as ToBI (Tone and Break Indices) has been developed and codified. This system has become essentially a standard transcription technique for English dialects.
Using the basic foundation of ToBI transcription, I researched possible applications of the theory to musical analysis and perception. Strengths and weaknesses of the application of the theory to music were explored, as well as potential limitations to applicability, including stylistic elements and genre.
© Copyright 2005 CCRMA, Stanford University. All rights reserved.