Pierre Divenyi on Phonetic Restoration

Fri, 04/26/2013 - 1:15pm - 2:30pm
CCRMA Seminar Room
Event Type: Hearing Seminar

    aaaaaaaaa BUZZ aaaaaaaaaaaaa


If the buzz is loud enough, people will hear a continuous vowel sound. This is known as phonetic restoration. Al Bregman (author of the tome Auditory Scene Analysis) says that this is an example of old-plus-new. I suspect it is an example of top-down influences. Either way, these are examples of the same kind of process that helps us understand the auditory world around us.


Pierre Divenyi will be reviewing the literature and talking about experiments he has done on phonetic restoration.  How is it that we can hear sounds that are not there?  How do our brains parse the world, even in the face of loud noises that obfuscate the information we want to hear?

Who:  Pierre Divenyi (CCRMA)
What: Phonemic restoration with an articulatory twist
When: Friday April 26th at 1:15 PM
Where: CCRMA Seminar Room (Top Floor of the Knoll)
Why: Because it's magic :-)


Don't miss out... phonetic restoration is powerful, but it's hard to restore a whole missing hour at the Hearing Seminar.


- Malcolm




CCRMA Hearing Seminar April 26, 2013


Pierre Divenyi


Phonemic restoration with an articulatory twist




“Phonemic restoration” – the perceptual restoration of speech segments replaced by random noise – was first recognized by Warren in 1970 and has been investigated, off and on, by experimental psychologists, phoneticians, audiologists, and computational specialists. Its mechanisms, however, appear to have remained largely hidden to date. It is generally assumed that some top-down stream (generating either correct or incorrect responses) fills in the missing segments, but the nature of this stream is unknown. Is it a process that finds words or phonemes that make sense given the context, or is it a flow of some underlying basis functions from which the missing segment is reconstituted at a higher center? To tackle this puzzle, we first made a list of “spondees” by concatenating pairs of long-vowel monosyllabic words (a small portion of which were true spondee words), and then replaced their middle portion with one of four fillers: silence, flat-envelope Gaussian noise, the same noise modulated by the envelope of the excised speech, or a hum (a low-pass sawtooth wave) having the f0 contour of the excised segment. Listeners were told that the original stimulus was always a pair of English words and were asked to type what they thought the pair was. Stimulus and response pairs were aligned phonemically to generate phonemic and phonetic-feature confusion matrices. They were also resynthesized using the Haskins Labs TaDA (task dynamic application) articulatory synthesizer, which is built on eight gesture functions. The synthesized waveform pairs were time-aligned, and the distances between the gesture function pairs were computed. The data show that, for all four fillers, phonemic and phonetic-feature errors were about one order of magnitude larger than gesture distances. The results therefore support a theory according to which speech is perceived by decomposing it into some set of sparse basis functions.
One such set would consist of articulatory gestures, and we would perceive speech by having our brain recover these gestures, a process that has actually been substantiated by neuroscientific observations.
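To make the four filler conditions concrete, here is a minimal sketch of how such stimuli might be built from a sampled waveform. It is not the procedure used in the study: the abstract does not say how levels were matched, how the envelope was extracted, or what low-pass cutoff the hum used, so the RMS matching, the 10 ms moving-average envelope, the 1 kHz one-pole low-pass, and the function names (`make_filler`, `replace_middle`) are all hypothetical choices for illustration.

```python
import math
import random


def rms(x):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0


def scale_to(x, target_rms):
    """Rescale x so its RMS equals target_rms (assumed level-matching rule)."""
    level = rms(x)
    return [v * target_rms / level for v in x] if level > 0 else list(x)


def envelope(x, fs, win_ms=10.0):
    """Assumed envelope extractor: moving average of |x| over ~win_ms."""
    half = max(1, int(fs * win_ms / 1000) // 2)
    env = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        env.append(sum(abs(v) for v in x[lo:hi]) / (hi - lo))
    return env


def make_filler(segment, kind, fs, f0=None, seed=0):
    """Hypothetical filler generator for the four conditions in the abstract."""
    rng = random.Random(seed)
    n, target = len(segment), rms(segment)
    if kind == "silence":
        return [0.0] * n
    if kind == "noise":  # flat-envelope Gaussian noise
        return scale_to([rng.gauss(0.0, 1.0) for _ in range(n)], target)
    if kind == "modulated":  # same noise shaped by the excised speech envelope
        env = envelope(segment, fs)
        return scale_to([rng.gauss(0.0, 1.0) * e for e in env], target)
    if kind == "hum":  # low-pass sawtooth following the excised f0 contour
        if f0 is None or len(f0) != n:
            raise ValueError("hum filler needs one f0 value (Hz) per sample")
        alpha = 1.0 - math.exp(-2 * math.pi * 1000.0 / fs)  # assumed 1 kHz cutoff
        phase, y, out = 0.0, 0.0, []
        for f in f0:
            phase = (phase + f / fs) % 1.0  # integrate instantaneous frequency
            saw = 2.0 * phase - 1.0          # sawtooth in [-1, 1)
            y += alpha * (saw - y)           # one-pole low-pass filter
            out.append(y)
        return scale_to(out, target)
    raise ValueError(f"unknown filler kind: {kind}")


def replace_middle(speech, start, end, kind, fs, f0=None):
    """Return speech with samples [start, end) replaced by the chosen filler."""
    filler = make_filler(speech[start:end], kind, fs, f0)
    return speech[:start] + filler + speech[end:]
```

For example, `replace_middle(x, 2000, 4000, "hum", 16000, f0=contour)` would swap samples 2000 to 3999 of `x` for a level-matched hum, where `contour` holds one f0 value per replaced sample. Matching RMS across fillers is one plausible way to keep the conditions comparable, but the original experiment may have used a different rule.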



Profile – Pierre Divenyi, Consulting Professor, CCRMA, Department of Music

Pierre Divenyi started his career as a pianist, giving recitals in Europe and the US. As a graduate student at the University of Washington, his interests turned toward science, and he obtained his doctorate in systematic musicology, writing a thesis on the perception of rhythm in micro-melodies. That work led him to studies on the psychoacoustics of tone sequences, time intervals, and auditory localization. He worked as a researcher first at the Central Institute for the Deaf in St. Louis and then at the Martinez (CA) VA Medical Center’s Speech and Hearing Research Laboratory, which he directed from 1979 to 2012. His research over the past two decades has focused on auditory scene analysis and, in particular, on the auditory processes underlying the separation of speech from background noise (the so-called “cocktail-party effect”) and its dysfunction in aging. In search of a solution to this problem, he organized several international multidisciplinary meetings that brought together psychoacousticians, auditory neuroscientists, and computer scientists. Over the last decade, he edited two books on empirical and computational research on speech separation and was the recipient of several collaborative interdisciplinary grants to study the issue. He has been a visiting professor and visiting scientist at several universities in the US, Canada, France, and Germany, and has frequently been invited to lecture both in the US and abroad. He joined CCRMA as a Consulting Professor in early 2012.


Open to the Public