Hannes Muesch - Speech Intelligibility
Date:
Fri, 03/10/2023 - 10:30am - 12:00pm
Location:
CCRMA Seminar Room
Event Type:
Hearing Seminar 
Going all the way back to the start of the telephone network, people have been interested in what information is critical for conveying a speech signal. Harvey Fletcher (hi Jont) studied the perception of speech as a function of bandwidth. This led to his Articulation Index, which, as Poppy Crum recently suggested at CCRMA, limited high frequencies and disadvantaged women's voices in communications. The idea of a model that predicts speech intelligibility was formalized as a standard in the 1990s, known as the Speech Intelligibility Index (SII). Chas Pavlovic was one of its developers, and he will be in attendance. Most recently, our speaker this week, Hannes Muesch, developed the speech recognition sensitivity model, which uses statistical decision theory to model how well people can perceive speech.
This represents a full century of modeling our ability to correctly hear speech. Our discussion will be much shorter, but still highly intelligible.
Who: Hannes Muesch (Dolby)
What: Speech Intelligibility
When: Fri, 03/10/2023 from 10:30am - 12:00pm
Where: CCRMA Seminar Room
Why: Because listening to and understanding speech is really important
We have a tradition of high-quality discussions at the Hearing Seminar. How do we talk about and measure the quality of a speech signal? See you at CCRMA!
- Malcolm
In this meeting we will review the oldest class of speech intelligibility models: those that make predictions from only the power spectra of speech and maskers. These models are very simple by modern standards, but they have had, and continue to have, a substantial impact on the design of communication equipment and on hearing health care. We will review the Articulation Index in its various incarnations and discuss a now 20-year-old model that accounts for intelligibility test results that the Articulation Index models cannot explain.
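For readers who want to see the core mechanic shared by this class of models, here is a minimal sketch in Python. It is only an illustration, not the ANSI procedure: the band levels and importance weights are made-up placeholders, and the standard's band definitions, level-distortion terms, and tabulated weights are considerably more elaborate. The idea is simply that each band's speech-to-noise ratio is mapped to an audibility value between 0 and 1, and a band-importance-weighted sum of those audibilities gives the predicted index.

    import numpy as np

    # Minimal sketch of the band-audibility idea behind the Articulation Index /
    # Speech Intelligibility Index family of models. All numbers below are
    # illustrative placeholders, not values from the ANSI S3.5 standard.

    def band_audibility(speech_db, noise_db, dynamic_range_db=30.0, offset_db=15.0):
        """Map per-band speech-to-noise ratios to audibility values in [0, 1]."""
        snr_db = np.asarray(speech_db) - np.asarray(noise_db)
        return np.clip((snr_db + offset_db) / dynamic_range_db, 0.0, 1.0)

    def intelligibility_index(speech_db, noise_db, band_importance):
        """Importance-weighted sum of band audibilities (weights sum to 1)."""
        return float(np.dot(band_importance, band_audibility(speech_db, noise_db)))

    # Four hypothetical frequency bands: long-term speech levels, masker levels,
    # and band-importance weights.
    speech = [65.0, 60.0, 55.0, 50.0]
    noise = [55.0, 58.0, 60.0, 40.0]
    weights = [0.2, 0.3, 0.3, 0.2]

    print(intelligibility_index(speech, noise, weights))  # prints a value near 0.6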
Dr. Muesch introduces a new model that predicts speech intelligibility based on statistical decision theory. This model, which he calls the speech recognition sensitivity (SRS) model, aims to predict speech-recognition performance from the long-term average speech spectrum, the masking excitation in the listener's ear, the linguistic entropy of the speech material, and the number of response alternatives available to the listener. A major difference between the SRS model and other models with similar aims, such as the Articulation Index, is its ability to account for synergistic and redundant interactions among spectral bands of speech. In the SRS model, linguistic entropy affects intelligibility by modifying the listener's identification sensitivity to the speech. The effect of the number of response alternatives on the test score is a direct consequence of the model structure. The SRS model also appears to predict the differential effect of linguistic entropy across filter conditions and the interaction among linguistic entropy, signal-to-noise ratio, and language proficiency.
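As a small, self-contained illustration of the decision-theoretic ingredient (an illustration only, not the SRS model itself), the Python sketch below computes the probability of a correct response in an m-alternative identification task for an assumed sensitivity index d'. It shows how, in a statistical-decision framework, the number of response alternatives falls directly out of the model structure rather than requiring a separate correction. All names and numbers here are hypothetical.

    import numpy as np
    from scipy.stats import norm

    # Illustrative signal-detection calculation (not the SRS model itself):
    # in an m-alternative task, the response is correct when the observation
    # drawn from the target (mean d', unit variance) exceeds the m-1
    # observations drawn from the non-targets (mean 0, unit variance).
    def percent_correct(d_prime, n_alternatives):
        x = np.linspace(-8.0, 8.0, 4001)
        dx = x[1] - x[0]
        integrand = norm.pdf(x - d_prime) * norm.cdf(x) ** (n_alternatives - 1)
        return float(np.sum(integrand) * dx)

    # The same sensitivity yields very different scores as alternatives increase.
    for m in (2, 4, 10, 50):
        print(m, round(percent_correct(d_prime=1.5, n_alternatives=m), 3))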
Bio: Hannes Muesch earned his doctorate with Soren Buus and Mary Florentine at Northeastern University, working on loudness perception and speech intelligibility. That work resulted in the SRS intelligibility model, which will be discussed in this presentation. He has worked at both GN ReSound and Sound ID and is now with Dolby.
See, for example, this paper.
FREE
Open to the Public