Difference between revisions of "Mass project"

From CCRMA Wiki
Revision as of 22:02, 23 August 2006

Welcome to the Masking Ambient Speech Sounds project Wiki.

Experiment 01 - Beta Test

GUI Experiment 1

The first listening tests will involve project staff members, to check that things make sense. If it looks good, we'll start working with non-project volunteers. Experiment 1, in the CCRMA "Pit," will take about 30 minutes and involve 30 trials. There will be 6 conditions of masking sound crossed with 5 conditions of speech sounds. The masker (FM noise) and the speech sounds will be presented as if the sources are outside the room. We'll use the measured room model from Tokyo and the exterior sound source position (hallway). The "as if" impression will be created by convolving with the measured impulse responses.
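The crossing of 6 masker conditions with 5 speech conditions can be sketched as follows. This is only an illustration, not the code used in the experiment: the function name `build_trials`, the integer condition indices, and the random presentation order are assumptions.

```python
import itertools
import random


def build_trials(n_maskers=6, n_speech=5, seed=None):
    """Return every (masker, speech) index pair exactly once, shuffled.

    The random order is an assumption; the page only states that the
    two sets of conditions are crossed.
    """
    trials = list(itertools.product(range(n_maskers), range(n_speech)))
    random.Random(seed).shuffle(trials)
    return trials


trials = build_trials(seed=0)
print(len(trials))  # 6 x 5 = 30 trials
```

With 30 trials in about 30 minutes, each trial has roughly one minute available.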

Strategies to define conditions for FM masking noise

To define the conditions of this first experiment, the approach will be to leave all the parameters fixed, except the modulation frequency.

Noise set contains complete technical documentation of the masking noise generation. It also contains the sound files.

The conditions of the masking FM noise will be defined by the following criteria:

  • 3 bands of FM noise will be used (centered at 200, 350 and 500 Hz):
    These bands were selected based on an analysis of speech recorded in the Tokyo office. The motivation behind this decision is to identify the relevant parameters in the leaking voice. For example, we know that the wall filters out much of the high-frequency content, so that is relevant to the selection of the main frequencies.
  • The amplitude (volume) of each band will be fixed:
    The amplitude was tuned to psychoacoustically balance the level of the three noise bands that will be used. This balancing was done without modulation.
  • The amplitude of the modulation will be proportional to the modulation frequency:
    The motivation behind this choice is to minimize the annoyance effect. When the modulation rate is low, higher amplitudes are more noticeable and annoying.
  • The relation between the modulation frequencies of the 3 bands is then the main factor that defines the conditions:
    For this experiment, 3 modulation rates are selected: 2, 5 and 7 Hz. The idea is to span some of the frequencies in the range of 2 to 7 Hz. Basically, all the combinations of these 3 rates are used for each center frequency, plus a case with no modulation at all.

Findings on the Beta Test

  1. There is a low-frequency component of the voice that is currently not being masked.
  2. We need to use a really long conversation that never repeats during the experiment.
  3. This corpus of conversations needs to have "stationary" properties.

Experiment 02 - Masker Refinement

Experiment design

  • efficiency test
    • Stimuli: speech is mixed at randomized places in a stream of masking noise
    • Task: "hit the space key when you hear speech"
    • Speech: 5 numbers (one, two, three, four, eight) spoken by a male and a female speaker with different accents. Numbers were chosen so that they cover five vowels.
    • Masker: Genetic algorithm approach with human response. We vary one parameter first and then find one or two "sweet spots." Fix the parameter to those found values and vary the next parameter. Choose the best two and repeat this process.
    • Analysis: Response rate (we expect a low response rate when speech is masked). Response time distribution (we expect longer response times when speech is better masked). Both analyses can be done within-subject and across-subject. We can also observe what kind of speech is better masked by a particular masking noise.
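The two measures named in the analysis item, response rate and response time, could be computed from logged timestamps roughly as sketched below. The function name `score_responses`, the list-of-seconds log format, and the 2-second matching window are all assumptions for illustration; none of them come from the experiment software.

```python
def score_responses(onsets, presses, window=2.0):
    """Match each speech onset to the first unused key press that
    falls within `window` seconds after it.

    onsets, presses: times in seconds (hypothetical log format).
    Returns (response_rate, response_times).
    """
    presses = sorted(presses)
    used = set()
    times = []
    for onset in sorted(onsets):
        for i, press in enumerate(presses):
            if i in used:
                continue
            if onset <= press <= onset + window:
                times.append(press - onset)
                used.add(i)
                break
    rate = len(times) / len(onsets) if onsets else 0.0
    return rate, times


# Three speech events, two of them answered in time.
rate, times = score_responses([1.0, 5.0, 9.0], [1.4, 9.9, 20.0])
print(rate, times)
```

A key press that falls outside every window (20.0 s here) is ignored in this sketch; it could instead be counted as a false alarm.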

Implementation

GUI Experiment 2
    • Verbal Instructions:

This is a test where only 2 keys are required, spacebar and (enter)return. You are going to have 2 test runs where you are going to be presented with speech. When you hear any speech, press the spacebar immediately to signal to us that you heard speech. At the end of each cycle a purple bar will light up to let you know the cycle is ready. You will then press (enter)return to begin the next cycle. The first 2 trials are to get you used to pushing the keys in response to speech; data will be recorded starting with the third trial, testing whether you heard speech.

    • The speech used in the experiment was recorded by Jason and Hiroko with the intent of neutral stress on vowels. The words chosen were one, two, three, four, eight, which were convolved with the impulse response from the Tokyo conference room and combined with recorded room noise.

Experiment 03 - Intelligibility

Experiment design

  • intelligibility test
    • Speech sounds: Idiomatic phrases and isolated words (each masking sound has a phrase and a word) - TBD
    • 4 sec per stimulus (15 stimuli/min)
    • Measure audibility and intelligibility thresholds
    • Better masking noises / parameter regions are chosen from the efficiency test.
    • The answers the subject chooses from are: "I don't hear speech" "Speech is audible" "Speech is intelligible"
    • completely random order (across the groups of (1) sentences/words, (2) masking noise types and (3) playback levels)
    • we prohibit presenting the same stimulus in sequence from intelligible to less intelligible (same masking noise going from loud to quiet).
    • analysis - we do not check the correctness - we only measure the impression of intelligibility
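A randomized order with an adjacency restriction can be sketched with a simple shuffle-and-retry loop. Note the assumption: this sketch uses a stricter rule than the one stated above, rejecting any order in which two consecutive trials share the same phrase and masker at all (not only in the intelligible-to-less-intelligible direction). The function name `constrained_order` and the tuple layout are hypothetical.

```python
import itertools
import random


def constrained_order(phrases, maskers, levels, seed=None, max_tries=10000):
    """Shuffle all (phrase, masker, level) trials until no two
    neighbours share the same phrase and masker.

    Assumption: stricter than the stated rule, which only forbids
    one direction of level change between neighbours.
    """
    rng = random.Random(seed)
    trials = list(itertools.product(phrases, maskers, levels))
    for _ in range(max_tries):
        rng.shuffle(trials)
        if all(a[:2] != b[:2] for a, b in zip(trials, trials[1:])):
            return trials
    raise RuntimeError("no valid order found within max_tries")


order = constrained_order(["phrase", "word"], ["m1", "m2"], [60, 70], seed=3)
for trial in order:
    print(trial)
```

Retrying the shuffle keeps the order uniformly random among the valid ones, at the cost of an unbounded number of attempts in principle, hence the `max_tries` guard.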

Implementation

I've programmed up experiment 3. This uses the Santa Barbara corpus clips in a design that produces a percentage measure of masker effectiveness. It's for one masker (the best one arrived at from experiment 2) at a fixed playback level.

Jason has convolved the first SB dialog file, so it plays from the "hallway."

The subject hears a 2 second clip which the app selects randomly from the convolved file. As it's playing the app records the maximum RMS of the first channel of the clip. The subject responds with "yes" or "no" buttons according to whether they heard voices. The app records the response and the maximum RMS played, and then loops, playing the next randomly chosen 2 second clip.
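The clip selection and RMS measurement described above might look like the sketch below. It is not the app's code: the window length used for the RMS is not stated on this page, so the non-overlapping `window` parameter is an assumption, as are the function names and the plain-list sample format.

```python
import math
import random


def pick_clip(channel, rate, clip_seconds=2.0, rng=random):
    """Return a randomly positioned clip of `clip_seconds` from one channel."""
    n = int(clip_seconds * rate)
    start = rng.randrange(0, len(channel) - n + 1)
    return channel[start:start + n]


def max_windowed_rms(samples, window):
    """Largest RMS over consecutive non-overlapping windows.

    `window` is in samples; the window length is an assumption.
    """
    best = 0.0
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(x * x for x in chunk) / window)
        best = max(best, rms)
    return best


# Silence followed by a full-scale square wave: the loud window wins.
print(max_windowed_rms([0.0] * 4 + [1.0, -1.0, 1.0, -1.0], 4))  # 1.0
```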

This iterates a whole bunch of times over 5 minutes, easily producing 50 trials per subject. The analysis plots the percentage of "yes" responses vs. RMS. We should see a threshold RMS below which the clips were effectively masked.

For the final "efficiency rating" we go back into the convolved dialog file and calculate the percentage of time the signal is below the threshold.
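The threshold estimate and the final efficiency rating can be sketched as below. The binning of RMS values, the 50% "yes" criterion for the threshold, and the function names are assumptions; the page only says that a threshold should be visible in the plot and that the rating is the percentage of time the signal is below it.

```python
def estimate_threshold(trials, edges, criterion=0.5):
    """Lower edge of the first RMS bin whose "yes" rate reaches `criterion`.

    trials: list of (max_rms, said_yes) pairs.
    edges: ascending bin edges. Both the binning and the 50%
    criterion are assumptions.
    """
    for lo, hi in zip(edges, edges[1:]):
        answers = [yes for rms, yes in trials if lo <= rms < hi]
        if answers and sum(answers) / len(answers) >= criterion:
            return lo
    return None


def efficiency(window_rms, threshold):
    """Percentage of windows in the dialog file with RMS below threshold."""
    below = sum(1 for r in window_rms if r < threshold)
    return 100.0 * below / len(window_rms)


trials = [(0.05, False), (0.08, False), (0.15, False),
          (0.18, True), (0.25, True), (0.28, True)]
threshold = estimate_threshold(trials, [0.0, 0.1, 0.2, 0.3])
print(threshold)                                        # 0.1
print(efficiency([0.05, 0.05, 0.2, 0.3], threshold))    # 50.0
```

Here `window_rms` stands for the RMS values measured over the whole convolved dialog file, using the same window length as during the trials.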

The dialog chosen and start times are as follows:

  • Santa Barbara Corpus Clips Used
    • Each clip is 5 minutes long with the start time indicated below
    • TRACK Start Time
    • sbc0001 0:23
    • sbc0002 0:00
    • sbc0008 0:34
    • sbc0011 0:14
    • sbc015 0:00
    • sbc020 0:00
    • sbc024 0:00
    • sbc025 0:00
    • sbc027 0:00
    • sbc029 0:00
    • sbc048 1:15
    • sbc050 2:17

Experiment 04 - Annoyance

Experiment design

  • annoyance test
    • ten kinds of masking noise, silence, white noise with intruding noise, presented from 4 loudspeakers around.
    • Each one goes on for 30 seconds (or any length) fading in and out for 5 seconds.
    • Fade in the masking noise. Start with the word list, mental math, beep and repeat the word list. Fade out and fade in some environmental noise (office, traffic, college cafeteria etc.), then the next masking noise.
    • Word list is presented to the subject from a loudspeaker in front at 60 dBA.
    • Task: a word list is presented at the start. A subject does mental math for 30 seconds (6-10 questions.) After the beep, the subject has to recall the word list presented at the start. Masking noise switches with fade in/out with an environmental noise. Do the same task with the next masking noise.
  • Final comparison
    • For best 3 masking noises, mix in the typical conference noise (speech, paper shuffle, chair noise, typing sounds, and intruding noise) and ask the subjects which one sounds more "inviting."

FM Masking Noises

Variables

  • modulation width (critical band or speech sounds)
  • modulation rate (0.01 - 0.1 fc)
  • sinusoidal or stochastic modulation

Already fixed

  • with broadband noise (what shape, and how loud? - according to the speech)
  • band width of the noise (critical band)
  • amplitude of each channel (speech sounds spectral distribution)
  • number and frequency of center frequencies (3)

Conference Call Meetings

July 18, 2006

  • FM Modulation discussion (Yasushi's Comments, with Juan-Pablo's comment on answer A:):
  1. Do you have any idea how to specify frequency modulation for each frequency band?
    • A: based on speech freq, ~2-8 Hz
  2. The period in time for each frequency should be the same?
    • A: No, different. When it's the same, the masking efficiency decreases. It also seems more annoying.
  3. Modulation speed will be getting faster according to higher frequency, or
    • A: I don't know yet, this is going to be the main parameter in the first experiment I think.
  4. The frequency modulation considering the voice sound
  5. We have to analyze how the voice sound is modulated in different frequency bands?
    • A: I think this is the best way, and we have to consider that the wall is filtering almost all the high frequencies.
  • Discussion of the experiment setup.
  • Look at the documentation, the new example of impulse responses, and delay of arrival.

July 24, 2006

Tuesday 9:30AM Japan - Monday 5:30PM Stanford

  • Discuss Experiment 1.
  • Ask Atsuko about calibration files and SPL meter.
  • Comment on diffusion in the Pit with PZM system (Hiroko).
  • Discuss Experiment Design written by Hiroko and Atsuko.

July 31, 2006

Tuesday 9:30AM Japan - Monday 5:30PM Stanford

  • Discuss Experiment Design written by Hiroko and Atsuko.
  • Explain experiment setup.
  • Discuss Atsuko's agenda at CCRMA.
  • Goals for this week are to finish the setup (C++ and pit room) and to collect and analyse some data from a couple of subjects.

August 21, 2006

Tuesday 9:30AM Japan - Monday 5:30PM Stanford

August 28, 2006

Tuesday 9:30AM Japan - Monday 5:30PM Stanford

Links