Mass project

From CCRMA Wiki

(Difference between revisions)
(Experiment design)
m
 
(44 intermediate revisions not shown)
Line 1: Line 1:
Welcome to the Masking Ambient Speech Sounds project Wiki.
-
== Experiments ==
+
== Project Summary ==
-
=== Beta Test ===
+
# Recording and diffusion methodologies - testing and implementation
 +
#* Comparison between PZM (4channel) and Sound Field (4 directional)
 +
#* The Sound Field microphone was chosen because it gives a better spatial image.
 +
#* Diffusion in a semi-anechoic room, using a 4channel setup.
 +
# Processing of office recordings
 +
#* Tokyo office recording of all the sounds necessary for Calibration and for Impulse response generation.
 +
#* Processing of impulse responses inside and outside the room
 +
#* Pink noise calibration recording
 +
#* Room calibration through a generalized equalization methodology, using omni microphone recordings in the real room and in the simulated room as comparison.
 +
# FM masker generation
 +
#* Exploration of different strategies to follow (sinusoidal versus random modulation)
 +
#* Critical band (ERB) width of noise bands.
 +
#* Decision taken for using 3 bands, with random walk modulation (way less annoying than sinusoidal modulation).
 +
# Experiment design and implementation
 +
#* Experiment 01
 +
#*: Beta experiment to test the system setup (C++ implementation) for a real-time experiment with automatic data retrieval, and to fine-tune the psychoacoustic design of the experiment.
 +
#* Experiment 02
 +
#*: Masker refinement through a general-purpose process in which the best candidates are selected and the worst are discarded. One parameter is varied at a time; the best candidate for that parameter moves on to the next stage, where another parameter is varied. For the FM masker, the parameters were Center Band Frequency (3 bands), Band Amplitude, Modulation Rate (for each band), and Amplitude of the Modulation (for each band).
 +
#* Experiment 03
 +
#*: Efficiency. Uses the Santa Barbara corpus of conversation. With 1 masker on throughout the whole experiment, random parts of the conversation are presented and the subject is asked whether they were heard or not. The RMS of each random part is recorded, along with the subject's answer.
 +
#* Experiment 04
 +
#*: Annoyance, in design process
 +
#Spatialization study
 +
#: Study of spatial variables in the directionality of the masker. Generation of a “virtual impulse response” in which the sound (masker) comes from outside the room (where the intruding sound is located) but the filtering effect of the wall is removed.
-
The first listening tests will involve project staff members to check if things make sense. If it looks good we'll start working with non-project volunteers. ''Experiment 1'', in the the CCRMA "Pit," will take about 30 mins. and involve 30 trials. There will be 6 conditions of masking sound crossed with 5 conditions of speech sounds. The masker (FM noise) and the speech sounds will be presented as if the sources are outside the room. We'll use the measured room model from Tokyo and the exterior sound source position (hallway). The "as if" impression will be created by convolving with the measured impulse responses.
+
== How to set up and calibrate the Tascam 3200 mixer ==
-
Necessary ingredients: ('''x''' = done)
+
** Detailed instructions
-
# '''(x)''' ambient room sound recording from Tokyo                             
+
-
# '''(x)''' 15 sec. recordings of FM noise masker with parameter variation
+
-
# '''(x)''' 4 min. recordings of 4 conversations (animated / not-animated, crowd / pair, always 50% gender balance)
+
-
# '''(x)''' 15 sec. clips cut from conversations
+
-
# '''(x)''' convolved versions of 15 sec. files putting them "as if" in the hallway
+
-
# '''(x)''' GUI for running randomized listening, A/B forced choice, logging results
+
-
[[Image:exp1GUI.png]]
+
*1)Start hdspmixer & hdspconf(all settings automatic)
 +
*2)In a terminal, type cd /usr/bin/ then cpufreq-selector -g performance (sets the CPU to maximum speed)
 +
*3)Open jack
 +
          a.Set Frames/Period to 1024
 +
          b.Set Sample Rate to 44100
 +
          c.Set Interface to RME Hammerfall
 +
 
 +
Mixer config :
 +
*4)Equalizing levels and linking channels
 +
        a.Under “SCREEN MODE/NUMERIC ENTRY” click “METER.FADER”
 +
            i.Under tab “CH FADER” set gain levels “CH 1-18” equal
 +
            ii.Under tab “Master M/F” set bus levels “BUSS 1-16” equal
 +
        b.Under “SCREEN MODE/NUMERIC ENTRY” click “ALT-LINK/GRP”
 +
            i.Click “SEL” for channel 1 followed by 2, 3, and 4
 +
            ii.Double click tab “GROUP ON/OFF”
 +
            iii.Click down cursor to set the next grouping
 +
            iv.Click “SEL” for channel 5 followed by 6,7, and 8
 +
*5)Setting the speakers for surround sound
 +
        a.Click “SEL” for channel 1
 +
            i.Under “OUTPUT ASSIGN” select “1”
 +
            ii.Make sure “STEREO” and “DIRECT” are unchecked
 +
        b.Click “SEL” for Channel 2
 +
            i.Under “OUTPUT ASSIGN” select “3”
 +
            ii.Make sure “STEREO” and “DIRECT” are unchecked
 +
        c.Repeat this process for the following combinations
 +
            i.Ch1:1, CH2:3, CH3:5, CH4:7, CH5:13, CH6:14, CH7:15, CH8:16
 +
            ii.Channels 1-4 are head-level and Channel 5-8 are above
 +
*6)Set up I/O (if it is already screwed up)
 +
        a.Click “ALT_ROUTING” and click “INPUT”
 +
            i.Set CH1 to adat-1, CH2 to adat-2, etc…
 +
            ii.If you want to set up a record line, do so setting CH9 to M/L 9
 +
                  1.Set the top knob and switch to appropriate setting
 +
                  2.Use CH9 fader to set input level to application
 +
        b.Click “ALT-ROUTING” and click “OUTPUT SLOT” for output cards
 +
            i.Slot A set Trk1-8 to BUSS 1-8 in sequential order (Horizontal)
 +
            ii.Slot B set Trk1-8 to BUSS 9-16 in sequential order (Vertical)
 +
Software Config:
 +
 
 +
*7)Setting up the software with hardware
 +
        a.Go to Application under Bash shell and type “m”, then “make”, then “go”
 +
        b.Play Voice recording and set levels to 25 dBA at center
 +
        c.Play Masker noise and set levels to 45 dBA at center
 +
*8)Go to “MAIN DIALOG” in software app to set ID & output dir then Repeat 6a
 +
 
 +
 
 +
== Experiment 01 - Beta Test ==
 +
 
 +
[[Image:exp1GUI.png|thumb|GUI Experiment 1|250px|right|GUI Experiment 1]]
 +
The first listening tests will involve project staff members to check if things make sense. If it looks good we'll start working with non-project volunteers. ''Experiment 1'', in the CCRMA "Pit," will take about 30 mins. and involve 30 trials. There will be 6 conditions of masking sound crossed with 5 conditions of speech sounds. The masker (FM noise) and the speech sounds will be presented as if the sources are outside the room. We'll use the measured room model from Tokyo and the exterior sound source position (hallway). The "as if" impression will be created by convolving with the measured impulse responses.
-
==== Strategies to define conditions for FM masing noise ====
+
=== Strategies to define conditions for FM masking noise ===
To define the conditions of this first experiment, the approach will be to leave all the parameters fixed, except the modulation frequency.
Line 33: Line 98:
*: For this experiment, 3 modulation rates are selected: 2, 5, and 7 Hz. The idea is to span some of the frequencies in the range of 2 to 7 Hz. Basically, all combinations of these 3 rates are used for each center frequency, plus a case with no modulation at all.
 +
=== Findings on the Beta Test ===
-
--[[User:Jcaceres|Jcaceres]] 17:09, 24 July 2006 (PDT)
+
# There is a low-frequency component of the voice that is currently not being masked.
 +
# We need to use a really long conversation that never repeats during the experiment.
 +
# This corpus of conversations needs to have "stationary" properties.
-
==== Beta test TODOs ====
+
== Experiment 02 - Masker Refinement==
 +
=== Experiment design ===
 +
* efficiency test
 +
** Stimuli: speech is mixed at randomized places in a stream of masking noise
 +
** Task: "hit the space key when you hear a speech"
 +
** Speech: 5 numbers (one, two, three, four, eight) spoken by a male and a female of different accents. Numbers were chosen so that they cover five vowels. 
 +
** Masker: Genetic algorithm approach with human response. We vary one parameter first and then find one or two "sweet spots." Fix the parameter to those found values and vary the next parameter. Choose the best two - repeat this process.
 +
** Analysis: Response rate (response rate is low when speech is masked, we expect.) Response time distribution (more response time when speech is better masked, we expect.) Both analyses can be done within-subject and across-subject. We can also observe what kind of speech is better masked with a particular masking noise.
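The masker selection procedure described above (vary one parameter, keep the best one or two "sweet spots", then vary the next parameter) can be sketched as a greedy one-parameter-at-a-time search. This is only an illustration under stated assumptions: in the experiment the score comes from human responses, whereas here it is a hypothetical callback, and the names `Masker`, `refine`, `valuesPerParam`, and `keep` are invented for the example. Strictly speaking this is a staged search that keeps the best few candidates rather than a full genetic algorithm, since there is no crossover or mutation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Masker = std::vector<double>;  // one value per masker parameter (hypothetical layout)

// Vary parameters one at a time. At each stage every surviving masker is tried
// with every candidate value of the current parameter; the `keep` maskers with
// the lowest score (e.g. lowest speech-detection rate) move on to the next stage.
std::vector<Masker> refine(const Masker& start,
                           const std::vector<std::vector<double>>& valuesPerParam,
                           const std::function<double(const Masker&)>& score,
                           std::size_t keep) {
    assert(keep > 0 && start.size() == valuesPerParam.size());
    std::vector<Masker> survivors{start};
    for (std::size_t p = 0; p < valuesPerParam.size(); ++p) {
        std::vector<std::pair<double, Masker>> scored;
        for (const Masker& base : survivors) {
            for (double v : valuesPerParam[p]) {
                Masker m = base;
                m[p] = v;  // change only the parameter under test
                scored.emplace_back(score(m), m);
            }
        }
        // Lower score is better; stable_sort keeps insertion order on ties.
        std::stable_sort(scored.begin(), scored.end(),
                         [](const auto& a, const auto& b) { return a.first < b.first; });
        survivors.clear();
        for (std::size_t i = 0; i < scored.size() && i < keep; ++i)
            survivors.push_back(scored[i].second);
    }
    return survivors;
}
```

With `keep` set to 2, each stage carries forward the best two candidates, which mirrors the "choose the best two, repeat" step in the design.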
-
The beta-test of the experiment tool took longer than anticipated. Some minor fixes remain. The ones I remember from yesterday (Friday, 28th) and the ToDo list for Monday ('''x''' = done):
+
=== Implementation ===
-
#  '''(x)''' delete input slider from bottom of GUI (in Qt Designer), final product should look like the picture above
+
[[Image:yamaha_exp2.png|thumb|GUI Experiment 2|250px|right|GUI Experiment 2]]
-
#  '''(x)''' when user hits "OK, Next" button, clear all the radiobuttons, with radiobutton->setDown(false)
+
-
#: This worked out with setChecked(false) (inside a method, not in the connection)
+
-
#  '''(x)''' comment out all the "cout" statements that are printing during trials, except for the one that says "behind"
+
-
#  '''(x)''' find a sticky way to keep machine speed at max during trial (automatic energy saving may be the reason for the occasional stuttering)
+
-
#: Jason comments:
+
-
#:  /usr/bin/cpufreq-selector -g performance
+
-
#: you will select the "performance" governor and the cpu speed should go to the max and stay there.
+
-
#: /usr/bin/cpufreq-selector -g userspace
+
-
#: will return the governor to the original "userspace" governor.
+
-
#: And:
+
-
#: /usr/bin/cpufreq-selector -f 1000000
+
-
#: will get the processor to the slow idle speed.
+
-
#: From there the speed should again be "on demand". Regretfully it looks like sometimes the background daemon ("cpuspeed") gets fooled by these changes and dies. At least you can control all of this manually.
+
-
#  '''(x)''' convert QString to const char for logger class file open (use const char * QString::latin1 ())
+
-
#  '''(x)''' create a "shuffle" sort method in MainDialog.cpp and apply it for the actual first test
+
-
#  I think each individual mono file repeating is ok, but I'm worried that they could slip out of sync. Don't know for sure. Better if the repeats for a group of four is from the first channel's repeat
+
-
#  add envelopes at all file starts, stops, repeats (with STK's Asymp class), pipe the file's output through it
+
-
#: I still need to add this, but I think after the first experiments, what we really need to do is make much longer files so they don't repeat, and the listener don't get a queu from that repetition.
+
-
# IF I've created a problem for disk files keeping up, you will see the message "behind" printed from FileWvIn and it will start stuttering, the next fix to try (and this might be important anyway for our sanity) is to go to quad files rather than 4 mono files for each layer.
+
-
#: This doesn't look easy, I think I have to modify the entire Jukebox.cpp class in order to get this working...
+
-
# '''(x)''' Add a dialog in case the user doesn't select an option.
+
-
# '''(x)''' Change the silence always on A. Modify also the "correctness" of the selection, now is always set to be in A.
+
-
# '''(x)''' Turn off Sounds (alternative A and B) when user goes to next case.
+
-
# '''(x)''' Program is crashing at the end (it's quiting badly). If you go until the end, is not writing anything to the ouput files. If you stop it in the middle, it works. I may be probably a problem with some destructor...
+
-
#: I get the problem with these test files:
+
-
#: /usr/ccrma/snd/jcaceres/yamaha/recordings/experiments/experiment01/TEST/
+
-
#: the message is:
+
-
#: terminate called after throwing an instance of 'std::bad_alloc'
+
-
#:  what():  St9bad_alloc
+
-
#: Aborted
+
-
#:
+
-
#: It works fine with these set of files:
+
-
#: QString rootDir ("/usr/ccrma/snd/jcaceres/yamaha/recordings/experiments/experiment01/");
+
-
#: FOUND IT!!! It was a problem reading a vector in MainDialog::setTrial (int n)
+
 +
*Verbal Instructions:
 +
This test requires only 2 keys: the spacebar and Enter (Return). You will have 2 test runs in which you are presented with speech. Whenever you hear any speech, press the spacebar immediately afterwards to signal to us that you heard it. At the end of each cycle a purple bar will light up to let you know the cycle is ready. Then press Enter (Return) to begin the next cycle. The first 2 trials are to get you used to pressing the keys in response to speech; data recording starts with the third trial.
 +
*The speech used in the experiment was spoken by Jason and Hiroko, with the intent of neutral stress on vowels. The words chosen were one, two, three, four, and eight, which were convolved with the impulse response from the Tokyo conference room and combined with recorded room noise.
-
_____________
+
=== Post Experiment Subject Interviews ===
-
there are probably more things I'm forgetting, but this is close <br/>
+
*Phase01:
-
GOOD LUCK!
+
This test had the most diversity in types of sounds. Since some maskers were not efficient, subjects learned the rhythm of the speech presented. Subjects clearly described how some sounds masked better than others, since they had an idea of how many sounds were coming at what rate for each masker. Subjects enjoyed this test because differences in maskers were clear.
 +
*Phase02:
 +
Out of the bunch of 27 maskers we picked 2 candidates for our "golden masker." For this test we changed the amplitude of different center frequencies for these 2 maskers, which gave very different sounds throughout the test. Some subjects found that some sounds were noticeably harsher and more annoying to listen to than others. Several subjects reported that one masker worked really well in masking and sounded like being on an airplane. Subjects still enjoyed this test because differences in maskers were clear.
 +
*Phase03
 +
At this point we chose 1 masker and used different modulation frequencies. Most subjects described the sound as droning, meaning that it entranced or hypnotized them. As a result, most subjects found it harder to concentrate in the latter half of the test. Some subjects said they almost fell asleep, which made it difficult to give consistent answers. As I administered the test, I noticed the sleepy feeling myself every single time, so I started leaving the room during the test. Subjects said that they could hear the female voice very clearly on the occasions when they pressed the spacebar (although they missed more female speech overall). For the male voice, they listened for a deep voice that came through as short spurts of "wha" and "woo." For the most part, the subjects that I observed pressed the spacebar when there was speech and did not press it when they did not hear any, as expected.
 +
*Phase04
 +
Again, most subjects described the sound as droning, meaning that it entranced or hypnotized them. This made sense, since we kept the same basic sounds and changed only the frequency modulation amplitude. The main difference I observed in this test is that subjects pressed the spacebar repeatedly when no sound was presented. This seems to be because the masking sounds played by the 4 speakers above are uncorrelated and produce random interference patterns. I assume that the generated sounds had an interference pattern comparable to the speech used, which ultimately confused the listener. This effect showed up in all the subjects that I observed, and I let them continue pressing the spacebar throughout the test. Some felt the test was too long because they were falling asleep.
-
--[[User:Cc|Cc]] 09:42, 29 July 2006 (PDT)
+
== Experiment 03 - Efficiency ==
-
==== Findings on the Beta Test ====
+
=== Implementation ===
-
# There is a low frequency of the voice that now is not beeing masked.
+
I've programmed up experiment 3. This uses the Santa Barbara corpus clips in a design that produces a percentage measure of masker effectiveness. It's for one masker (the best one arrived at from experiment 2) at a fixed playback level.
-
# We need to use a really long conversation, that does never repeat during the experiment.
+
-
# This corpus of conversations need to have "stationary" properties.
+
 +
Jason has convolved the first SB dialog file, so it plays from the "hallway."
-
==== Bottom lines ====
+
The subject hears a 2 second clip which the app selects randomly from the convolved file.
 +
As it's playing the app records the maximum RMS of the first channel of the clip.
 +
The subject responds with "yes" or "no" buttons according to whether they heard voices.
 +
The app records the response and the maximum RMS played, and then loops, playing the next randomly chosen 2 second clip.
-
# We're going to use just one room (Tokyo Office)
+
This iterates a whole bunch of times over 5 minutes producing easily 50 trials per subject.
-
# We keep the 4CH setup.
+
The analysis plots the percentage of yes response vs. RMS. We should see a threshold RMS below which the clips were effectively masked.
-
# Spatialization ???
+
 
 +
For the final "efficiency rating" we go back into the convolved dialog file and calculate the percentage of time the signal is below the threshold.
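The analysis steps above (maximum RMS per clip, then the percentage of the dialog file that falls below the masking threshold) can be sketched as follows. This is a minimal sketch under assumptions, not the experiment app's code: the source does not say how the RMS window is chosen, so the window length is a free parameter here, windows are non-overlapping, and the function names `windowedRms`, `maxRms`, and `percentBelowThreshold` are invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// RMS of each consecutive, non-overlapping window of `window` samples.
// A trailing partial window is ignored.
std::vector<double> windowedRms(const std::vector<double>& x, std::size_t window) {
    assert(window > 0);
    std::vector<double> out;
    for (std::size_t start = 0; start + window <= x.size(); start += window) {
        double sumSq = 0.0;
        for (std::size_t i = start; i < start + window; ++i) sumSq += x[i] * x[i];
        out.push_back(std::sqrt(sumSq / static_cast<double>(window)));
    }
    return out;
}

// Maximum windowed RMS of one clip: the value logged next to the yes/no answer.
double maxRms(const std::vector<double>& clip, std::size_t window) {
    const std::vector<double> r = windowedRms(clip, window);
    return r.empty() ? 0.0 : *std::max_element(r.begin(), r.end());
}

// Efficiency rating: percentage of windows in the whole dialog file whose RMS
// lies below the threshold estimated from the yes/no responses.
double percentBelowThreshold(const std::vector<double>& dialog, std::size_t window,
                             double threshold) {
    const std::vector<double> r = windowedRms(dialog, window);
    if (r.empty()) return 0.0;
    const auto below = std::count_if(r.begin(), r.end(),
                                     [threshold](double v) { return v < threshold; });
    return 100.0 * static_cast<double>(below) / static_cast<double>(r.size());
}
```

The threshold itself would come from the plot of yes responses against RMS; it is passed in here rather than estimated.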
 +
 
 +
The dialog chosen and start times are as follows:
 +
 
 +
* Santa Barbara Corpus Clips Used
 +
Each clip is 5 minutes long, starting at the time indicated below. The tracks were normalized and then adjusted relative to each other to the dBFS levels listed, so that they fall within an acceptable level range for the experiment.
 +
 +
*TRACK / Start Time / dBFS
 +
#sbc0001 / 0:23 / -22.1
 +
#sbc0002 / 0:00 / -9.1
 +
#sbc0008 / 0:34 / -4.8
 +
#sbc0011 / 0:14 / -2.3
 +
#sbc015 / 0:00 / -1.9
 +
#sbc020 / 0:00 / -4.4
 +
#sbc024 / 0:00 / -4.1
 +
#sbc025 / 0:00 / -3.5
 +
#sbc027 / 0:00 / -7.0
 +
#sbc029 / 0:00 / -5.7
 +
#sbc048 / 1:15 / -0.8
 +
#sbc050 / 2:17 / -6.5
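As a reminder of what the dBFS values in the list mean in linear terms, the sketch below converts a dBFS level to an amplitude factor and scales a buffer so that its peak sits at a target level. It assumes peak normalization on samples in the range [-1, 1]; the source does not say whether the listed levels refer to peak or RMS, and the function names `dbfsToLinear` and `scalePeakToDbfs` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Convert a level in dBFS to a linear amplitude factor (0 dBFS -> 1.0).
double dbfsToLinear(double dbfs) {
    return std::pow(10.0, dbfs / 20.0);
}

// Scale a buffer so that its peak absolute sample equals the target level.
// Assumes full scale is 1.0, so the target must not exceed 0 dBFS.
std::vector<double> scalePeakToDbfs(std::vector<double> x, double targetDbfs) {
    assert(targetDbfs <= 0.0);
    double peak = 0.0;
    for (double s : x) peak = std::max(peak, std::abs(s));
    if (peak == 0.0) return x;  // silent buffer: nothing to scale
    const double gain = dbfsToLinear(targetDbfs) / peak;
    for (double& s : x) s *= gain;
    return x;
}
```

For example, a target of -20 dBFS corresponds to a linear factor of 0.1.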
 +
 
 +
== Experiment 04 - Annoyance ==
 +
=== Experiment design ===
 +
* annoyance test
 +
** ten kinds of masking noise, silence, white noise with intruding noise, presented from 4 loudspeakers around.
 +
** Each one goes on for 30 seconds (or any length) fading in and out for 5 seconds.
 +
** Fade in the masking noise. Start with the word list, mental math, beep and repeat the word list. Fade out and fade in some environmental noise (office, traffic, college cafeteria etc.), then next masking noise.
 +
** Word list is presented to the subject from a loudspeaker in front at 60 dBA. 
 +
** Task: a word list is presented at the start. A subject does mental math for 30 seconds (6-10 questions.) After the beep, the subject has to recall the word list presented at the start. Masking noise switches with fade in/out with an environmental noise. Do the same task with the next masking noise.
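The 30-second presentation with 5-second fades described above can be expressed as a gain envelope. The sketch below assumes a linear ramp and uses the hypothetical name `fadeGain`; it only illustrates the timing and is not taken from the experiment software.

```cpp
#include <cassert>

// Linear fade-in/fade-out gain at time t (seconds) for a segment that lasts
// `total` seconds, with `fade` seconds of ramp at each end.
double fadeGain(double t, double total, double fade) {
    assert(fade > 0.0 && total >= 2.0 * fade);
    if (t <= 0.0 || t >= total) return 0.0;           // outside the segment
    if (t < fade) return t / fade;                    // fading in
    if (t > total - fade) return (total - t) / fade;  // fading out
    return 1.0;                                       // full level
}
```

Multiplying each masker sample by `fadeGain(t, 30.0, 5.0)` would give full level between 5 and 25 seconds, with ramps on either side.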
 +
 
 +
* Final comparison
 +
** For best 3 masking noises, mix in the typical conference noise (speech, paper shuffle, chair noise, typing sounds, and intruding noise) and ask the subjects which one sounds more "inviting."
 +
=== Apparatus To-Do list ===
 +
 
 +
*All randomized total 20 minutes
 +
#Subject walks in with ambient noise
 +
#List over speaker of approx 15 words (20sec)
 +
#linear fade of masker during 15 word recital
 +
#beep/flash to start mental math for as long as possible (or 2.5 min), 3 maskers
 +
#flash to start recital and repeat as much of the list in microphone for as long as they need
 +
#Subject chooses when to start next phase
 +
#quick fade out of  masker to next masker while new 15 words played through speaker.
 +
 
 +
*Data type
 +
            Solutions, time between answers, number of words recalled from the list
 +
 
 +
== FM Masking Noises ==
 +
 
 +
Variables
 +
*modulation width (critical band or speech sounds)
 +
*modulation rate (0.01 - 0.1 fc)
 +
*sinusoidal or stochastic modulation
 +
 
 +
Already fixed
 +
*with broadband noise (what shape, and how loud? - according to the speech)
 +
*band width of the noise (critical band)
 +
*amplitude of each channel (speech sounds spectral distribution)
 +
*number and frequency of center frequencies (3)
== Conference Call Meetings ==
Line 132: Line 242:
Tuesday 9:30AM  '''Japan''' - Monday 5:30PM '''Stanford'''
-
== Parameters for the Noise Generation ==
+
=== August 28, 2006 ===
 +
Tuesday 9:30AM  '''Japan''' - Monday 5:30PM '''Stanford'''
-
Variables
+
=== September 04, 2006 ===
-
*modulation width (critical band or speech sounds)
+
Tuesday 9:00AM '''Japan''' - Monday 5:00PM '''Stanford'''
-
*modulation rate (0.01 - 0.1 fc)
+
-
*sinusoidal or stochastic modulation
+
-
 
+
-
Already fixed
+
-
*with broadband noise (what shape, and how loud? - according to the speech)
+
-
*band width of the noise (critical band)
+
-
*amplitude of each channel (speech sounds spectral distribution)
+
-
*number and frequency of center frequencies (3)
+
-
 
+
-
 
+
-
--[[User:Hiroko|Hiroko]] 18:27, 31 July 2006 (PDT)
+
-
 
+
-
== Experiment design ==
+
-
 
+
-
what experiment to do
+
-
* efficiency test
+
-
** Stimuli: speech is mixed at randomized places in a stream of masking noise
+
-
** Task: "hit the space key when you hear a speech"
+
-
** Speech: 5 numbers (one, two, three, four, eight) spoken by a male and a female of different accents. Numbers were chosen so that they cover five vowels.  
+
-
** Masker: Genetic algorithm approach with human response. We vary one parameter first and then find one or two "sweet spots." Fix the parameter to those found values and vary the next parameter. Choose the best two - repeat this process.
+
-
** Analysis: Response rate (response rate is low when speech is masked, we expect.) Response time distribution (more response time when speech is better masked, we expect.) Both analyses can be done within-subject and across-subject. We can also observe what kind of speech is better masked with a particular masking noise.
+
-
 
+
-
* intelligibility test
+
-
** Speech sounds: Idiomatic phrases and isolated words (Each masking sound has a phrase and word) - TBD
+
-
** 4 sec per stimuli (15 stimuli/min)
+
-
** Measure audibility and intelligibility thresholds
+
-
** Better masking noise / parameter region are chosen from the efficiency test. 
+
-
** The answers the subject choose from are: "I don't hear speech" "Speech is audible" "Speech is intelligible"
+
-
** complete random order (beyond the group of (1) sentense/words (2) masking noise types and (3) playback level)
+
-
** we prohibit a sequencial presentation of a same stimuli from intelligible to less intelligible. (Same masking noise being loud to quiet)
+
-
** analysis - we do not check the correctness - we only measure the intelligible impression
+
-
 
+
-
* annoyance test
+
-
** ten kinds of masking noise, silence, white noise with intruding noise, presented from 4 loudspeakers around.
+
-
** Each one goes on for 30 seconds (or any length) fading in and out for 5 seconds.
+
-
** Fade in the masking noise. Start with the word list, mental math, beep and repeat the word list. Fade out and fade in some enviromental noise (office, traffic, college cafeteria etc.), then next masking noise.
+
-
** Word list is presented to the subject from a loudspeaker in front at 60 dBA. 
+
-
 
+
-
* Final comparison
+
-
** For best 3 masking noises, mix in the typical conference noise (speech, paper shuffle, chair noise, typing sounds, and intruding noise) and ask the subjects which one sounds more "inviting."
+
-
 
+
-
== Atsuko's visit Agenda ==
+
-
* Friday August 4th,
+
-
*: 1pm Meeting (listening room)
+
-
*: 5:30pm - Conference Call Japan
+
-
* Saturday August 5th
+
-
*: Noise, narrowing parameters.
+
-
* Sunday August 6th
+
-
*: Meetings with Jonathan Berger and Hiroko
+
-
* MondayAugust 7th
+
-
*: Pscychoacoustic generic tests (Hiroko)
+
-
*: Brain storm spatializtion parts - experiments strategies
+
-
*: Meeting, Jonathan Berger, Jason, Juan pablo, Atsuko and Hiroko.
+
-
* Wednesday 9th,
+
-
*: 1pm - Meeting
+
-
*: 5:30pm - Conference Call Japan
+
== Links ==
Line 198: Line 253:
*[http://ccrma.stanford.edu/~hiroko/yamaha/ Mass project - support materials by Hiroko], with pictures, sounds and PDF documents on psychoacoustic experiment.  
*[http://ccrma.stanford.edu/~jcaceres/yamaha/documentation/expy_cpp/html/inherits.html Experiment C++ Source Code Documentation]
 +
 +
 +
[[Category:Projects]]

Current revision as of 10:03, 2 October 2007

Welcome to the Masking Ambient Speech Sounds project Wiki.


Project Summary

  1. Recording and diffusion methodologies - testing and implementation
    • Comparison between PZM (4channel) and Sound Field (4 directional)
    • The Sound Field microphone was chosen because it gives a better spatial image.
    • Diffusion in a semi-anechoic room, using a 4channel setup.
  2. Processing of office recordings
    • Tokyo office recording of all the sounds necessary for Calibration and for Impulse response generation.
    • Processing of impulse responses inside and outside the room
    • Pink noise calibration recording
    • Room calibration through a generalized equalization methodology, using omni microphone recordings in the real room and in the simulated room as comparison.
  3. FM masker generation
    • Exploration of different strategies to follow (sinusoidal versus random modulation)
    • Critical band (ERB) width of noise bands.
    • Decision taken for using 3 bands, with random walk modulation (way less annoying than sinusoidal modulation).
  4. Experiment design and implementation
    • Experiment 01
      Beta experiment to test the system setup (C++ implementation) for a real-time experiment with automatic data retrieval, and to fine-tune the psychoacoustic design of the experiment.
    • Experiment 02
      Masker refinement through a general-purpose process in which the best candidates are selected and the worst are discarded. One parameter is varied at a time; the best candidate for that parameter moves on to the next stage, where another parameter is varied. For the FM masker, the parameters were Center Band Frequency (3 bands), Band Amplitude, Modulation Rate (for each band), and Amplitude of the Modulation (for each band).
    • Experiment 03
      Efficiency. Uses the Santa Barbara corpus of conversation. With 1 masker on throughout the whole experiment, random parts of the conversation are presented and the subject is asked whether they were heard or not. The RMS of each random part is recorded, along with the subject's answer.
    • Experiment 04
      Annoyance, in design process
  5. Spatialization study
    Study of spatial variables in the directionality of the masker. Generation of a “virtual impulse response” in which the sound (masker) comes from outside the room (where the intruding sound is located) but the filtering effect of the wall is removed.

How to set up and calibrate the Tascam 3200 mixer

    • Detailed instructions
  • 1)Start hdspmixer & hdspconf(all settings automatic)
  • 2)In a terminal, type cd /usr/bin/ then cpufreq-selector -g performance (sets the CPU to maximum speed)
  • 3)Open jack
         a.Set Frames/Period to 1024
         b.Set Sample Rate to 44100
         c.Set Interface to RME Hammerfall

Mixer config :

  • 4)Equalizing levels and linking channels
        a.Under “SCREEN MODE/NUMERIC ENTRY” click “METER.FADER”
            i.Under tab “CH FADER” set gain levels “CH 1-18” equal
            ii.Under tab “Master M/F” set bus levels “BUSS 1-16” equal 
        b.Under “SCREEN MODE/NUMERIC ENTRY” click “ALT-LINK/GRP”
            i.Click “SEL” for channel 1 followed by 2, 3, and 4
            ii.Double click tab “GROUP ON/OFF”
            iii.Click down cursor to set the next grouping
            iv.Click “SEL” for channel 5 followed by 6,7, and 8 
  • 5)Setting the speakers for surround sound
        a.Click “SEL” for channel 1
            i.Under “OUTPUT ASSIGN” select “1”
            ii.Make sure “STEREO” and “DIRECT” are unchecked 
        b.Click “SEL” for Channel 2
            i.Under “OUTPUT ASSIGN” select “3”
            ii.Make sure “STEREO” and “DIRECT” are unchecked 
        c.Repeat this process for the following combinations
            i.Ch1:1, CH2:3, CH3:5, CH4:7, CH5:13, CH6:14, CH7:15, CH8:16
            ii.Channels 1-4 are head-level and Channel 5-8 are above
  • 6)Set up I/O (if it is already screwed up)
        a.Click “ALT_ROUTING” and click “INPUT”
            i.Set CH1 to adat-1, CH2 to adat-2, etc…
            ii.If you want to set up a record line, do so setting CH9 to M/L 9
                  1.Set the top knob and switch to appropriate setting 
                  2.Use CH9 fader to set input level to application
        b.Click “ALT-ROUTING” and click “OUTPUT SLOT” for output cards
            i.Slot A set Trk1-8 to BUSS 1-8 in sequential order (Horizontal)
            ii.Slot B set Trk1-8 to BUSS 9-16 in sequential order (Vertical)

Software Config:

  • 7)Setting up the software with hardware
        a.Go to Application under Bash shell and type “m”, then “make”, then “go”
        b.Play Voice recording and set levels to 25 dBA at center
        c.Play Masker noise and set levels to 45 dBA at center
  • 8)Go to “MAIN DIALOG” in software app to set ID & output dir then Repeat 6a


Experiment 01 - Beta Test

GUI Experiment 1

The first listening tests will involve project staff members to check if things make sense. If it looks good we'll start working with non-project volunteers. Experiment 1, in the CCRMA "Pit," will take about 30 mins. and involve 30 trials. There will be 6 conditions of masking sound crossed with 5 conditions of speech sounds. The masker (FM noise) and the speech sounds will be presented as if the sources are outside the room. We'll use the measured room model from Tokyo and the exterior sound source position (hallway). The "as if" impression will be created by convolving with the measured impulse responses.
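The trial structure described here (6 masker conditions crossed with 5 speech conditions, giving 30 trials in randomized order) can be sketched as follows. This is a minimal illustration rather than the project's code: the function name `makeShuffledTrials`, the index-pair representation of a trial, and the seeded `std::mt19937` generator are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// One trial pairs a masker condition with a speech condition (both as indices).
using Trial = std::pair<int, int>;

// Build every masker x speech combination, then shuffle the presentation order.
// The seed is passed in so that a session's order can be regenerated later.
std::vector<Trial> makeShuffledTrials(int numMaskers, int numSpeech, unsigned seed) {
    assert(numMaskers > 0 && numSpeech > 0);
    std::vector<Trial> trials;
    trials.reserve(static_cast<std::size_t>(numMaskers) * static_cast<std::size_t>(numSpeech));
    for (int m = 0; m < numMaskers; ++m)
        for (int s = 0; s < numSpeech; ++s)
            trials.emplace_back(m, s);
    std::mt19937 rng(seed);
    std::shuffle(trials.begin(), trials.end(), rng);
    return trials;
}
```

Calling `makeShuffledTrials(6, 5, seed)` yields 30 distinct (masker, speech) pairs; storing the seed alongside the subject ID would make the order reproducible.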

Strategies to define conditions for FM masking noise

To define the conditions of this first experiment, the approach will be to leave all the parameters fixed, except the modulation frequency.

Noise set: contains complete technical documentation of the masking noise generation, as well as the sound files.

The conditions of the masking FM noise will be defined by the following criteria:

  • 3 bands of FM noise will be used (centered at 200 350 and 500 Hz):
    These bands were selected based on an analysis of speech recorded in the Tokyo office. The motivation behind this decision is to identify the relevant parameters in the leaking voice. For example, we know that the wall filters out much of the high-frequency content, so that is relevant in the selection of the main frequencies.
  • The amplitude (volume) of each band will be fixed:
    The amplitude was tuned in order to psychoacoustically balance the level of the three noise bands that will be used. This balance was done without modulation.
  • The amplitude of the modulation will be proportional to the modulation frequency:
    The motivation behind this choice is to minimize the annoyance effect. When the modulation rate is low, large modulation amplitudes are more noticeable and annoying.
  • The relation between the modulation frequencies of the 3 bands is then the main factor that defines the conditions:
    For this experiment, 3 modulation rates are selected: 2, 5 and 7 Hz. The idea is to span some of the frequencies in the range of 2 to 7 Hz. Basically, all the combinations of these 3 rates are used for each center frequency, plus a case with no modulation at all.
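
The phrase "all the combinations" can be read in more than one way, and the page does not say which reading was used. The following Python sketch enumerates two readings so the resulting condition counts can be compared. The constant DEPTH_PER_HZ and the function make_condition are hypothetical names introduced only for illustration; the center frequencies and rates are the ones listed above.

```python
from itertools import permutations, product

CENTER_FREQS_HZ = (200, 350, 500)  # band centers from the text
MOD_RATES_HZ = (2, 5, 7)           # modulation rates from the text
DEPTH_PER_HZ = 1.0                 # hypothetical constant: modulation amplitude per Hz of rate


def make_condition(rates):
    """Pair each band with a modulation rate; depth is proportional to the rate."""
    return [
        {"center_hz": fc, "rate_hz": r, "depth": DEPTH_PER_HZ * r}
        for fc, r in zip(CENTER_FREQS_HZ, rates)
    ]


# Reading 1: each rate is used exactly once across the three bands.
distinct = [make_condition(p) for p in permutations(MOD_RATES_HZ)]
# Reading 2: every band may take any of the three rates independently.
independent = [make_condition(p) for p in product(MOD_RATES_HZ, repeat=3)]
# The unmodulated reference case mentioned in the text.
no_modulation = make_condition((0, 0, 0))

print(len(distinct), len(independent))  # prints "6 27"
```

The counts 6 and 27 also appear elsewhere on this page (6 masker conditions in Experiment 1, 27 maskers in Experiment 2), but the page does not confirm that they were obtained this way.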

Findings on the Beta Test

  1. There is a low-frequency component of the voice that is currently not being masked.
  2. We need to use a really long conversation that never repeats during the experiment.
  3. This corpus of conversations needs to have "stationary" properties.

Experiment 02 - Masker Refinement

Experiment design

  • efficiency test
    • Stimuli: speech is mixed at randomized places in a stream of masking noise
    • Task: "hit the space key when you hear speech"
    • Speech: 5 numbers (one, two, three, four, eight) spoken by a male and a female of different accents. Numbers were chosen so that they cover five vowels.
    • Masker: Genetic algorithm approach with human response. We vary one parameter first and then find one or two "sweet spots." Fix the parameter to those found values and vary the next parameter. Choose the best two - repeat this process.
    • Analysis: Response rate (we expect a low response rate when speech is masked). Response time distribution (we expect longer response times when speech is better masked). Both analyses can be done within-subject and across-subject. We can also observe what kind of speech is better masked with a particular masking noise.
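
The masker selection described above (vary one parameter, keep the best one or two settings, then move to the next parameter) can be sketched as a staged search. This is an illustrative Python sketch, not the project's code: it is a greedy search rather than a genetic algorithm in the strict sense, and the function refine, the parameter names, and toy_score are all hypothetical. In the real experiment the score would come from subjects' response rates.

```python
def refine(param_grid, score, keep=2):
    """Vary one parameter at a time, keeping the `keep` best candidates per stage.

    param_grid: dict mapping parameter name -> list of candidate values
    score: callable(candidate dict) -> detection rate; lower means better masking
    """
    # Start from one candidate that uses the first value of every parameter.
    survivors = [{name: values[0] for name, values in param_grid.items()}]
    for name, values in param_grid.items():
        trials = []
        for base in survivors:
            for value in values:
                candidate = dict(base, **{name: value})
                if candidate not in trials:
                    trials.append(candidate)
        trials.sort(key=score)
        survivors = trials[:keep]
    return survivors


def toy_score(candidate):
    """Stand-in for a measured response rate (purely illustrative)."""
    return abs(candidate["rate_hz"] - 5) / 10 + abs(candidate["depth"] - 1.0) / 10


grid = {"rate_hz": [2, 5, 7], "depth": [0.5, 1.0, 2.0]}
best = refine(grid, toy_score)
print(best[0])  # prints {'rate_hz': 5, 'depth': 1.0}
```

Keeping two survivors per stage costs more trials than keeping one, but it reduces the chance of discarding a setting that only works well in combination with a later parameter.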

Implementation

GUI Experiment 2
  • Verbal Instructions:

This is a test where only 2 buttons are required: spacebar and (enter)return. You are going to have 2 test runs where you are going to be presented with speech. When you hear any speech, press spacebar immediately to signal to us that you heard speech. At the end of each cycle a purple bar will light up to let you know the cycle is ready. You will then press (enter)return to begin the next cycle. The first 2 trials are to get you used to pushing the buttons in response to speech; data will be recorded starting with the third trial, testing whether you heard speech.

  • The speech used in the experiment was spoken by Jason and Hiroko with the intent of neutral stress on vowels. The words chosen were one, two, three, four and eight, which were convolved with the impulse response from the Tokyo conference room and combined with recorded room noise.

Post Experiment Subject Interviews

  • Phase01:

This test had the most diversity in types of sounds. Since some maskers were not efficient, subjects learned the rhythm of the speech presented. Subjects clearly described how some sounds worked better at masking than others, since they had an idea of how many sounds were coming at what rate for each masker. Subjects enjoyed this test because differences in maskers were clear.

  • Phase02:

Out of the bunch of 27 maskers we picked 2 candidates for our "golden masker." For this test we changed the amplitude of different center frequencies for these 2 maskers, which gave very different sounds throughout the test. Some subjects found that some sounds were noticeably harsher and more annoying to listen to than others. Several subjects reported that one masker worked really well at masking and sounded like being on an airplane. Subjects still enjoyed this test because differences in maskers were clear.

  • Phase03

At this point we chose 1 masker and used different frequencies of modulation. Most subjects described the sound as droning, meaning that it entranced or hypnotized them. This had an effect on most subjects, who found it more difficult to concentrate in the latter half of the test. Some subjects said they almost fell asleep, making it difficult to give consistent answers. As I administered the test, I noticed the sleepy feeling every single time myself, so I started leaving the room during the test. Subjects said that they could hear the female voice very clearly whenever they clicked spacebar (although they missed more female speech overall). For the male voice that came through, they listened for the deep male voice that sounded like short spurts of "wha" and "woo." Among the subjects I observed, most hit spacebar when there was speech and did not hit it when they did not hear speech, as expected.

  • Phase04

Most subjects again described the sound as droning, meaning that it entranced or hypnotized them. This made sense, since we kept the same basic sounds but changed the frequency modulation amplitude. The main difference I observed in this test is that subjects pushed spacebar repeatedly when no speech was presented. This seems to be due to the fact that the 4 speakers above, which play the masking sound, are uncorrelated and produce random interference patterns. I assume that the generated sounds had an interference pattern comparable to the speech used, ultimately confusing the listener. This effect played a role for all subjects that I observed, and I let them continue pushing spacebar throughout the test. Some felt the test was too long because they were falling asleep.

Experiment 03 - Efficiency

Implementation

I've programmed up experiment 3. This uses the Santa Barbara corpus clips in a design that produces a percentage measure of masker effectiveness. It's for one masker (the best one arrived at from experiment 2) at a fixed playback level.

Jason has convolved the first SB dialog file, so it plays from the "hallway."

The subject hears a 2 second clip which the app selects randomly from the convolved file. As it's playing the app records the maximum RMS of the first channel of the clip. The subject responds with "yes" or "no" buttons according to whether they heard voices. The app records the response and the maximum RMS played, and then loops, playing the next randomly chosen 2 second clip.
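
A minimal Python sketch of the clip selection and level measurement described above follows. It is not the application's code: pick_clip and max_rms are hypothetical names, the samples are assumed to be the first channel as a plain list of floats, and the window length used for the RMS is an assumption because the page does not state how the maximum RMS is windowed.

```python
import math
import random


def pick_clip(samples, sample_rate, clip_seconds=2.0, rng=random):
    """Return (start_index, clip) for a randomly placed clip of clip_seconds."""
    n = int(clip_seconds * sample_rate)
    start = rng.randrange(0, len(samples) - n + 1)
    return start, samples[start:start + n]


def max_rms(clip, window):
    """Largest RMS over consecutive non-overlapping windows of `window` samples."""
    best = 0.0
    for i in range(0, len(clip) - window + 1, window):
        frame = clip[i:i + window]
        rms = math.sqrt(sum(x * x for x in frame) / window)
        best = max(best, rms)
    return best


# Toy signal: 3 s of silence followed by 3 s at a constant level, at 100 Hz.
rng = random.Random(0)
sample_rate = 100
samples = [0.0] * 300 + [0.5] * 300
start, clip = pick_clip(samples, sample_rate, rng=rng)
print(start, max_rms(clip, window=10))
```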

This iterates many times over 5 minutes, easily producing 50 trials per subject. The analysis plots the percentage of "yes" responses vs. RMS. We should see a threshold RMS below which the clips were effectively masked.

For the final "efficiency rating" we go back into the convolved dialog file and calculate the percentage of time the signal is below the threshold.
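
The analysis and the final rating can be sketched in Python as below. This is an illustration under stated assumptions: the function names, the bin edges, and the 50% "yes" criterion used to locate the threshold are all hypothetical, since the page does not specify how the threshold is read off the plot.

```python
def yes_rate_by_bin(trials, edges):
    """trials: list of (max_rms, heard) pairs; edges: ascending RMS bin edges."""
    rates = []
    for lo, hi in zip(edges, edges[1:]):
        answers = [heard for rms, heard in trials if lo <= rms < hi]
        rates.append(sum(answers) / len(answers) if answers else None)
    return rates


def find_threshold(rates, edges, criterion=0.5):
    """Upper edge of the last bin before the yes-rate first reaches criterion."""
    thr = edges[0]
    for rate, hi in zip(rates, edges[1:]):
        if rate is not None and rate >= criterion:
            break
        thr = hi
    return thr


def efficiency(window_rms, thr):
    """Fraction of analysis windows in the dialog file whose RMS is below thr."""
    return sum(1 for r in window_rms if r < thr) / len(window_rms)


trials = [(0.01, False), (0.02, False), (0.06, False),
          (0.07, True), (0.12, True), (0.14, True)]
edges = [0.0, 0.05, 0.10, 0.15]
rates = yes_rate_by_bin(trials, edges)
thr = find_threshold(rates, edges)
print(rates, thr, efficiency([0.01, 0.03, 0.2, 0.04], thr))
# prints [0.0, 0.5, 1.0] 0.05 0.75
```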

The dialog chosen and start times are as follows:

  • Santa Barbara Corpus Clips Used

Each clip is 5 minutes long, with the start time indicated below. The tracks were normalized and then tuned to the appropriate dBFS level in relation to each other, so that they fall within the acceptable threshold level for experimentation.

  • TRACK /Start Time /dBFS
  1. sbc0001 /0:23 /-22.1
  2. sbc0002 /0:00 /-9.1
  3. sbc0008 /0:34 /-4.8
  4. sbc0011 /0:14 /-2.3
  5. sbc015 /0:00 /-1.9
  6. sbc020 /0:00 /-4.4
  7. sbc024 /0:00 /-4.1
  8. sbc025 /0:00 /-3.5
  9. sbc027 /0:00 /-7.0
  10. sbc029 /0:00 /-5.7
  11. sbc048 /1:15 /-0.8
  12. sbc050 /2:17 /-6.5
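
For reference, a level in dBFS corresponds to a linear amplitude factor of 10^(dB/20). The Python sketch below applies this, assuming the dBFS column gives the target peak level of each normalized track; the page does not say whether the values are peak or RMS levels, and the function names are hypothetical.

```python
def dbfs_to_gain(level_db):
    """Convert a level in dB relative to full scale into a linear amplitude factor."""
    return 10 ** (level_db / 20)


def scale_to_peak(samples, target_dbfs):
    """Scale samples so that the largest absolute value sits at target_dbfs."""
    peak = max(abs(x) for x in samples)
    if peak == 0:
        return list(samples)  # silence stays silence
    factor = dbfs_to_gain(target_dbfs) / peak
    return [x * factor for x in samples]


# Two levels taken from the table above.
TRACK_LEVELS_DBFS = {"sbc0001": -22.1, "sbc0011": -2.3}
for name, level in TRACK_LEVELS_DBFS.items():
    print(name, round(dbfs_to_gain(level), 3))
```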

Experiment 04 - Annoyance

Experiment design

  • annoyance test
    • ten kinds of masking noise, silence, and white noise with intruding noise, presented from 4 surrounding loudspeakers.
    • Each one goes on for 30 seconds (or any length) fading in and out for 5 seconds.
    • Fade in the masking noise. Start with the word list, mental math, beep and repeat the word list. Fade out and fade in some environmental noise (office, traffic, college cafeteria etc.), then next masking noise.
    • Word list is presented to the subject from a loudspeaker in front at 60 dBA.
    • Task: a word list is presented at the start. A subject does mental math for 30 seconds (6-10 questions.) After the beep, the subject has to recall the word list presented at the start. Masking noise switches with fade in/out with an environmental noise. Do the same task with the next masking noise.
  • Final comparison
    • For best 3 masking noises, mix in the typical conference noise (speech, paper shuffle, chair noise, typing sounds, and intruding noise) and ask the subjects which one sounds more "inviting."
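
The fade in/out of each masking noise can be sketched as a gain envelope. This Python sketch assumes a linear fade shape (the apparatus list below mentions a linear fade) and assumes the 5-second fades are included in the 30-second duration; fade_gain is a hypothetical name.

```python
def fade_gain(t, total=30.0, fade=5.0):
    """Linear gain (0..1) at time t seconds for a segment lasting `total` seconds."""
    if t < 0 or t > total:
        return 0.0
    # Ramp up over the first `fade` seconds, ramp down over the last `fade` seconds.
    return min(1.0, t / fade, (total - t) / fade)


for t in (0.0, 2.5, 15.0, 27.5, 30.0):
    print(t, fade_gain(t))
```

Crossfading to the environmental noise would then use 1 - fade_gain(t) for the second source over the same ramp, if an equal-sum crossfade is wanted.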

Apparatus To-Do list

  • All randomized total 20 minutes
  1. Subject walks in with ambient noise
  2. List over speaker of approx 15 words (20sec)
  3. linear fade of masker during 15 word recital
  4. beep/flash to start mental math as long as possible (or 2.5 min) 3maskers
  5. flash to start recital and repeat as much of the list in microphone for as long as they need
  6. Subject chooses when to start next phase
  7. quick fade out of masker to next masker while new 15 words played through speaker.
  • Data type
            Solutions, time between answers, # of recall word list
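
A minimal Python sketch of how the three data types listed above could be derived from one trial follows. The function name, argument names, and the example values are hypothetical; the page does not describe the actual logging format.

```python
def summarize_trial(word_list, recalled, solutions, answers, answer_times):
    """Return recall count, correct math answers, and gaps between answers."""
    targets = {w.lower() for w in word_list}
    # Count each list word at most once, ignoring case and repeats.
    n_recalled = len(targets & {w.lower() for w in recalled})
    n_correct = sum(1 for a, s in zip(answers, solutions) if a == s)
    gaps = [later - earlier
            for earlier, later in zip(answer_times, answer_times[1:])]
    return {"recalled": n_recalled, "correct": n_correct, "gaps_s": gaps}


result = summarize_trial(
    word_list=["apple", "river", "chair"],
    recalled=["Chair", "apple", "stone"],
    solutions=[12, 7, 30],
    answers=[12, 8, 30],
    answer_times=[3.0, 7.5, 11.0],
)
print(result)  # prints {'recalled': 2, 'correct': 2, 'gaps_s': [4.5, 3.5]}
```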

FM Masking Noises

Variables

  • modulation width (critical band or speech sounds)
  • modulation rate (0.01 - 0.1 fc)
  • sinusoidal or stochastic modulation

Already fixed

  • with broadband noise (what shape, and how loud? - according to the speech)
  • band width of the noise (critical band)
  • amplitude of each channel (speech sounds spectral distribution)
  • number and frequency of center frequencies (3)
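
The difference between sinusoidal and stochastic modulation can be illustrated with the instantaneous-frequency track of one band. This is a hedged Python sketch, not the project's generator: a clamped random walk is used as one possible form of stochastic modulation, and the function names, step size, and clamping rule are assumptions. The modulation rate is taken from the 0.01 to 0.1 fc range listed above.

```python
import math
import random


def sinusoidal_track(fc, rate_hz, width_hz, n, sample_rate):
    """Instantaneous frequency swinging sinusoidally by +/- width_hz around fc."""
    return [fc + width_hz * math.sin(2 * math.pi * rate_hz * i / sample_rate)
            for i in range(n)]


def random_walk_track(fc, width_hz, step_hz, n, rng):
    """Instantaneous frequency doing a random walk clamped to fc +/- width_hz."""
    f, track = fc, []
    for _ in range(n):
        f += rng.uniform(-step_hz, step_hz)
        f = min(fc + width_hz, max(fc - width_hz, f))
        track.append(f)
    return track


def fm_tone(track, sample_rate):
    """Sine wave whose phase advances by the instantaneous frequency each sample."""
    phase, out = 0.0, []
    for f in track:
        phase += 2 * math.pi * f / sample_rate
        out.append(math.sin(phase))
    return out


fc, sample_rate, n = 200.0, 8000, 1000
sine_track = sinusoidal_track(fc, rate_hz=0.05 * fc, width_hz=10.0,
                              n=n, sample_rate=sample_rate)
walk_track = random_walk_track(fc, width_hz=10.0, step_hz=1.0,
                               n=n, rng=random.Random(1))
tone = fm_tone(walk_track, sample_rate)
print(min(walk_track), max(walk_track), len(tone))
```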

Conference Call Meetings

July 18, 2006

  • FM Modulation discussion (Yasushi's Comments, with Juan-Pablo's comment on answer A:):
  1. Do you have any idea how to specify frequency modulation for each frequency band?
    • A: based on speech freq, ~2-8 Hz
  2. The period in time for each frequency should be the same?
    • A: No, different. When it's the same the masking efficiency decreases. It also seems more annoying.
  3. Modulation speed will be getting faster according to higher frequency, or
    • A: I don't know yet, this is going to be the main parameter in the first experiment I think.
  4. The frequency modulation considering the voice sound
  5. We have to analyze how the voice sound is modulated in different frequency bands?
    • A: I think this is the best way, and we have to consider that the wall is filtering almost all the high frequencies.
  • Discussion of the experiment setup.
  • Look at the documentation, the new example of impulse responses, and delay of arrival.

July 24, 2006

Tuesday 9:30AM Japan - Monday 5:30PM Stanford

  • Discuss Experiment 1.
  • Ask Atsuko about calibration files and SPL meter.
  • Comment diffusion in the Pit with PZM system (Hiroko).
  • Discuss Experiment Design written by Hiroko and Atsuko.

July 31, 2006

Tuesday 9:30AM Japan - Monday 5:30PM Stanford

  • Discuss Experiment Design written by Hiroko and Atsuko.
  • Explain experiment setup.
  • Discuss Atsuko's agenda at CCRMA.
  • Goals for this week are to finish the setup (C++ and pit room) and to collect and analyse some data from a couple of subjects.

August 21, 2006

Tuesday 9:30AM Japan - Monday 5:30PM Stanford

August 28, 2006

Tuesday 9:30AM Japan - Monday 5:30PM Stanford

September 04, 2006

Tuesday 9:00AM Japan - Monday 5:00PM Stanford

Links
