Regaip Sen - Documentation for MUS319 Phase Perception Research

Introduction

The purpose of this research is to compare Roy Patterson's gammatone filter-bank model of pitch perception with the Meddis-Lyon cochlear model, which combines autocorrelation with automatic gain control. Malcolm Slaney has implemented each model in MATLAB so that, given an input signal, the code outputs a representation of hair cell responses.

Roy Patterson's Experiments

Patterson's three experiments, described in his 1987 article from the MRC Applied Psychology Unit, compared the auditory perception of waves with phase-modified harmonics to that of waves with zero-phase harmonics (see Appendix B for results of interest). The control stimuli were called Constant-Phase, or CPH. Each set of experiments was conducted using fundamental frequencies of 62.5 Hz, 125 Hz, 250 Hz, and 500 Hz. The three experiments tested whether listeners can distinguish CPH from phase-modified stimuli; the code in Appendix A labels these RPH, APH, and MPH.

In addition, each stimulus was presented with four or eight harmonics, and each version was further divided into four categories in which the set begins with the first, fourth, eighth, or sixteenth harmonic.
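As a rough illustration of how these stimulus sets are laid out, the following Python sketch lists the component frequencies for a given fundamental, starting harmonic, and harmonic count. The name `harmonic_set` and the constant names are illustrative assumptions; they do not come from Patterson's article or from the Appendix A code.

```python
def harmonic_set(f0_hz, first_harmonic, n_harmonics):
    """Frequencies (Hz) of n_harmonics consecutive harmonics of f0_hz,
    starting at harmonic number first_harmonic (1 = the fundamental)."""
    return [f0_hz * h for h in range(first_harmonic, first_harmonic + n_harmonics)]


# Conditions described in the text: fundamentals, set sizes, starting harmonics.
FUNDAMENTALS_HZ = (62.5, 125, 250, 500)
SET_SIZES = (4, 8)
FIRST_HARMONICS = (1, 4, 8, 16)

# Every (fundamental, starting harmonic, set size) combination.
conditions = [
    (f0, first, n)
    for f0 in FUNDAMENTALS_HZ
    for n in SET_SIZES
    for first in FIRST_HARMONICS
]

print(len(conditions))          # number of combinations per stimulus type
print(harmonic_set(125, 4, 4))  # four harmonics of 125 Hz, starting at the fourth
```

Under these assumptions each stimulus type has 4 x 2 x 4 combinations, which matches the nesting of the three outer loops in Appendix A.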

Robert Carlyon and Shihab Shamma's Model

In the July 2003 issue of JASA, Carlyon and Shamma argued that auditory models should not discard across-channel phase information. In support of this argument, they presented a model that accounts for this information and analyzed its response to stimuli from past phase-perception experiments. The experiments and their results of interest are as follows:

  1. Carlyon and Shackleton (1994): Listeners were to distinguish phase changes between groups of unresolved harmonics of 88 Hz. The harmonics were band-pass filtered into three groups covering the following ranges: 125-625 Hz, 1375-1875 Hz, and 3900-5400 Hz. Phase detection between the middle and high-frequency groups occurred when they were 1.4 ms out of phase. Relative to the 88 Hz fundamental (period of about 11.4 ms), this corresponds to a phase difference of roughly pi/4.
  2. Craig and Jeffress (1962): Listeners were to detect a 180-degree phase change between a 250 Hz tone and a superimposed 500 Hz tone. The 250 Hz tone was presented at 40, 50, 60, 70, 80, and 84 dB SPL, and the 500 Hz tone was presented at 3, 13, 23, 43, 53, 63, and 73 dB SPL. The 500 Hz tone was also presented at a 90-degree phase shift from the 250 Hz tone for each of these settings, and at 45 and 135-degree phase shifts for cases in which the 250 Hz tone was at 60 dB SPL. Detection occurred when the 250 Hz tone was at 50 dB SPL or higher. A 180-degree shift of a 250 Hz signal corresponds to a 2 ms delay.
  3. J. L. Goldstein (1966): Listeners were to distinguish between AM and Quasi-Frequency Modulated (QFM) tones, where the latter are AM tones with the carrier component phase-shifted by 90 degrees. The AM and QFM tones were presented in 1-second pulses separated by a half-second. Carrier frequencies ranged from 250 Hz to 16 kHz, levels ranged between 20 and 0 dB SL, and the range of modulation frequencies depended on the carrier. This study was done to elaborate on a 1947 study by Mathes and Miller that had used a smaller range of stimuli. Goldstein's study showed that at a carrier frequency of 1 kHz, listeners could distinguish the tones at modulation frequencies of up to 330 Hz.
  4. Yost and Sheft (1989): In the second of two experiments presented in this paper, listeners were to distinguish phase changes in an amplitude-modulated probe in the presence of a masking tone. Both tones were played simultaneously for one second, and while the mask was amplitude-modulated in its entirety, the probe was amplitude-modulated only for the middle 500 ms. Listeners were presented with a set of two stimuli in which the modulations were in phase for one stimulus and out of phase for the other. When the mask carrier frequency was 1 kHz and the probe carrier frequency was 4 kHz, the detection threshold corresponded to a 60-degree phase difference.
  5. J. Patterson and Green (1970): Students were to distinguish the order of sets of click-like Huffman sequences. Since by definition these sequences differ only in phase, a correct ordering corresponds to phase detection. The listeners could distinguish sequences as brief as 2.5 ms.
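The conversions between phase shift and time delay used in the list above depend only on the period of the reference frequency. The following is a minimal Python sketch of that arithmetic; the function names are illustrative assumptions, and the choice of reference frequency is up to the reader.

```python
import math


def phase_to_delay_ms(phase_deg, freq_hz):
    """Time delay (ms) equivalent to a phase shift of phase_deg at freq_hz."""
    period_ms = 1000.0 / freq_hz
    return (phase_deg / 360.0) * period_ms


def delay_to_phase_rad(delay_ms, freq_hz):
    """Phase shift (radians) equivalent to a delay of delay_ms at freq_hz."""
    return 2.0 * math.pi * freq_hz * (delay_ms / 1000.0)


# Craig and Jeffress case: 180 degrees at 250 Hz is half of a 4 ms period.
delay = phase_to_delay_ms(180, 250)
print(delay)

# Converting the same delay back gives the phase in units of pi.
print(delay_to_phase_rad(delay, 250) / math.pi)
```

The same two functions can be applied to the other entries once a reference frequency (for example a fundamental or a modulation frequency) has been chosen.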

Carlyon and Shamma analyzed the output of their model via a spectrogram, using a linear approximation to the action of higher central auditory stages. While they acknowledged the limitation of a spectrogram that does not vary with input level (they suggested resolving this problem with cochlear filter banks), their graphs showed that asynchronous stimuli generated greater excitation in the auditory stages than synchronous stimuli. The exception was the Craig and Jeffress stimuli: the model responded to both synchronous and asynchronous stimuli. They attributed this to the fact that, because the higher tone was the second harmonic of the lower one, the synchronous output generated large phase transitions at peaks.

In addition to their own model, Carlyon and Shamma tested other models with the same stimuli. Meddis's autocorrelation model did not respond to the asynchronies among Carlyon and Shackleton's groups of unresolved harmonics or to those between Yost and Sheft's envelopes of AM tones. Roy Patterson's model gave the same (correct) response for Craig and Jeffress's stimuli.

Aside from the limitation of their spectrogram, Carlyon and Shamma noted that the cochlear filters in their model are broader than their physiological counterparts. Their proposed solution is to reintroduce a lateral inhibitory network, which would make the filters more selective than Patterson's gammatone filter.

Contributions

Regaip Sen has written LISP code that generates a series of wave files duplicating the stimuli used in the Patterson and Craig/Jeffress experiments. See Appendix A for the code. For the Patterson experiments, the set of APH stimuli was expanded to cover additional phase shifts; the ranges are given by the APH loops in Appendix A.
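For readers without a CLM installation, the alternating-phase construction can be sketched in plain Python. This is not the code used for the study: the function names are hypothetical, the duration, sample rate, and amplitude are copied from the Appendix A calls, and `shift_deg` is the shift applied to each component in degrees (Appendix A derives its shift from the loop variable `ph`).

```python
import math


def alternating_phases(first_index, n_harmonics, shift_deg):
    """Initial phases (radians) for consecutive components.

    Mirrors the APH loops in Appendix A: even loop indices get +shift,
    odd indices get 2*pi - shift (that is, -shift modulo 2*pi).
    """
    shift = math.radians(shift_deg)
    return [
        shift if i % 2 == 0 else 2.0 * math.pi - shift
        for i in range(first_index, first_index + n_harmonics)
    ]


def synthesize(f0_hz, first_index, phases, dur_s=0.256, srate=44100, amp=0.03):
    """Sum of sine components; component k is harmonic (first_index + k + 1) of f0_hz."""
    n_samples = int(round(dur_s * srate))
    samples = []
    for n in range(n_samples):
        t = n / srate
        samples.append(sum(
            amp * math.sin(2.0 * math.pi * f0_hz * (first_index + k + 1) * t + ph)
            for k, ph in enumerate(phases)
        ))
    return samples


# Four harmonics of 125 Hz starting at the fundamental, shifted by +/-45 degrees.
phases = alternating_phases(0, 4, 45)
stimulus = synthesize(125, 0, phases)
print(len(stimulus))
```

Writing the samples to a wave file is left out to keep the sketch short; Python's standard `wave` module would be one option.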

Hiroko Terasawa computed the model's outputs for the sound stimuli and plotted their correlograms (which proved to represent hair cell response more accurately than cochleagrams). The results of the MATLAB model agree qualitatively with Patterson's experimental results, as well as with informal listening experiments conducted by the present researchers; however, they were quantitatively inconsistent.
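A correlogram is built from short-time autocorrelations of the individual channel outputs. The toy Python sketch below shows the autocorrelation step for a single, idealized channel only; it is not Slaney's MATLAB code, and the sample rate, signal, lag range, and function names are all assumptions made for illustration.

```python
import math


def harmonic_complex(f0_hz, srate, n_samples, n_harmonics):
    """Equal-amplitude cosine-phase harmonics 1..n_harmonics of f0_hz."""
    return [
        sum(math.cos(2.0 * math.pi * f0_hz * h * n / srate)
            for h in range(1, n_harmonics + 1))
        for n in range(n_samples)
    ]


def autocorrelation(x, lag):
    """Mean lagged product of x with itself at the given lag (in samples)."""
    n_terms = len(x) - lag
    return sum(x[n] * x[n + lag] for n in range(n_terms)) / n_terms


def estimate_period(x, min_lag, max_lag):
    """Lag (samples) in [min_lag, max_lag] with the largest autocorrelation."""
    return max(range(min_lag, max_lag + 1), key=lambda lag: autocorrelation(x, lag))


srate = 8000                                  # Hz, assumed for this toy example
x = harmonic_complex(125, srate, 1024, 4)     # stand-in for one channel's output
lag = estimate_period(x, min_lag=16, max_lag=100)
print(lag, srate / lag)                       # period in samples, implied frequency
```

The autocorrelation peak falls at the lag equal to one period of the fundamental, which is the feature a correlogram makes visible across channels.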

We can do several things to improve the results. First, we need a better way to integrate the cochlear and hair cell responses. Second, we need to tune the model bandwidths, which should address the quantitative inconsistency. Finally, we need to retest Patterson's model with all the test cases.

References

Carlyon, R. P., and Shackleton, T. M., (1994). "Detecting pitch-pulse asynchronies and differences in fundamental frequency," J. Acoust. Soc. Am. 95, 968-979.

Carlyon, R. P., and Shamma, S., (2003). "An account of monaural phase sensitivity," J. Acoust. Soc. Am. 114, 333-348.

Craig, J. H., and Jeffress, L. A., (1962). "Effect of phase on the quality of a two-component tone," J. Acoust. Soc. Am. 34, 1752-1760.

Goldstein, J. L., (1966). "Auditory spectral filtering and monaural phase perception," J. Acoust. Soc. Am. 42, 458-479.

Moore, Brian. An Introduction to the Psychology of Hearing, 4th Ed. San Diego: Academic Press. (1997).

Patterson, J. H., and Green, D. M., (1970). "Discrimination of transient signals having identical energy spectra," J. Acoust. Soc. Am. 48, 894-905.

Patterson, R. D., (1987). "A pulse ribbon model of monaural phase perception," J. Acoust. Soc. Am. 82, 1560-1586.

Yost, W. A., and Sheft, S., (1989). "Across-critical-band processing of amplitude-modulated tones," J. Acoust. Soc. Am. 85, 848-857.


Appendix A: LISP code for test cases



;;; USES CM-CLM-CMN LISP LIBRARIES
;;; FOR definstrument AND make-score

;;; To create samples from the CCRMA network,
;;;  save the following code as "make-samples.lisp"
;;;  then run the following lines from the terminal:

;;; /usr/bin/clisp-cm-clm-cmn
;;; (compile-file "make-samples" :verbose nil)
;;; (load *)
;;; (make-samples-patterson)
;;; (make-samples-craig-jeffress)


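;;; partial: ONE SINE COMPONENT WITH A SETTABLE INITIAL PHASE (RADIANS).
;;; THE OSCILLATOR RUNS AT frequency + freqskew * freq-envelope (HZ).
(definstrument partial (start dur frequency freqskew amplitude freq-envelope amp-envelope phase)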
  (let* ((gls-env (make-env :envelope freq-envelope :scaler (hz->radians freqskew) :duration dur))
	 (os (make-oscil :frequency frequency :initial-phase phase))
	 (amp-env (make-env :envelope amp-envelope :scaler amplitude :duration dur))
	 (len (inexact->exact (round (* *srate* dur))))
	 (beg (inexact->exact (round (* *srate* start))))
	 (end (+ beg len)))
    (run 
     (loop for i from beg to end do
       (outa i (* (env amp-env)
		  (oscil os (env gls-env)))
	     )))))

(defun make-samples-patterson ()
  (let* ((x 0)                  ; frequency of the current partial, in Hz
         (p 0)                  ; initial phase of the current partial
         (freq_env '(0 1 1 1))
         (amp_env '(0 1 1 1))
         (spec_a (make-array 32 :initial-contents '(0 84 64 52 43 37 33 29 26 23 21 20 18 17 16 15 14 13 12 12 11 10 10 9 9 9 8 8 8 7 7 7)))
         (spec_b (make-array 32 :initial-contents '(0 334 257 208 174 149 130 116 104 94 86 79 73 67 63 59 55 52 49 46 44 42 40 38 36 35 33 32 31 29 28 27)))
         (spec_c (make-array 32 :initial-contents '(0 1336 1026 830 695 597 522 463 415 375 343 315 290 269 251 235 220 208 196 185 176 167 159 152 145 139 133 127 122 118 113 109)))
         (spec_d (make-array 32 :initial-contents '(0 5344 4104 3321 2781 2387 2087 1850 1659 1502 1370 1258 1162 1078 1004 940 882 830 784 741 703 668 636 607 579 554 531 509 489 470 453 436))))

    ;;; EACH with-sound FORM WRITES ONE FILE. THE OUTER LOOPS COVER
    ;;; 62.5, 125, 250, AND 500 HZ, 4 OR 8 HARMONICS, AND SETS STARTING
    ;;; AT THE 1ST, 4TH, 8TH, OR 16TH HARMONIC (hh = 0, 3, 7, 15).
    ;;; freqskew IS PASSED AS 0: WITH THE FLAT freq_env ABOVE, A NONZERO
    ;;; SKEW WOULD BE ADDED TO EVERY PARTIAL'S FREQUENCY.
    (loop for hz in '(62.5 125 250 500) do
      (loop for harm in '(4 8) do
        (loop for hh in '(0 3 7 15) do

          ;;; CPH, RPH
          (with-sound (:srate 44100 :header-type mus-riff
                       :output (format nil "/usr/ccrma/snd/220a-2003/regosen/319/cph~Ah~A-~Aharm.wav" hz hh harm))
            (setf x (* hz hh))
            (loop for i from hh below (+ hh harm) do
              (setf x (+ x hz))
              (partial 0 0.256 x 0 .03 freq_env amp_env 0)))

          (with-sound (:srate 44100 :header-type mus-riff
                       :output (format nil "/usr/ccrma/snd/220a-2003/regosen/319/rph~Ah~A-~Aharm.wav" hz hh harm))
            (setf x (* hz hh))
            (loop for i from hh below (+ hh harm) do
              (setf x (+ x hz))
              (setf p (* pi (/ (random 360) 180)))
              (partial 0 0.256 x 0 .03 freq_env amp_env p)))

          ;;; APH (5-90 DEGREES, FROM 0/4/8/16th HARMONICS)
          ;;; THE SHIFT APPLIED IS pi*ph/360 RADIANS: +SHIFT FOR EVEN i,
          ;;; 2*pi - SHIFT FOR ODD i.
          (loop for ph from 1 below 21 do
            (with-sound (:srate 44100 :header-type mus-riff
                         :output (format nil "/usr/ccrma/snd/220a-2003/regosen/319/aph~Ap~Ah~A-~Aharm.wav" hz ph hh harm))
              (setf x (* hz hh))
              (loop for i from hh below (+ hh harm) do
                (setf x (+ x hz))
                (setf p (if (evenp i)
                            (/ pi (/ 360 ph))
                            (- (* 2 pi) (/ pi (/ 360 ph)))))
                (partial 0 0.256 x 0 .03 freq_env amp_env p))))

          (loop for ph from 65 below 81 do
            (with-sound (:srate 44100 :header-type mus-riff
                         :output (format nil "/usr/ccrma/snd/220a-2003/regosen/319/aph~Ap~Ah~A-~Aharm.wav" hz ph hh harm))
              (setf x (* hz hh))
              (loop for i from hh below (+ hh harm) do
                (setf x (+ x hz))
                (setf p (if (evenp i)
                            (/ pi (/ 360 ph))
                            (- (* 2 pi) (/ pi (/ 360 ph)))))
                (partial 0 0.256 x 0 .03 freq_env amp_env p))))

          (loop for ph from 5 below 185 by 5 do
            (with-sound (:srate 44100 :header-type mus-riff
                         :output (format nil "/usr/ccrma/snd/220a-2003/regosen/319/aph~Ap~Ah~A-~Aharm.wav" hz ph hh harm))
              (setf x (* hz hh))
              (loop for i from hh below (+ hh harm) do
                (setf x (+ x hz))
                (setf p (if (evenp i)
                            (/ pi (/ 360 ph))
                            (- (* 2 pi) (/ pi (/ 360 ph)))))
                (partial 0 0.256 x 0 .03 freq_env amp_env p))))

          ;;; MPH (1/8, 1/2, 2, and 8-SCALAR)
          ;;; THE PHASE ACCUMULATES THE ENTRIES OF spec_a ... spec_d.
          (with-sound (:srate 44100 :header-type mus-riff
                       :output (format nil "/usr/ccrma/snd/220a-2003/regosen/319/mph~A_1_8_h~A-~Aharm.wav" hz hh harm))
            (setf x (* hz hh))
            (setf p 0)
            (loop for i from hh below (+ hh harm) do
              (setf x (+ x hz))
              (setf p (+ p (aref spec_a i)))
              (partial 0 0.256 x 0 .03 freq_env amp_env p)))

          (with-sound (:srate 44100 :header-type mus-riff
                       :output (format nil "/usr/ccrma/snd/220a-2003/regosen/319/mph~A_1_2_h~A-~Aharm.wav" hz hh harm))
            (setf x (* hz hh))
            (setf p 0)
            (loop for i from hh below (+ hh harm) do
              (setf x (+ x hz))
              (setf p (+ p (aref spec_b i)))
              (partial 0 0.256 x 0 .03 freq_env amp_env p)))

          (with-sound (:srate 44100 :header-type mus-riff
                       :output (format nil "/usr/ccrma/snd/220a-2003/regosen/319/mph~A_2_h~A-~Aharm.wav" hz hh harm))
            (setf x (* hz hh))
            (setf p 0)
            (loop for i from hh below (+ hh harm) do
              (setf x (+ x hz))
              (setf p (+ p (aref spec_c i)))
              (partial 0 0.256 x 0 .03 freq_env amp_env p)))

          (with-sound (:srate 44100 :header-type mus-riff
                       :output (format nil "/usr/ccrma/snd/220a-2003/regosen/319/mph~A_8_h~A-~Aharm.wav" hz hh harm))
            (setf x (* hz hh))
            (setf p 0)
            (loop for i from hh below (+ hh harm) do
              (setf x (+ x hz))
              (setf p (+ p (aref spec_d i)))
              (partial 0 0.256 x 0 .03 freq_env amp_env p))))))))



(defun make-samples-craig-jeffress ()
  (let* ((freq_env '(0 1 1 1))
         (amp_env '(0 1 1 1)))

    ;;; db IS THE LEVEL OF THE 500 HZ TONE, dbA THE LEVEL OF THE 250 HZ TONE.
    ;;; LEVELS ARE CONVERTED TO AMPLITUDES WITH 10^((dB - 90) / 20). THE 90 DB
    ;;; REFERENCE IS AN ASSUMPTION, CHOSEN SO THAT THE LOUDEST PAIR
    ;;; (84 AND 73 DB) SUMS TO LESS THAN FULL SCALE.
    ;;; ph IS IN DEGREES (ALSO USED IN THE FILE NAME) AND IS CONVERTED TO
    ;;; RADIANS FOR make-oscil. freqskew IS 0 SO THE TONES STAY AT 250 AND 500 HZ.
    ;;; FILES WITH "i" IN THE NAME USE INVERTED POLARITY (NEGATIVE AMPLITUDE).
    (loop for db from 3 below 74 by 10 do

      (loop for ph in '(0 90) do
        (loop for dbA in '(40 50 60 70 80 84) do
          (with-sound (:srate 44100 :header-type mus-riff
                       :output (format nil "/usr/ccrma/snd/220a-2003/regosen/319/exp1/craig~Aa~Ab~A.wav" ph dbA db))
            (partial 0 1 250 0 (* .8 (expt 10 (/ (- dbA 90) 20.0))) freq_env amp_env 0)
            (partial 0 1 500 0 (* .8 (expt 10 (/ (- db 90) 20.0))) freq_env amp_env (* pi (/ ph 180))))

          (with-sound (:srate 44100 :header-type mus-riff
                       :output (format nil "/usr/ccrma/snd/220a-2003/regosen/319/exp1/craig~Aa~Aib~A.wav" ph dbA db))
            (partial 0 1 250 0 (* -.8 (expt 10 (/ (- dbA 90) 20.0))) freq_env amp_env 0)
            (partial 0 1 500 0 (* -.8 (expt 10 (/ (- db 90) 20.0))) freq_env amp_env (* pi (/ ph 180))))))

      (loop for ph in '(45 135) do
        (with-sound (:srate 44100 :header-type mus-riff
                     :output (format nil "/usr/ccrma/snd/220a-2003/regosen/319/exp1/craig~Aa60b~A.wav" ph db))
          (partial 0 1 250 0 (* .8 (expt 10 (/ (- 60 90) 20.0))) freq_env amp_env 0)
          (partial 0 1 500 0 (* .8 (expt 10 (/ (- db 90) 20.0))) freq_env amp_env (* pi (/ ph 180))))

        (with-sound (:srate 44100 :header-type mus-riff
                     :output (format nil "/usr/ccrma/snd/220a-2003/regosen/319/exp1/craig~Aa60ib~A.wav" ph db))
          (partial 0 1 250 0 (* -.8 (expt 10 (/ (- 60 90) 20.0))) freq_env amp_env 0)
          (partial 0 1 500 0 (* -.8 (expt 10 (/ (- db 90) 20.0))) freq_env amp_env (* pi (/ ph 180))))))))



Appendix B: Results of Roy Patterson's Experiments taken from his article