My name is Matt Wright and I'm doing my PhD dissertation research on the perception of musical rhythm by people and by computers. Specifically, the core of my research is the study of Perceptual Attack Time, "the time a sound is perceived as a rhythmic event," which is generally after the actual physical beginning of the sound.
As an example, here's the waveform of a short bowed violin note that I'm studying in this research:
In this graph, time zero is the first instant where there is any sound at all, i.e., the "physical onset". According to my own personal subjective experience, the Perceptual Attack Time of this note is about 34 milliseconds after its physical onset.
Here are some sound examples that demonstrate how I came up with the number "34 milliseconds":
For me, in number 2 it sounds like the violin player is late compared to the percussive sound (even though they're starting at exactly the same time), while in number 3 it sounds like they're rhythmically together (even though the violin is 34 milliseconds early).
To your ear, maybe number 3 sounds a little bit off and you'd like to move the violin earlier or later to make it line up better with the other sound. That's the "perceptual" in "perceptual attack time": it's a measure of what each person perceives, not something that is inherently "true" about the sound. And that's why I want to ask lots of people to give me their subjective impressions, not just do it for myself.
There are two main reasons:
In short, I want to find the Perceptual Attack Time of a variety of sounds.
For each pair of sounds, I will combine the results from all my subjects into a histogram of the exact time offsets between the two sounds at which subjects said they sounded "together." A central idea of my research is to treat Perceptual Attack Time not as a single instant, but as a continuous probability density function, so I'll interpret these histograms as probability density functions.
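To make the histogram idea concrete, here is a minimal sketch in Python of turning a set of "sounds together" offsets into a normalized histogram that can be read as a probability density estimate. It is only an illustration, not the analysis code used in this research: the function name `histogram_density`, the 5 ms bin width, and the response values are all made up for the example.

```python
import math
from collections import Counter


def histogram_density(offsets_ms, bin_width_ms=5.0):
    """Map each bin's left edge (in ms) to a density value.

    The densities are scaled so that the sum of density * bin_width_ms
    over all bins equals 1, as a probability density requires.
    """
    if not offsets_ms:
        raise ValueError("need at least one response")
    # Assign every response to the bin whose left edge is at or below it.
    counts = Counter(math.floor(x / bin_width_ms) for x in offsets_ms)
    total = len(offsets_ms)
    return {
        index * bin_width_ms: count / (total * bin_width_ms)
        for index, count in sorted(counts.items())
    }


# Hypothetical responses (NOT real data): how many milliseconds early the
# violin had to start for each subject to hear the two sounds as "together".
offsets_ms = [28, 31, 33, 34, 34, 36, 37, 41, 45, 52]

density = histogram_density(offsets_ms, bin_width_ms=5.0)
for left_edge, value in density.items():
    print(f"{left_edge:5.1f} to {left_edge + 5.0:5.1f} ms: {value:.3f} per ms")
```

With real responses, the choice of bin width matters: narrow bins give a noisy estimate and wide bins blur the shape of the distribution, so a smoothed estimate (for example, a kernel density estimate) might be a reasonable alternative to fixed bins.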
These experiments will test the following hypotheses:
Part 1 of my research consists of these experiments, which gather subjects' perceptual judgements on how various pairs of sounds have to be aligned to sound "together." The cumulative responses from all my subjects will become my "ground truth."
Part 2 of my research is to build statistical and signal processing models of perceptual attack time. These will take in a digital audio sample of some arbitrary sound and will try to estimate what a group of listeners would perceive that sound's PAT to be.
Part 3 of my research is to build something useful based on the models from part 2. I've started working on compositional projects (which select and arrange sounds from a large database in various ways depending on the shapes of their estimated PAT distributions), a beat tracking project (in which a computer "foot tapper" takes these PAT distributions as input and outputs an idea as to where "the beat" is), and a music analysis project (in which I use these estimated PAT distributions to compare the rhythmic microstructure of various recordings).
I will publish all my data and gladly share it with the public.
You can read the research protocol that the Stanford Human Subjects Institutional Review Board (IRB) approved for this experiment.
You can download the software from the same place subjects download it, if you want to look at the Max/MSP programming.
Please email me: matt (at) ccrma (dot) stanford (dot) edu
Thanks!