The Research Behind the "Together" Listening Experiments

My name is Matt Wright and I'm doing my PhD dissertation research on the perception of musical rhythm by people and by computers. Specifically, the core of my research is the study of Perceptual Attack Time, "the time a sound is perceived as a rhythmic event," which is generally after the actual physical beginning of the sound.

What is Perceptual Attack Time?

As an example, here's the waveform of a short bowed violin note that I'm studying in this research:

[Figure: waveform of the short violin note]

In this graph, time zero is the first instant where there is any sound at all, i.e., the "physical onset". According to my own personal subjective experience, the Perceptual Attack Time of this note is about 34 milliseconds after its physical onset.

Here are some sound examples that demonstrate how I came up with the number "34 milliseconds":

  1. Just the violin note: violin.mp3
  2. The violin note and a percussive sound, starting at exactly the same time: vln+stick-together.mp3
  3. The violin note and a percussive sound, with the violin starting first and then the percussive sound starting 34 milliseconds later: vln_first_then_stick.mp3

For me, in number 2 it sounds like the violin player is late compared to the percussive sound (even though they're starting at exactly the same time), while in number 3 it sounds like they're rhythmically together (even though the violin is 34 milliseconds early).
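Example 3 amounts to delaying the percussive sound by 34 milliseconds before mixing, and at a fixed sample rate that delay becomes a whole number of samples. Here is a minimal sketch in Python, assuming both sounds are already loaded as mono lists of floating-point samples at 44.1 kHz. The function names and the sample rate are assumptions for illustration, not taken from the experiment software:

```python
def ms_to_samples(ms: float, sample_rate: int = 44100) -> int:
    """Convert a time offset in milliseconds to the nearest whole number of samples."""
    return round(ms * sample_rate / 1000)


def mix_with_offset(first: list[float], second: list[float],
                    offset_ms: float, sample_rate: int = 44100) -> list[float]:
    """Mix two mono signals so that `second` starts offset_ms after `first`.

    A positive offset delays `second`; a negative offset delays `first` instead.
    """
    delay = ms_to_samples(abs(offset_ms), sample_rate)
    # Decide which signal gets padded with leading silence.
    start_first, start_second = (0, delay) if offset_ms >= 0 else (delay, 0)
    length = max(start_first + len(first), start_second + len(second))
    out = [0.0] * length
    for i, x in enumerate(first):
        out[start_first + i] += x
    for i, x in enumerate(second):
        out[start_second + i] += x
    return out
```

With hypothetical `violin` and `stick` sample lists, `mix_with_offset(violin, stick, 34.0)` would approximate example 3 and `mix_with_offset(violin, stick, 0.0)` example 2. At 44.1 kHz, 34 ms is 1499.4 samples, which rounds to 1499.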

To your ear, maybe number 3 sounds a little bit off and you'd like to move the violin earlier or later to make it line up better with the other sound. That's the "perceptual" in "perceptual attack time": it's a measure of what each person perceives, not something that is inherently "true" about the sound. And that's why I want to ask lots of people to give me their subjective impressions, not just do it for myself.

Why is Perceptual Attack Time Important?

There are two main reasons:

  1. Suppose you're constructing a melody or some kind of rhythmic pattern on a computer out of a variety of different kinds of sounds. If you align the sounds in time so that their beginnings follow the timing of a certain rhythm, then the result is likely to sound a little bit unsteady or out of rhythm. To make it sound like the rhythm you have in mind, you need to align the sounds in time so that their perceptual attack times follow the timing of your rhythm.
  2. Suppose you want to use a computer to help you gain insight into the fine rhythmic structure of a recording, for example to analyze swing in jazz, rubato in Western classical piano playing, or microtiming in general. Although computers are getting pretty good at finding when each note begins, i.e., the physical onsets, what you really want to know is when each note is perceived rhythmically. For example, suppose you have a recording of a drummer and a bass player who lock in together rhythmically in an extremely satisfying way, and you're trying to quantify scientifically how they play together, so you analyze the recording with a computer. If the bass player's notes tend to start about 10 milliseconds before the drummer's notes, the obvious conclusion would be that the bass player is playing "ahead of" or "before" the drummer. But if it turns out that the perceptual attack time of the bass notes is about 25 milliseconds while the perceptual attack time of the drum notes is about 3 milliseconds, then in fact the bass player is "perceptually" playing "behind" or "after" the drummer: each bass note is heard about -10 + 25 = 15 milliseconds after the drum note's physical onset, while the drum note is heard only 3 milliseconds after it, so the bass is heard about 12 milliseconds late. Musicians don't learn to make their notes' physical onsets have a certain rhythm; they learn to make their notes' perceptual attack times have a certain rhythm.
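Both points reduce to the same arithmetic: the moment a note is heard as a rhythmic event is its physical onset plus its perceptual attack time. The sketch below treats PAT as a single fixed number per sound, which is a simplification (this research treats it as a distribution), and uses the illustrative values from point 2. The function names are hypothetical:

```python
def perceived_time(onset_ms: float, pat_ms: float) -> float:
    """Time at which a note is heard as a rhythmic event: physical onset + PAT."""
    return onset_ms + pat_ms


def schedule_onsets(beat_times_ms: list[float], pat_ms: float) -> list[float]:
    """Physical start times that make a sound's PAT land exactly on each intended beat."""
    return [beat - pat_ms for beat in beat_times_ms]


# Point 1: a sound with a 34 ms PAT must start 34 ms before each beat.
print(schedule_onsets([500, 1000, 1500], 34))  # [466, 966, 1466]

# Point 2: bass notes physically start 10 ms before the drum notes.
drum = perceived_time(0.0, 3.0)     # heard at 3 ms
bass = perceived_time(-10.0, 25.0)  # heard at 15 ms
print(bass - drum)  # 12.0, i.e., the bass is heard about 12 ms behind the drums
```

Note that scheduling by PAT can require a sound to start before the first beat, so a sequencer built this way needs a little lead-in time.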

What Will These Experiments Find Out?

In short, I want to find the Perceptual Attack Time of a variety of sounds.

For each pair of sounds, I will pool the results from all my subjects into a histogram of the time offsets between the two sounds at which subjects said they sounded "together." A central idea of my research is to treat Perceptual Attack Time not as a single instant but as a continuous probability density function, so I'll interpret these histograms as estimates of probability density functions.
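Turning a histogram into a probability density estimate only requires dividing each bin's count by the number of responses times the bin width, so that the area under the histogram is 1. A minimal sketch, assuming each response is recorded as the offset in milliseconds of the percussive sound relative to the musical sound (a hypothetical convention) and assuming a 5 ms bin width; responses from all subjects are simply concatenated into one list:

```python
import math
from collections import Counter


def histogram_density(offsets_ms: list[float], bin_width_ms: float = 5.0) -> dict[float, float]:
    """Bin pooled offsets and normalize so that sum(density * bin_width) == 1.

    Keys are the left edges of the bins, in milliseconds; empty bins are omitted.
    """
    if not offsets_ms:
        raise ValueError("need at least one response")
    counts = Counter(math.floor(x / bin_width_ms) for x in offsets_ms)
    n = len(offsets_ms)
    return {k * bin_width_ms: c / (n * bin_width_ms) for k, c in sorted(counts.items())}


# Hypothetical responses, for illustration only (not experimental data).
density = histogram_density([30, 32, 34, 36, 41])
print(density)  # roughly {30.0: 0.12, 35.0: 0.04, 40.0: 0.04}
```

If NumPy is available, `numpy.histogram(offsets, bins=..., density=True)` applies the same normalization.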

These experiments will test the following hypotheses:

  1. Subjects will not exactly replicate their response for repetitions of the same trial, but instead their responses will fit a probability distribution.
  2. The shapes of these probability distributions will vary based on the sharpness of attack and other characteristics of the musical material.
  3. These probability distributions will be narrower (i.e., subjects will repeat the same results more consistently) when the sound of the click better matches the sound of the musical material.
  4. Subjects will be more consistent when the musical material establishes an understandable and predictable rhythmic context.

How Do These Experiments Fit Into The Overall Research?

Part 1 of my research consists of these experiments, which gather subjects' perceptual judgments of how various pairs of sounds have to be aligned to sound "together." The cumulative responses from all my subjects will become my "ground truth."

Part 2 of my research is to build statistical and signal processing models of perceptual attack time (PAT). These will take in a digital audio sample of an arbitrary sound and try to estimate what a group of listeners would perceive that sound's PAT to be.
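The models themselves are not described here. For comparison only, one simple baseline estimates an attack time as the moment a sound's amplitude envelope first reaches a fixed fraction of its peak, measured from the physical onset. The sketch below is that naive baseline, not the model this research proposes; the envelope method (a trailing maximum of absolute value), the 10 ms window, and the 50% threshold are all assumptions, and it returns a single instant rather than a distribution. It takes the first nonzero sample as the physical onset, which real recordings with background noise would not satisfy:

```python
def envelope(signal: list[float], window: int) -> list[float]:
    """Crude amplitude envelope: max |x| over the trailing `window` samples."""
    env = []
    for i in range(len(signal)):
        start = max(0, i - window + 1)
        env.append(max(abs(x) for x in signal[start:i + 1]))
    return env


def threshold_attack_ms(signal: list[float], sample_rate: int = 44100,
                        fraction: float = 0.5, window_ms: float = 10.0) -> float:
    """Milliseconds from the first nonzero sample until the envelope reaches fraction * peak.

    Naive baseline for illustration; assumes a noise-free signal.
    """
    window = max(1, round(window_ms * sample_rate / 1000))
    env = envelope(signal, window)
    peak = max(env)
    if peak == 0:
        raise ValueError("signal is silent")
    onset = next(i for i, x in enumerate(signal) if x != 0)  # physical onset
    crossing = next(i for i, e in enumerate(env) if e >= fraction * peak)
    return (crossing - onset) * 1000 / sample_rate
```

Changing `fraction` moves the estimate later or earlier, which hints at why a single fixed rule is unlikely to match listeners across different kinds of attacks.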

Part 3 of my research is to build something useful based on the models from Part 2. I've started working on compositional projects (which select and arrange sounds from a large database in various ways depending on the shapes of their estimated PAT distributions), a beat tracking project (in which a computer "foot tapper" takes these PAT distributions as input and outputs an estimate of where "the beat" is), and a music analysis project (in which I use these estimated PAT distributions to compare the rhythmic microstructure of various recordings).

I will publish all my data and gladly share it with the public.

For More Information

You can read the research protocol that the Stanford Human Subjects Institutional Review Board (IRB) approved for this experiment.

You can download the software from the same place subjects download it, if you want to look at the Max/MSP programming.

Please email me: matt (at) ccrma (dot) stanford (dot) edu