Next  |  Prev  |  Up  |  Top  |  JOS Index  |  JOS Pubs  |  JOS Home  |  Search

The Bayesian Approach to Segmentation and Rhythm Tracking

Harvey Thornburg $<$harv23 at ccrma$>$ (EE)

Segmentation refers to the detection and estimation of abrupt change points. Rhythm tracking refers ultimately to the identification of tempo and meter, but in this context it refers more generally to the identification of any higher-level ``structure'' or ``pattern'' amongst change points. Therefore the name ``rhythm tracking'', despite being catchy, is somewhat over-optimistic as to the goals of this project. Nevertheless, it serves until a better one can be found.

A naive approach to rhythm tracking is to use the partial output of a segmentation algorithm (say, the list of change *times*) to feed a pattern recognition algorithm which learns the structure of this event stream. What ``learning'' means depends on the how the data is processed. In the online setting, which is easiest to develop, learning means to predict, i.e. to develop posterior distributions over future change times given those already observed.

There are a number of problems with the naive approach. First, it is open loop. We could, and should somehow ``close the loop'', i.e. use the tracker's output posterior distributions as priors for the segmentation. This requires a Bayesian framework for the segmentation itself, which will be the *specific* focus of this talk. There are really two motivations for developing Bayesian techniques:

  1. For the specific problem of rhythm tracking, only a fraction of detected change times are likely to convey rhythmic information. As well as note onsets, there could be expressive timbral and/or pitch fluctuations, interfering signals, and so on, which either do not correspond to rhythmic information or do so less reliably than the onset information. Hence, it is desired that the underlying segmentation method learn to recognize specifically the changes corresponding to the desired structure, a phenomenon known as ``stream filtering''.

  2. Independent of rhythm tracking, there is still the concern that we make efficient use of large data samples. When change times exhibit a regular pattern (i.e. anything that differs from Poisson) this means events in a certain location will give information about those far away. Since any practical (i.e. linear time) approximate change detection scheme uses only short windows with local information in the signal about a given change point, it is desired to communicate more global kinds of information, without resorting to large windows.

Second, structures based only on the change times do not seem to cope well with rests, subdivisions, and like phenomena. For instance, it is unlikely that a given list of interarrival times will simply repeat, so immediately we must cope with the problem of variable length cycles. Hence, lists of groups of rhythmic intervals is not a good intermediate representation. We have to be a bit more clever. A proposed solution, easily adapted to general frameworks for change detection, is to propagate information about the ``strength'' of each change as well as the change time. Formally, we define a change point as the pair Tk,Ek, where Tk is the time of a possible change, and Ek is an indicator that the change actually occurred. Right now the ``strength'' is defined as the posterior likelihood P(Ek|Tk, Ek-1, Tk-1, Ek+1, Tk+1, Ek-2, Tk-2, Ek+2, Tk+2, ...). This definition probably needs refinement. This likelihood may be computed using familiar methods (i.e. subspace tests). Intuitively, P(Ek|Tk, ...) relates to information about dynamics, and also encodes the indications of ``rests'' (P(Ek|Tk, ...) « 1). In this representation, encoding of temporal information becomes very easy (the ``metronome'' model), and all the crucial information gets encoded in the patterns of strengths.

During the talk, I will focus mostly on the Bayesian framework for *sequential* change detection when the model after change is unknown. Here the appropriate prior concerns the *location* of the next change point, as well as the prior probability the change has occured before time zero (I find one can just set this to zero, but it's needed to calculate expected cost). Our stopping rule minimizes an expected cost function involving the event of a false alarm, the total number of miss-alarms, and the number of samples observed. Minimizing the latter also minimizes detection delay. Practically, we must also know something apriori concerning the distribution of models after change. There seem to be two promising approaches: the marginalized approach where we average over the entire space of candidate models (we must have a prior distribution for these models), and the multimodel approach where we sample the alternative model space to get some ``representative'' set. The way I do this using the fewest models possible is to sample at the ``poles'' of significance shells. I have found multimodel sampling approaches to work better in practice, but a thorough analysis is still being worked out.

It seems the ``strength'' prior cannot be integrated here, but we can use it in subsequent ``backtracking'' using the Bayesian subspace test. That part needs more refinement and discussion.


Next  |  Prev  |  Up  |  Top  |  JOS Index  |  JOS Pubs  |  JOS Home  |  Search

Download mus423h.pdf

``CCRMA DSP Seminar Prior Abstracts'', by Julius O. Smith III, Aut-Spr Quarters, CCRMA Ballroom, The Knoll, Stanford University.
Copyright © 2005-12-28 by Julius O. Smith III
Center for Computer Research in Music and Acoustics (CCRMA),   Stanford University
CCRMA  [Automatic-links disclaimer]