Next |
Prev |
Up |
Top
|
JOS Index |
JOS Pubs |
JOS Home |
Search
Harvey Thornburg
harv23 at ccrma
(EE)
Segmentation refers to the detection and estimation of abrupt change
points. Rhythm tracking refers ultimately to the identification of
tempo and meter, but in this context it refers more generally to the
identification of any higher-level ``structure'' or ``pattern'' amongst
change points. Therefore the name ``rhythm tracking'', despite being
catchy, is somewhat over-optimistic as to the goals of this project.
Nevertheless, it serves until a better one can be found.
A naive approach to rhythm tracking is to use the partial output of a
segmentation algorithm (say, the list of change *times*) to feed a
pattern recognition algorithm which learns the structure of this
event stream. What ``learning'' means depends on the how the data is
processed. In the online setting, which is easiest to develop, learning
means to predict, i.e. to develop posterior distributions over future
change times given those already observed.
There are a number of problems with the naive approach. First, it is open
loop. We could, and should somehow ``close the loop'', i.e. use the
tracker's output posterior distributions as priors for the
segmentation. This requires a Bayesian framework for the segmentation
itself, which will be the *specific* focus of this talk. There are really
two motivations for developing Bayesian techniques:
- For the specific problem of rhythm tracking, only a fraction of
detected change times are likely to convey rhythmic information. As
well as note onsets, there could be expressive timbral and/or pitch
fluctuations, interfering signals, and so on, which either do not
correspond to rhythmic information or do so less reliably than the
onset information. Hence, it is desired that the underlying
segmentation method learn to recognize specifically the changes
corresponding to the desired structure, a phenomenon known as
``stream filtering''.
- Independent of rhythm tracking, there is still the concern that
we make efficient use of large data samples. When change times exhibit
a regular pattern (i.e. anything that differs from Poisson) this means
events in a certain location will give information about those
far away. Since any practical (i.e. linear time) approximate change
detection scheme uses only short windows with local information in the
signal about a given change point, it is desired to communicate more
global kinds of information, without resorting to large windows.
Second, structures based only on the change times do not seem to cope
well with rests, subdivisions, and like phenomena. For instance, it
is unlikely that a given list of interarrival times will simply
repeat, so immediately we must cope with the problem of variable
length cycles. Hence, lists of groups of rhythmic intervals is not a
good intermediate representation. We have to be a bit more clever. A
proposed solution, easily adapted to general frameworks for change
detection, is to propagate information about the ``strength'' of each
change as well as the change time. Formally, we define a change point
as the pair Tk,Ek, where Tk is the time of a possible change, and Ek
is an indicator that the change actually occurred. Right now the
``strength'' is defined as the posterior likelihood P(Ek|Tk, Ek-1,
Tk-1, Ek+1, Tk+1, Ek-2, Tk-2, Ek+2, Tk+2, ...). This definition
probably needs refinement. This likelihood may be computed using
familiar methods (i.e. subspace tests). Intuitively, P(Ek|Tk, ...)
relates to information about dynamics, and also encodes the
indications of ``rests'' (P(Ek|Tk, ...) « 1). In this
representation, encoding of temporal information becomes very easy
(the ``metronome'' model), and all the crucial information gets
encoded in the patterns of strengths.
During the talk, I will focus mostly on the Bayesian framework for
*sequential* change detection when the model after change is
unknown. Here the appropriate prior concerns the *location* of the
next change point, as well as the prior probability the change has
occured before time zero (I find one can just set this to zero, but
it's needed to calculate expected cost). Our stopping rule minimizes
an expected cost function involving the event of a false alarm, the
total number of miss-alarms, and the number of samples
observed. Minimizing the latter also minimizes detection delay.
Practically, we must also know something apriori concerning the
distribution of models after change. There seem to be two promising
approaches: the marginalized approach where we average over the entire
space of candidate models (we must have a prior distribution for these
models), and the multimodel approach where we sample the alternative
model space to get some ``representative'' set. The way I do this
using the fewest models possible is to sample at the ``poles'' of
significance shells. I have found multimodel sampling approaches to
work better in practice, but a thorough analysis is still being worked
out.
It seems the ``strength'' prior cannot be integrated here, but we can
use it in subsequent ``backtracking'' using the Bayesian subspace
test. That part needs more refinement and discussion.
Next |
Prev |
Up |
Top
|
JOS Index |
JOS Pubs |
JOS Home |
Search
Download mus423h.pdf