

ISA of a single instrument sound

When ISA is applied to the spectrogram of a tone produced by an instrument, it decomposes the tone into components roughly distinguishable as the sustain, the note attack and/or other small spectral variations, as shown in Fig. 1 for a piano tone. Naturally, the most energetic component corresponds to the spectrum of the sustained note, which carries much of the information useful for identification. Despite their lower energy, the other components may also be useful: humans have been found to use note attacks, breathiness and some spectral dynamics, as well as the spectral envelope of the tone, in source identification. In addition, the decomposition removes the problem of how a note attack should be defined, since the attack is now determined automatically by its mutual independence from the sustained portion. A similar decomposition was observed for the other instruments used in this experiment.
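To make the decomposition concrete, the sketch below factors a magnitude spectrogram into spectral bases and temporal envelopes. It assumes one common ISA recipe: a truncated SVD (PCA) to keep k dimensions, followed by symmetric FastICA with a tanh nonlinearity applied to the reduced spectral bases. The function name `isa_decompose`, its parameters, the sign convention and the energy ordering are illustrative assumptions, not the exact procedure used in this work.

```python
import numpy as np


def isa_decompose(S, k, n_iter=500, tol=1e-8, seed=0):
    """Split a magnitude spectrogram S (n_freqs x n_frames) into k components.

    Assumed ISA variant (a sketch, not the paper's exact code): a truncated
    SVD (PCA) keeps k dimensions, then symmetric FastICA with a tanh
    nonlinearity makes the reduced spectral bases mutually independent.
    Returns (bases, envelopes) with S ~= bases @ envelopes.T, where column j
    of each array belongs to component j, ordered by decreasing energy.
    """
    S = np.asarray(S, dtype=float)

    # PCA step: rank-k approximation S ~= Z @ diag(s) @ V.T
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    Z, s, V = U[:, :k], s[:k], Vt[:k].T

    # Whiten the k columns of Z, treating each frequency bin as one sample.
    Zc = Z - Z.mean(axis=0)
    d, E = np.linalg.eigh(Zc.T @ Zc / len(Zc))
    K = (E / np.sqrt(d)).T                    # whitening matrix (k x k)
    X = Zc @ K.T                              # whitened data (n_freqs x k)

    # Symmetric FastICA: orthogonal rotation R that maximises non-Gaussianity.
    R, _ = np.linalg.qr(np.random.default_rng(seed).standard_normal((k, k)))
    for _ in range(n_iter):
        G = np.tanh(X @ R.T)
        R_new = G.T @ X / len(X) - np.diag((1.0 - G**2).mean(axis=0)) @ R
        dd, EE = np.linalg.eigh(R_new @ R_new.T)
        R_new = EE @ np.diag(dd**-0.5) @ EE.T @ R_new   # (R R^T)^(-1/2) R
        done = np.max(np.abs(np.abs(np.diag(R_new @ R.T)) - 1.0)) < tol
        R = R_new
        if done:
            break

    W = R @ K                                 # unmixing matrix for Z's columns
    bases = Z @ W.T                           # independent spectral bases
    envelopes = (V * s) @ np.linalg.inv(W)    # matching temporal envelopes

    # ICA leaves the sign arbitrary: make each basis's largest entry positive.
    peak = bases[np.abs(bases).argmax(axis=0), np.arange(k)]
    sign = np.sign(peak)
    bases, envelopes = bases * sign, envelopes * sign

    # Order components by the energy of their rank-1 contribution.
    energy = np.linalg.norm(bases, axis=0) * np.linalg.norm(envelopes, axis=0)
    order = np.argsort(energy)[::-1]
    return bases[:, order], envelopes[:, order]
```

Because the unmixing matrix is inverted into the envelopes, `bases @ envelopes.T` reproduces the rank-k PCA approximation exactly; ICA only changes how that approximation is split into components. For an isolated tone, one would expect `bases[:, 0]` to resemble the sustained spectrum and a lower-energy component with a short envelope to capture the attack, as in Fig. 1.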

When there are multiple sources in the mixture, the system will ideally yield components close enough to the original basis components of the individual sources that the perceptually motivated features described above can be used for identification. A strikingly successful example is a mixture of an oboe and a B♭ clarinet playing C4 concurrently, with the clarinet note lasting about 0.5 s longer. Despite the identical pitch and very similar timbre, the spectral bases are rather well separated and are readily identified by comparing them with the first ISA spectral bases of each instrument's isolated tone. Using the classifier described in Section 3, 7 of the 8 components are classified correctly, with the transient components matched to similar components in the trained prototypes. Admittedly, however, such a clean separation still requires the sources to be sufficiently non-overlapping, either temporally or spectrally.
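The comparison of separated bases against isolated-tone bases can be illustrated with a simple nearest-prototype rule. The sketch below is a stand-in for illustration only, not the Section 3 classifier: it assumes cosine similarity between absolute-valued spectral bases as the matching score, and the function `match_to_prototypes` and the `prototypes` dictionary layout are hypothetical.

```python
import numpy as np


def match_to_prototypes(bases, prototypes):
    """Label each column of `bases` (n_freqs x k) with the instrument whose
    prototype basis is most similar under cosine similarity.

    Hypothetical layout: `prototypes` maps an instrument name to an
    (n_freqs x m) array of ISA spectral bases learned from isolated tones.
    Returns a list of (label, similarity) pairs, one per mixture basis.
    """
    def unit(M):
        # Compare magnitude shapes only: abs() removes ICA's sign ambiguity,
        # and column normalisation removes its scale ambiguity.
        M = np.abs(np.asarray(M, dtype=float))
        return M / np.linalg.norm(M, axis=0, keepdims=True)

    B = unit(bases)
    results = []
    for j in range(B.shape[1]):
        # Best cosine similarity between basis j and any prototype basis.
        scores = ((name, float(np.max(unit(P).T @ B[:, j])))
                  for name, P in prototypes.items())
        results.append(max(scores, key=lambda pair: pair[1]))
    return results
```

Taking the maximum over each instrument's prototype bases lets a transient component in the mixture match a transient prototype rather than the sustained spectrum, which mirrors the observation above that transient components were matched to similar components in the trained prototypes.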

Although more than one basis may belong to the spanning set of a single source's subspace, we stop short of grouping them, because clustering the components that belong to the same source is problematic. Not only is it difficult to estimate a reliable similarity measure for a complex mixture, such as the one used successfully in [7], but such a measure also cannot group the transient and steady-state components of the same source together, since the two typically have quite different spectra.

Figure 1: ISA magnitude spectral bases (left) and magnitude temporal envelope coefficients (right) of a G4-piano tone.

The advantage of using such a data-driven algorithm lies in its ability to perform auditory grouping without extra rules [7]. It also does not rely on pitch estimation, which is hard to do in a polyphonic signal. The drawbacks, however, include its reliance on an exposure long enough for meaningful components to be learned. The linear model is also limiting: it holds only approximately for magnitude spectra, since the magnitudes of overlapping sources do not add exactly. Moreover, there is no guarantee that the bases derived from a mixture will be the same as those learned from a single instrument, or even that they will be separated as in the example above. In our experiments this does happen occasionally and lowers identification performance. For example, beating between nearby harmonics can cause the algorithm to yield a basis that cannot be identified with any of the individual sources. In the next section, we examine how well the system performs in light of these potential obstacles.
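One way to see why the magnitude model is only approximate, and how beating arises, is to consider two sources whose short-time Fourier coefficients in the same time-frequency bin are $X_1 = |X_1| e^{j\phi_1}$ and $X_2 = |X_2| e^{j\phi_2}$. This is a standard identity, given here as a worked illustration rather than as part of the original analysis:

```latex
|X_1 + X_2|^2 = |X_1|^2 + |X_2|^2 + 2\,|X_1|\,|X_2|\cos(\phi_1 - \phi_2),
\qquad
\bigl|\,|X_1| - |X_2|\,\bigr| \;\le\; |X_1 + X_2| \;\le\; |X_1| + |X_2| .
```

The magnitudes therefore add exactly only when $\phi_1 = \phi_2$. If two nearby harmonics at frequencies $f_a$ and $f_b$ fall into the same bin, the phase difference advances as $2\pi (f_a - f_b) t$, so the bin magnitude oscillates at the beat rate $|f_a - f_b|$. That modulation is not a scaled copy of either source's envelope, which is one way a basis that matches neither source can appear.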


Pamornpol Jinachitra 2004-02-25