Another way to cope with the degeneracy of having just one microphone is to project the signal onto a higher-dimensional subspace before the usual analysis. Independent Subspace Analysis (ISA) was first introduced by Hyvärinen [32]. Casey and Westner extended this line of work to circumvent the single-recording-channel limitation of conventional ICA [30]. Essentially, the signal is projected onto mutually independent subspaces. Each source, corresponding to an energy track, may however be spanned by more than one of these subspaces, i.e. the independence assumption is relaxed. The single-channel problem is removed by taking the STFT magnitude and treating each frequency bin as a separate channel recording. Any conventional ICA algorithm can then be applied to extract the most independent outputs, corresponding to magnitude (or energy) tracks in time.

In [39], Smaragdis took the idea further and investigated the ICA technique exploiting the mutual information criterion in particular. He showed that basic acoustic cues such as harmonicity, common AM/FM modulation and frequency proximity yield higher mutual information and hence tend to be grouped together as a source by the algorithm. The aim of the experiment was to unify the framework of perceptual grouping under a mutual-information hypothesis of what the brain does. The method has been applied successfully to a drum track, showing good separation of the kick drum, the snare and the hi-hat. For a complex mixture such as a typical song, the results are less convincing, showing unclean energy separation, especially in frequency regions where components of more than one source overlap. Moreover, even though a singing voice is perceived as one source, the algorithm cannot guarantee that the same auditory object appears in a single output; the singing voice may be split across two output channels. The remedy is either to segment the signal into appropriate intervals, e.g. single-note parts versus chords, before analysis, or to use some measure of similarity across time frames or output channels to group streams belonging to the same source, as proposed in the original papers.
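The bin-as-channel idea can be sketched numerically. The data below is entirely synthetic (two hypothetical sources with fixed spectral shapes and independent energy tracks), and the symmetric FastICA iteration is a textbook variant used for illustration, not the exact algorithm of [30] or [39]:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic magnitude "spectrogram": F frequency bins x T frames, built
# from two sources with fixed spectral shapes and independent,
# non-Gaussian energy tracks (hypothetical data for illustration).
F, T = 64, 400
freqs = np.arange(F)
spectra = np.stack([np.exp(-((freqs - 10) / 4.0) ** 2),   # source 1 spectral shape
                    np.exp(-((freqs - 40) / 6.0) ** 2)])  # source 2 spectral shape
tracks = rng.exponential(size=(2, T))                     # energy tracks in time
X = spectra.T @ tracks                                    # F x T mixture magnitude

# Treat each frequency bin as a "channel" recording: center each bin,
# then reduce and whiten to 2 dimensions with an SVD (i.e. PCA).
Xc = X - X.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Vt[:2] * np.sqrt(T)          # 2 x T whitened components, unit variance

# Symmetric FastICA with a tanh nonlinearity: extract maximally
# independent energy tracks from the whitened data.
W = rng.standard_normal((2, 2))
for _ in range(200):
    G = np.tanh(W @ Z)
    # Update rule: E[g(w'z) z'] - E[g'(w'z)] w, with g = tanh.
    W_new = G @ Z.T / T - np.diag((1 - G ** 2).mean(axis=1)) @ W
    # Symmetric decorrelation: W <- (W W')^(-1/2) W
    ew, ev = np.linalg.eigh(W_new @ W_new.T)
    W = ev @ np.diag(ew ** -0.5) @ ev.T @ W_new

Y = W @ Z    # estimated independent energy tracks (up to sign/scale/order)
```

The recovered rows of `Y` match the original energy tracks up to permutation, sign and scale, which is the usual ICA indeterminacy.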
In [19], the so-called refiltering technique is used to separate streams of sources, assuming they are disjoint in time. The algorithm is a hybrid of CASA and a form of statistical learning, though not ICA. A speaker-dependent HMM is fitted to the training data and then used to determine a binary masking function for each time sample. The states of the HMM are derived from pairs of STFT coefficients. The separation is successful when applied to mixtures of the training samples, but it remains training- and speaker-dependent, and the time-disjointness assumption is invalid in many instances. This is one example of a hybrid statistical/CASA system for source separation; another is due to Cichocki [31]. Masking is also one of the most common schemes used in source separation. Its success obviously depends on the masking function and on how the signal representation allocates the source energy among the coefficients.
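The masking step itself can be sketched as follows. The two spectrograms here are synthetic placeholders for what a trained model such as the HMM of [19] would predict, so this illustrates only the binary mask and the refiltering arithmetic, not the model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two synthetic magnitude spectrograms (F bins x T frames), standing in
# for the state-dependent source spectra a trained model would supply.
F, T = 32, 100
S1 = rng.exponential(size=(F, T))
S2 = rng.exponential(size=(F, T))
mix = S1 + S2                      # magnitudes assumed additive (an approximation)

# Binary mask: assign each time-frequency cell to the dominant source.
mask = (S1 > S2).astype(float)
est1 = mask * mix                  # refiltered estimate of source 1
est2 = (1.0 - mask) * mix          # the remainder is assigned to source 2
```

Because the mask is binary, the two estimates always sum back to the mixture; the quality of `est1` depends entirely on how well the representation keeps the two sources in disjoint time-frequency cells.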