
Artificial Auditory Neural Network

It should not be too surprising that people turn to models of the human auditory system to solve the source separation problem. After all, separation is what our ears and brain do, and we might not expect a computer to do better than we do. In general, this approach emulates the physical and biological hearing system, from the well-understood outer ear through mid-level acoustic representations to the relatively unknown cognitive processing in the auditory cortex. Although this approach in fact overlaps with the spectral modeling approach discussed earlier, it is distinguished by its use of neural networks driven by features determined experimentally from the real human auditory system. The spectral modeling approaches often adopt the cochlear filter bank as part of the front-end processing, while the neural network models also use the mid-level acoustic groupings but incorporate them into the network.

The strength of this approach is its close relation to how we humans actually perform the task, and we should be satisfied to have a computer attend to what is relevant to what we hear while ignoring the inaudible aspects. The obvious drawback is the difficulty of the study, especially at the level of the brain.

In [2] and [5], various auditory maps of the input sound are integrated to segregate speech from the input mixture. The acoustic attributes used are onset, offset, AM, FM, and formants. A blackboard architecture is used to simplify the process in [3]. Networks of neural oscillators have also been used in various works at Sheffield, UK.
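One of the attributes above, onset, can be turned into an auditory map in a particularly simple way: an energy increase in a frequency channel marks a likely event boundary, and channels whose onsets coincide tend to belong to the same source. The following is a minimal sketch, not any of the cited systems' actual implementations; the function name, the `threshold` parameter, and the assumption that the cochleagram arrives as a channels-by-frames energy array are all illustrative.

```python
import numpy as np

def onset_map(cochleagram, threshold=0.0):
    """Crude onset map: half-wave rectified frame-to-frame energy
    increase in each channel of a cochleagram (channels x frames)."""
    diff = np.diff(cochleagram, axis=1)          # temporal derivative per channel
    onsets = np.maximum(diff - threshold, 0.0)   # keep only energy increases
    return onsets

# Toy cochleagram: 4 channels, energy stepping on at frame 3.
cg = np.zeros((4, 6))
cg[:, 3:] = 1.0
om = onset_map(cg)
# The step appears as a single column of synchronized onsets.
print(om)
```

A real system would smooth the channel envelopes before differencing and would pair this with an analogous offset map (rectifying the energy *decrease* instead), but the grouping cue is the same: simultaneous onsets across channels.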

Malcolm Slaney has also done a great deal of work in this area. One of his models is a system with a cochlear filter bank as the front end. The energy of each channel is calculated to adapt the channel gains, simulating sensitivity adaptation. A half-wave rectifier then models the firing rate of the neurons at each position along the cochlea. A method for inverting the cochleagram back into sound is also given. A correlogram is calculated from many instances of the STFT, and it can be inverted into a cochleagram before conversion back into a time signal.
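The rectification and correlogram steps of such a pipeline can be sketched as follows. This is a rough illustration, not Slaney's implementation (which computes the autocorrelations efficiently via the FFT): each cochlear channel is half-wave rectified as a stand-in for firing rate, and one correlogram frame is the short-time autocorrelation of each rectified channel. The function name and lag range are my own choices.

```python
import numpy as np

def correlogram_frame(channels, max_lag):
    """One correlogram frame: short-time autocorrelation of the
    half-wave rectified output of each channel (channels x samples)."""
    rect = np.maximum(channels, 0.0)  # half-wave rectifier ~ firing rate
    frame = np.empty((rect.shape[0], max_lag))
    for c, x in enumerate(rect):
        for lag in range(max_lag):
            # direct autocorrelation at this lag (FFT-based in practice)
            frame[c, lag] = np.dot(x[:len(x) - lag or None], x[lag:])
    return frame

# Toy channel: a 100 Hz sinusoid sampled at 1 kHz, so the pitch
# period is 10 samples and the correlogram peaks at lag 10.
fs, f0 = 1000, 100
t = np.arange(200) / fs
chan = np.sin(2 * np.pi * f0 * t)[None, :]
cgram = correlogram_frame(chan, max_lag=20)
```

Stacking the lag-10 peak across all channels is what makes the correlogram useful for separation: channels sharing a common periodicity (and hence, plausibly, a common source) show ridges at the same lag.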



Pamornpol Jinachitra
Tue Jun 17 16:27:28 PDT 2003