Figure 3.6). Mathematically, the symbols of each primary sound lexicon are a vector quantizer
(Zador, 1963) for the set of S vectors that arise, from all sound sources that are likely to occur,
when each source is presented in isolation (i.e., no mixtures). Among the symbols that respond
to S are some that represent the sounds coming from the attended speaker. This illustrates the
critically important need to design the acoustic front-end so as to achieve this sort of
quasiorthogonalization of sources. By confining each sound feature to a properly selected time
interval (a subinterval of the 8000 samples available at each moment, ending at the most
recent 16 kHz sample), and by applying the proper postfiltering (after the dot product with the feature
vector has been computed), this quasiorthogonalization can be accomplished. (Note: This scheme
answers the question of how brains carry out "independent component analysis" [Hyvärinen et al.,
2001]. They don't need to. Properly designed quasiorthogonalizing features, adapted to the pure
sound sources that the critter encounters in the real world, map each source of an arbitrary mixture
of sources into its own separate components of the S vector. In effect, this is a sort of
"one-time ICA" feature development process carried out during development and then essentially
frozen, or perhaps adaptively maintained.) Given the stream of S vectors, the confabulation
processing that follows (as described below) can then, at each moment, ignore all but the subset of
components related to the attended source, independent of how many, or few, interfering sources
are present. Of course, this is exactly what is observed in mammalian audition: effortless
segmentation of the attended source at the very first stage of auditory (or visual or somatosensory,
etc.) perception.
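To make the front-end description above concrete, here is a minimal Python sketch of the idea. The names (postfilter, s_vector, feature_vectors, windows) and the half-wave-rectifying post-filter are illustrative assumptions: the text specifies only that each feature uses a properly selected time subinterval, a dot product, and a post-filter, so this is a sketch of the scheme, not the author's implementation.

```python
import numpy as np

# Illustrative sketch only: the names (postfilter, s_vector, feature_vectors,
# windows) and the rectifying post-filter are assumptions, not the book's design.

SAMPLE_RATE = 16_000   # 16 kHz input stream
BUFFER_LEN = 8_000     # samples available "at each moment" (the most recent 0.5 s)

def postfilter(x, threshold=0.0):
    # The text says only that "proper postfiltering" follows the dot product;
    # half-wave rectification is used here as a stand-in.
    return max(x - threshold, 0.0)

def s_vector(recent_samples, feature_vectors, windows):
    """Compute one S vector from the most recent BUFFER_LEN samples.

    recent_samples  : 1-D array of the last BUFFER_LEN samples (newest last)
    feature_vectors : list of 1-D arrays, one per primary-lexicon feature
    windows         : list of (start, end) index pairs selecting each feature's
                      time subinterval within recent_samples
    """
    s = np.zeros(len(feature_vectors))
    for i, (f, (start, end)) in enumerate(zip(feature_vectors, windows)):
        segment = recent_samples[start:end]            # properly selected interval
        s[i] = postfilter(float(np.dot(segment, f)))   # dot product, then post-filter
    return s
```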
The expectation formed on the next-word acoustic lexicon of Figure 3.7 (which is a huge
structure, almost surely implemented in the human brain by a number of physically separate
lexicons) is created by successive C1Fs. The first is based on input from the speaker model
lexicon. The only symbols (each representing a stored acoustic model for a single word — see
below) that then remain available for further use are those connected with the speaker currently
being attended to.
The second C1F is executed in connection with input from the language module word lexicon
that has an expectation on it representing possible predictions of the next word that the speaker will
produce (this next-word lexicon expectation is produced using essentially the same process as was
described in Section 3.3 in connection with sentence continuation with context). (Note: This is an
example of the situation mentioned above and in the Appendix, where an expectation is allowed to
transmit through a knowledge base.) After this operation, the only symbols left available for use on
the next-word acoustic lexicon are those representing expected words spoken by the attended
speaker. This expectation is then used for the processing involved in recognizing the attended
speaker's next word.
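As a toy illustration of how the two successive C1Fs narrow the set of symbols available on the next-word acoustic lexicon, the sketch below pictures each C1F as a simple set intersection. This is a deliberate simplification of the actual confabulation operation, and all symbol names are invented:

```python
# Toy picture only: each C1F is modeled as intersecting the surviving symbol set
# with the symbols that receive knowledge-link input; real C1Fs are confabulation
# operations, not bare set intersections.

def c1f(available, supported):
    """Keep only the lexicon symbols that remain available after this C1F."""
    return available & supported

all_word_models = {"red", "barn", "tractor", "silo", "ran", "sat"}

# First C1F: symbols tied to the acoustic word models of the attended speaker.
speaker_supported = {"red", "barn", "tractor", "silo", "ran"}

# Second C1F: the next-word expectation transmitted from the language-module
# word lexicon through a knowledge base (as in sentence continuation with context).
language_expectation = {"barn", "silo", "house"}

available = c1f(all_word_models, speaker_supported)
available = c1f(available, language_expectation)
print(sorted(available))   # ['barn', 'silo'] -> expected next words, this speaker
```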
As shown in Figure 3.7, knowledge bases have previously been established (using pure-source,
or well-segmented-source, examples) to and from the primary sound symbol lexicons with the
sound phrase lexicons and to and from these with the next-word acoustic lexicon. Using these
knowledge bases, the expectation on the next-word acoustic lexicon is transferred (as described
immediately above) via the appropriate knowledge bases to the sound phrase lexicons, where
expectations are formed; and from these to the primary sound lexicons, where additional
expectations are formed. It is easy to imagine that, since each of these transferred expectations is
typically much larger than the one from which it came, by the time this process gets to the primary
sound lexicons the expectations will encompass almost every symbol. THIS IS NOT SO! While these
primary lexicon expectations are indeed large (they may encompass many hundreds of symbols),
they are still only a small fraction of the total set of tens of thousands of symbols. Given these
transfers, which actually occur as soon as the recognition of the previous word is completed
(often long before its acoustic content ceases arriving), the architecture is prepared for
detecting the next word spoken by the attended speaker.
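A rough sketch of the expectation transfers described in this paragraph follows. Purely for illustration, a knowledge base is modeled here as a mapping from each source-lexicon symbol to the set of target-lexicon symbols it supports; all symbol names and counts are invented:

```python
# Hypothetical data layout: a knowledge base maps a source-lexicon symbol to the
# set of target-lexicon symbols it supports.

def transfer_expectation(expectation, knowledge_base):
    """Form the target-lexicon expectation supported by any source symbol."""
    target = set()
    for symbol in expectation:
        target |= knowledge_base.get(symbol, set())
    return target

# next-word acoustic lexicon -> sound phrase lexicons -> primary sound lexicons
word_to_phrase = {"barn": {"ph_ba", "ph_arn"}, "silo": {"ph_si", "ph_lo"}}
phrase_to_sound = {"ph_ba": {3, 17, 42}, "ph_arn": {5, 88},
                   "ph_si": {9, 12}, "ph_lo": {61}}

next_word_expectation = {"barn", "silo"}
phrase_expectation = transfer_expectation(next_word_expectation, word_to_phrase)
sound_expectation = transfer_expectation(phrase_expectation, phrase_to_sound)

# Each transfer fans out, yet even a primary-lexicon expectation of many hundreds
# of symbols is still a small fraction of a lexicon of tens of thousands of symbols.
print(len(phrase_expectation), len(sound_expectation))
```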