Biomedical Engineering Reference
3.3.1 Phrase Completion and Sentence Continuation
This discussion of language cognition begins with consideration of a class of confabulation
architectures for dealing with single English sentences. These architectures address the problems
of phrase completion and sentence continuation, which are simple subcases of language generation. This
subsection expands upon the brief introduction to phrase completion provided in Hecht-Nielsen
(2005). These architectures provide a good introduction to the "look and feel" of cognitive
information processing, which is completely different from the familiar computer paradigm.
Figure 3.1 illustrates a confabulation architecture for phrase completion and sentence
continuation in a single sentence of up to 20 words. Each lexicon has about 63,000 symbols, including
symbols for the 63,000 most common words in English (as reflected in the training corpus) and
eight punctuation marks (period, comma, semicolon, etc.), which are treated as separate words.
Capitalization is preserved as it appears in the training-corpus words selected for representation within
the word lexicons (i.e., mark and Mark are different words with different symbols). Thus, many
of the words in the lexicon are represented twice, once capitalized and once not; some have
even more than two representations (e.g., EXIT, Exit, and exit); and some, such as "e.g." and the
punctuation marks, are never capitalized and have only one representation.
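The lexicon construction described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure actually used to build the lexicons: the function names, the tokenizing pattern, and the particular choice of eight punctuation marks are assumptions (the text names only the period, comma, and semicolon).

```python
import re
from collections import Counter

# Hypothetical set of eight punctuation symbols; the text lists only
# period, comma, and semicolon, so the other five are assumptions.
PUNCTUATION = [".", ",", ";", ":", "?", "!", "(", ")"]

# Simplified tokenizer (an assumption): a token is either a run of
# letters/apostrophes or a single punctuation character.
TOKEN_RE = re.compile(r"[A-Za-z']+|[.,;:?!()]")


def tokenize(sentence):
    """Split a sentence into case-sensitive words and punctuation tokens."""
    return TOKEN_RE.findall(sentence)


def build_lexicon(corpus_sentences, max_words=63000):
    """Map each of the most common words, plus punctuation, to a symbol index.

    Case is preserved, so 'Mark' and 'mark' receive different symbols.
    """
    counts = Counter()
    for sentence in corpus_sentences:
        counts.update(t for t in tokenize(sentence) if t not in PUNCTUATION)
    words = [word for word, _ in counts.most_common(max_words)]
    return {token: index for index, token in enumerate(words + PUNCTUATION)}


lexicon = build_lexicon(["Mark made a mark.", "Exit, exit, EXIT!"])
print(lexicon["Mark"] != lexicon["mark"])  # capitalized and lowercase forms differ
```

In this sketch every lexicon would share the same symbol table, which matches the statement that each lexicon holds the same set of roughly 63,000 symbols.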
Once a suitably "clean," huge training corpus of proper English text (typically containing billions
of words) has been created, each successive sentence in the corpus is entered, in sequence, into the
architecture of Figure 3.1. The first word of the sentence is entered into the leftmost lexicon (i.e.,
the symbol representing this word is made active) and the remaining words of the sentence (or
punctuation marks, which, again, are treated as separate words) are entered successively until
the ending period. If the sentence has more than 20 words, those words beyond the first 20 are
discarded. Because the words of each sentence are entered in order, each at a fixed position, this
architecture is termed position-dependent.
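The position-dependent loading step can be illustrated as follows. This is a minimal sketch under stated assumptions: the function name load_sentence and the use of None for a lexicon with no active symbol are hypothetical conventions, and the sentence is assumed to be already split into word and punctuation tokens.

```python
NUM_LEXICONS = 20  # the architecture of Figure 3.1 has 20 lexicons


def load_sentence(tokens, num_lexicons=NUM_LEXICONS):
    """Assign token i to lexicon i, discarding tokens beyond the last lexicon.

    Returns a list of length num_lexicons; None marks a lexicon in which
    no symbol is active (a convention assumed for this sketch).
    """
    kept = tokens[:num_lexicons]
    return kept + [None] * (num_lexicons - len(kept))


slots = load_sentence(["The", "cat", "sat", "."])
print(slots[0], slots[3], slots[4])  # first word, ending period, empty lexicon
```

Because a token's slot index is simply its position in the sentence, the same word activates a different lexicon depending on where it occurs, which is the sense in which the architecture is position-dependent.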
It is also possible to use hierarchical ring architectures for representing strings of words, which
I believe is probably how the human cortical language architecture is organized. As the words are
loaded into the ring of lexicons, they are quickly removed in groups (phrases) and re-represented in
lexicons at a higher conceptual level, leaving the lower-level lexicons free for capturing
additional words. I believe that this is why humans can only instantly remember "about 7 things
± 2" (Miller, 1956): we physically only have about seven lexicons at the word level. When
required to remember a sequence of things, we repeatedly rehearse the sequence (to firmly store it in
short-term memory) by traversing the ring from the beginning lexicon (which is always the same
one for each sentence or word sequence) to the last item and then back to the beginning. However,
because computer implementations of confabulation architectures are (at least conceptually) not
subject to such limitations, there is no need for us to use these more complicated ring architectures
for this chapter's introductory discussion.
The knowledge bases of the architecture of Figure 3.1 are all causal, meaning that the symbols
of each lexicon are only linked to symbols of later lexicons (i.e., those that lie to the right of it);
Figure 3.1 Naïve single-sentence confabulation architecture for proper English phrase completion or sentence
continuation. Knowledge bases link each of the first 19 of the 20 lexicons to all of the lexicons to their right.
Sentences are represented with the first word in the first lexicon on the left, and so on in sequence. This architecture
has a total of 19 + 18 + ... + 1 = 190 knowledge bases.
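The count of 190 knowledge bases follows from the causal wiring: one knowledge base for every ordered pair of lexicons in which the source lies to the left of the target. The short sketch below enumerates those pairs; the function name and the zero-based lexicon indices are assumptions made for illustration.

```python
from itertools import combinations

NUM_LEXICONS = 20


def causal_knowledge_bases(num_lexicons=NUM_LEXICONS):
    """List (source, target) lexicon index pairs with source left of target."""
    return list(combinations(range(num_lexicons), 2))


knowledge_bases = causal_knowledge_bases()

# Outgoing links per source lexicon: 19 for the leftmost, down to 1 for the
# 19th, and none for the last lexicon.
outgoing = [sum(1 for source, _ in knowledge_bases if source == i)
            for i in range(NUM_LEXICONS)]

print(len(knowledge_bases))  # 190
```

Equivalently, the total is 20 × 19 / 2 = 190, the number of ways to choose two distinct lexicons from 20.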