Navy. LIFER employed a semantic grammar, i.e., one that used labels
such as “SHIP” and “ATTRIBUTE” rather than syntactic labels such as
noun and verb. This meant that it was closely tied to its own domain in
the same way as SHRDLU was, but with the important difference that it
was much more user-friendly, allowing the user to define new dictionary
entries, to define paraphrases and to process incomplete input.
Up to the early 1980s there was a tendency within the NLP com-
munity for each research project to focus on only a single microcosm of
the overall problem of understanding, with little co-ordination between
these project groups and therefore little visible progress on the overall
task. Then came the realisation that a more global approach was nec-
essary, followed by a step very much in the right direction when two
electronically accessible corpora of English text became available, one for
American English, collected at Brown University in Rhode Island, the
other, for British English, managed by a consortium in Europe.[8] Both
corpora consisted of approximately one million words, spread roughly
evenly across some 500 texts. They had in fact been compiled somewhat
earlier but it was not until the beginning of the 1980s that sufficient
computing power became widely available to NLP researchers for these
texts to be easily usable electronically.
The availability of these corpora allowed researchers to develop the
first statistically based techniques for use in NLP, that is to say, tech-
niques based on the relative frequencies of certain properties of natural
language. As a simple example of such techniques let us return to the de-
finition of the word “mother”—if a program encounters the word during
the course of its semantic analysis of a sentence, in the absence of any
other knowledge the program will be able to make an intelligent guess
that the common, human-being meaning is the intended one, rather
than the alternative relating to the manufacture of vinegar. Such a guess
would be made by the program looking up the relative frequencies of the
two meanings across a large corpus of English text. Another widely used
statistical application of corpora relates to tagging words with the appro-
priate part-of-speech during the syntactic analysis process, by knowing,
for example, that when the word “wood” or “woods” is within close prox-
imity, the word “bear” is much more frequently used to mean a big furry
animal (a noun) than to mean “carry” (a verb). By the close of the twen-

[8] This corpus started life at the University of Lancaster and then moved to Oslo University and
the Norwegian Computing Centre for the Humanities at Bergen.
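
The two frequency-based guesses described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the counts, the sense labels and the function names most_frequent_sense and tag_given_cue are all invented for the example, and none of them is taken from either corpus or from any real system.

```python
from collections import Counter

# Invented toy data standing in for counts drawn from a large corpus.
# Each entry is (word, sense label) as an annotator might have marked it.
SENSE_SAMPLES = [("mother", "parent")] * 97 + [("mother", "vinegar culture")] * 3

# Each entry is (words found close to "bear", part of speech of "bear").
# These observations are hypothetical as well.
TAG_SAMPLES = [
    ({"woods", "saw"}, "noun"),
    ({"wood", "brown"}, "noun"),
    ({"woods", "hungry"}, "noun"),
    ({"wood", "weight"}, "verb"),
    ({"cannot", "pain"}, "verb"),
    ({"must", "costs"}, "verb"),
    ({"will", "fruit"}, "verb"),
    ({"please", "mind"}, "verb"),
]


def most_frequent_sense(word, samples):
    """Guess the sense of `word` from relative frequency alone."""
    counts = Counter(sense for w, sense in samples if w == word)
    if not counts:  # the word was never observed
        return None
    sense, n = counts.most_common(1)[0]
    return sense, n / sum(counts.values())


def tag_given_cue(cues, samples):
    """Pick the tag seen most often when any cue word was nearby."""
    counts = Counter(tag for context, tag in samples if context & cues)
    if not counts:  # no cue observed: fall back to overall frequencies
        counts = Counter(tag for _, tag in samples)
    return counts.most_common(1)[0][0]


sense, share = most_frequent_sense("mother", SENSE_SAMPLES)
print(sense, share)                                   # parent 0.97
print(tag_given_cue({"wood", "woods"}, TAG_SAMPLES))  # noun
```

With these made-up counts the program picks the human-being sense of "mother", and it tags "bear" as a noun when "wood" or "woods" is nearby, even though the verb is the more common tag in the toy data overall.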