How Computers Communicate - Robots Unlimited: Life in a Virtual Age

Navy. LIFER employed a semantic grammar, i.e., one that used labels such as “SHIP” and “ATTRIBUTE” rather than syntactic labels such as noun and verb. This meant that it was closely tied to its own domain in the same way as SHRDLU was, but with the important difference that it was much more user-friendly, allowing the user to define new dictionary entries, to define paraphrases and to process incomplete input.

Up to the early 1980s there was a tendency within the NLP community for each research project to focus on only a single microcosm of the overall problem of understanding, with little co-ordination between these project groups and therefore little visible progress on the overall task. Then came the realisation that a more global approach was necessary, followed by a step very much in the right direction when two electronically accessible corpora of English text became available, one for American English, collected at Brown University in Rhode Island, the other, for British English, managed by a consortium in Europe. [8] Both corpora consisted of approximately one million words, spread roughly evenly across some 500 texts. They had in fact been compiled somewhat earlier but it was not until the beginning of the 1980s that sufficient computing power became widely available to NLP researchers for these texts to be easily usable electronically.

The availability of these corpora allowed researchers to develop the first statistically based techniques for use in NLP, that is to say, techniques based on the relative frequencies of certain properties of natural language. As a simple example of such techniques let us return to the definition of the word “mother”. If a program encounters the word during the course of its semantic analysis of a sentence, then in the absence of any other knowledge it can make an intelligent guess that the common, human-being meaning is the intended one, rather than the alternative relating to the manufacture of vinegar. The program makes such a guess by looking up the relative frequencies of the two meanings across a large corpus of English text. Another widely used statistical application of corpora relates to tagging words with the appropriate part of speech during the syntactic analysis process, by knowing, for example, that when the word “wood” or “woods” is within close proximity, the word “bear” is much more frequently used to mean a big furry animal (a noun) than to mean “carry” (a verb). By the close of the twen-

[8] This corpus started life at the University of Lancaster and then moved to Oslo University and the Norwegian Computing Centre for the Humanities at Bergen.
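The two frequency-based guesses described above, for “mother” and for “bear”, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny tagged sample, its counts, the sense labels and all function names are invented here and are not taken from the Brown corpus, its British counterpart or any actual NLP system. The second function also simplifies matters by treating tagging as a vote among the senses seen alongside nearby words.

```python
from collections import Counter, defaultdict

# Toy sample of (word, sense, nearby words). Every entry is invented for
# illustration; a real system would count over a large tagged corpus.
SAMPLE = [
    ("mother", "parent", {"her", "child"}),
    ("mother", "parent", {"father", "home"}),
    ("mother", "parent", {"my", "said"}),
    ("mother", "vinegar", {"vinegar", "ferment"}),
    ("bear", "noun:animal", {"woods", "saw"}),
    ("bear", "noun:animal", {"wood", "cub"}),
    ("bear", "verb:carry", {"cannot", "pain"}),
    ("bear", "verb:carry", {"burden", "must"}),
    ("bear", "verb:carry", {"weight", "will"}),
]


def build_counts(sample):
    """Count how often each sense occurs, overall and beside each context word."""
    sense_counts = defaultdict(Counter)    # word -> Counter of senses
    context_counts = defaultdict(Counter)  # (word, context word) -> Counter of senses
    for word, sense, context in sample:
        sense_counts[word][sense] += 1
        for neighbour in context:
            context_counts[(word, neighbour)][sense] += 1
    return sense_counts, context_counts


def most_frequent_sense(word, sense_counts):
    """Guess the sense seen most often for this word, or None if unseen."""
    counts = sense_counts.get(word)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def sense_in_context(word, nearby, sense_counts, context_counts):
    """Prefer the sense seen most often beside the nearby words.

    Falls back to the overall most frequent sense when none of the
    nearby words has been seen with this word before.
    """
    votes = Counter()
    for neighbour in nearby:
        votes.update(context_counts.get((word, neighbour), Counter()))
    if votes:
        return votes.most_common(1)[0][0]
    return most_frequent_sense(word, sense_counts)


SENSE_COUNTS, CONTEXT_COUNTS = build_counts(SAMPLE)
```

With this sample, most_frequent_sense("mother", SENSE_COUNTS) picks the human-being sense simply because it is the more frequent one, while sense_in_context("bear", {"woods", "the"}, SENSE_COUNTS, CONTEXT_COUNTS) picks the animal sense because “woods” has only been seen beside that sense. Without a helpful neighbour, “bear” falls back to whichever sense is more frequent in the sample.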
