How Computers Communicate - Robots Unlimited: Life in a Virtual Age


tieth century, some statistical techniques for part-of-speech tagging were achieving scores greater than 95 percent in accuracy, which is close to human performance.

Partly because of the availability of electronic corpora and the statistics that can be derived from them, and partly because of the other benefits of using faster computers with bigger memories, the 1990s saw a dramatic increase in NLP research based on empirical evidence. Such evidence includes the data in the Penn Treebank at the University of Pennsylvania, a “bank” of linguistic “trees”, not unlike the parse tree shown in Figure 51, page 248 (though mostly considerably larger than that example). In the Penn Treebank, 40,000 sentences from the Wall Street Journal have been annotated according to their linguistic structure, producing both part-of-speech tags and parses that show rough syntactic and semantic information. By comparing a given sentence with each of the sentences in the Penn Treebank, a program can identify the closest match and then make reasonable assumptions about the syntactic structure and meaning of the given sentence based on the known structure and meaning of the closest-matching sentence.
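The closest-match idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the three-sentence `TOY_TREEBANK`, its bracketed parses, and the choice of word-level `difflib.SequenceMatcher` similarity are invented stand-ins, not the data, annotation scheme, or matching method of any real system.

```python
from difflib import SequenceMatcher

# Hypothetical miniature "treebank": (sentence, bracketed parse) pairs.
# The real Penn Treebank is vastly larger; these entries are invented.
TOY_TREEBANK = [
    ("the dog chased the cat",
     "(S (NP (DT the) (NN dog)) (VP (VBD chased) (NP (DT the) (NN cat))))"),
    ("prices rose sharply",
     "(S (NP (NNS prices)) (VP (VBD rose) (ADVP (RB sharply))))"),
    ("the company reported a loss",
     "(S (NP (DT the) (NN company)) (VP (VBD reported) (NP (DT a) (NN loss))))"),
]


def similarity(a, b):
    """Word-level similarity in [0, 1] between two sentences (an assumed metric)."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def closest_match(sentence, treebank=TOY_TREEBANK):
    """Return (sentence, parse, score) for the most similar annotated sentence."""
    best = max(treebank, key=lambda entry: similarity(sentence, entry[0]))
    return best[0], best[1], similarity(sentence, best[0])


if __name__ == "__main__":
    matched, parse, score = closest_match("the bank reported a profit")
    print(matched)
    print(parse)
    print(round(score, 2))
```

The parse of the retrieved sentence then serves as a template for the new one: in this toy case, "the bank reported a profit" would borrow the noun phrase plus verb phrase structure of "the company reported a loss". A practical system would need a far better similarity measure than raw word overlap.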

Another useful electronic resource for NLP researchers is the WordNet lexical database, developed over a period of 20 years at Princeton University, starting in 1985, under the guidance of George Miller. WordNet is one of the most powerful electronic research tools available to the NLP community, having a design based on current psycholinguistic theories of how human lexical memory works. WordNet collates nouns, verbs, adjectives and adverbs into sets of synonyms, each set representing one underlying lexical concept. Different relations link the synonym sets, enabling programs to discover useful semantic relationships between words. In addition to the original English language version there is now a multilingual version, called EuroWordNet, for several European languages: Dutch, Italian, Spanish, German, French, Czech and Estonian.
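The structure described above, synonym sets joined by relations, can be sketched with a small dictionary. This is only an illustration under stated assumptions: the synset identifiers, member words, and the single "hypernym" (is-a-kind-of) relation below are invented, and the helper functions are hypothetical rather than part of any WordNet interface.

```python
# Hypothetical miniature lexical database in the spirit of WordNet.
# Synset ids, member words, and links are invented for illustration.
SYNSETS = {
    "car.n": {"words": {"car", "automobile", "motorcar"}, "hypernym": "vehicle.n"},
    "truck.n": {"words": {"truck", "lorry"}, "hypernym": "vehicle.n"},
    "vehicle.n": {"words": {"vehicle"}, "hypernym": "artifact.n"},
    "artifact.n": {"words": {"artifact", "artefact"}, "hypernym": None},
}


def synsets_of(word):
    """Ids of every synonym set that contains the word."""
    return [sid for sid, synset in SYNSETS.items() if word in synset["words"]]


def synonyms(word):
    """All other words sharing a synonym set with the given word."""
    found = set()
    for sid in synsets_of(word):
        found |= SYNSETS[sid]["words"]
    found.discard(word)
    return found


def hypernym_chain(sid):
    """Follow hypernym links upward from a synset, nearest first."""
    chain = []
    current = SYNSETS[sid]["hypernym"]
    while current is not None:
        chain.append(current)
        current = SYNSETS[current]["hypernym"]
    return chain


def lowest_common_hypernym(word_a, word_b):
    """Most specific synset that both words fall under, or None."""
    for sid_a in synsets_of(word_a):
        chain_a = [sid_a] + hypernym_chain(sid_a)
        for sid_b in synsets_of(word_b):
            chain_b = set([sid_b] + hypernym_chain(sid_b))
            for candidate in chain_a:
                if candidate in chain_b:
                    return candidate
    return None


if __name__ == "__main__":
    print(sorted(synonyms("car")))
    print(hypernym_chain("car.n"))
    print(lowest_common_hypernym("automobile", "lorry"))
```

With this toy data a program can work out that "automobile" and "lorry" are related because both lead up to the same vehicle concept, which is the kind of semantic relationship the links between synonym sets are meant to expose.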

Passing the Turing Test

In 1990 Hugh Loebner agreed with the Cambridge[9] Center for Behavioral Studies that he would underwrite a contest designed to implement the Turing Test. Each year Loebner donates a $2,000 prize and a bronze medal to the winner of a competition that has come to be regarded as the world championship for conversational programs. The ultimate

9 Cambridge, MA, not Cambridge, England.
