Conclusion - Computational Linguistics and Talking Robots

Improvements from one version to the next may be achieved rather easily because there are large fields of empirical data which merely need to be “harvested.” The software machine for the systematic collection, analysis, and interpretation of the language data is the DBS robot, originally designed to model the mechanism of natural language communication.

For example, when applied to a new language, the DBS robot's off-the-shelf components for the lexicon, automatic word form recognition, syntactic-semantic parsing, and so on, hold no language-dependent data. As a new language is being analyzed, words are added to the robot's lexicon component, just as compositional structures are added to the LA-Morph, LA-hear, LA-think, and LA-speak grammars in the robot's rule component. Also, culture-dependent content may be added to the Word Bank.
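
The separation between language-independent components and language-dependent data described above can be illustrated with a minimal Python sketch. It is not the DBS software: the class `RobotShell`, its methods, and the representation of rules and content as plain strings are assumptions made for illustration only. Only the component names (lexicon, rule component with its four grammars, Word Bank) come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Grammar names are taken from the text; everything else here is hypothetical.
GRAMMAR_NAMES = ("LA-Morph", "LA-hear", "LA-think", "LA-speak")


@dataclass
class RobotShell:
    """Hypothetical language-independent shell: all components start empty."""

    # Lexicon component: surface form -> feature dictionary.
    lexicon: dict[str, dict[str, str]] = field(default_factory=dict)
    # Rule component: one (initially empty) rule list per grammar.
    grammars: dict[str, list[str]] = field(
        default_factory=lambda: {name: [] for name in GRAMMAR_NAMES}
    )
    # Word Bank: culture-dependent content, simplified to a list of strings.
    word_bank: list[str] = field(default_factory=list)

    def add_word(self, surface: str, **features: str) -> None:
        """Add a word of the language under analysis to the lexicon."""
        self.lexicon[surface] = dict(features)

    def add_rule(self, grammar: str, rule: str) -> None:
        """Add a compositional structure to one of the four grammars."""
        if grammar not in self.grammars:
            raise KeyError(f"unknown grammar: {grammar}")
        self.grammars[grammar].append(rule)

    def add_content(self, item: str) -> None:
        """Add culture-dependent content to the Word Bank."""
        self.word_bank.append(item)

    def is_language_independent(self) -> bool:
        """True while no language-dependent data has been stored."""
        return (
            not self.lexicon
            and not self.word_bank
            and not any(self.grammars.values())
        )


# Usage: a fresh shell holds no language-dependent data until analysis begins.
robot = RobotShell()
was_empty = robot.is_language_independent()
robot.add_word("Hund", pos="noun")          # illustrative lexical entry
robot.add_rule("LA-hear", "DET+N")          # placeholder rule notation
```

Under these assumptions, analyzing a new language changes only the stored data, never the shell itself, which is the property the paragraph attributes to the off-the-shelf components.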

Storing the analysis of a natural language directly in the DBS robot makes the analysis available right away for computational testing by the scientists and for computational applications by the users. This works not only for the hear mode, as in testing on a corpus, but for the full cycle of natural language communication. The testing is designed (i) to automatically enhance the robot's performance by learning, and (ii) to provide the scientists with insights for improving the robot's learning abilities.

For long-term linguistic research, there is no lack of renewable language data, namely (i) the natural changes from year to year within the domains of a given language and (ii) a wide, constantly extending range of applications in human-machine communication. In addition, there is (iii) the great number of natural languages not yet charted, or not yet charted completely (including English, in any theory). The harvesting of each of these kinds of data will be of interest to its own group of users.

Charting a new natural language is a standard procedure, but it has to deal with relatively large amounts of data. As more and more languages are analyzed, however, charting is accelerated because software constructs may be reused, based on similarities in lexicalization, in productive syntactic-semantic structures, in collocations, constructions, and idioms, and in inferencing. To better support day-to-day research, these standardized software constructs and their declarative specifications may be stored in system libraries, organized for families of languages.
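
As a rough illustration of a system library organized by language family, the following Python sketch stores declarative specifications under a family name and returns them when a related language is charted. The class `ConstructLibrary`, its methods, and the example construct names and specifications are hypothetical and are not taken from the DBS software.

```python
from collections import defaultdict


class ConstructLibrary:
    """Hypothetical library of reusable constructs, grouped by language family."""

    def __init__(self) -> None:
        # family name -> {construct name -> declarative specification}
        self._by_family = defaultdict(dict)

    def store(self, family: str, name: str, spec: str) -> None:
        """Store a construct's declarative specification under a family."""
        self._by_family[family][name] = spec

    def reusable_for(self, family: str) -> dict:
        """Return a copy of the constructs already available for a family."""
        return dict(self._by_family.get(family, {}))


# Usage: constructs stored while charting one language of a family
# become candidates for reuse when charting another.
library = ConstructLibrary()
library.store("Germanic", "verb-second", "finite verb in second position")
library.store("Germanic", "noun-compounding", "N + N -> N")
candidates = library.reusable_for("Germanic")
nothing_yet = library.reusable_for("Bantu")
```

In this sketch a family with no stored constructs simply yields an empty result, so charting the first language of a family starts from scratch while later ones start from the accumulated library.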