Improvements from one version to the next may be achieved rather easily
because there are large fields of empirical data which merely need to be
“harvested.” The software machine for the systematic collection, analysis, and
interpretation of the language data is the DBS robot, originally designed to
model the mechanism of natural language communication.
For example, when the DBS robot is first applied to a new language, its
off-the-shelf components for the lexicon, automatic word form recognition,
syntactic-semantic parsing, and so on, hold no language-dependent data. As the
new language is analyzed, words are added to the robot's lexicon component,
just as compositional structures are added to the LA-Morph, LA-hear, LA-think,
and LA-speak grammars in the robot's rule component. Also, culture-dependent
content may be added to the Word Bank.
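The division of labor described above, with language-independent components that start out empty and are filled as a language is analyzed, may be illustrated with a minimal sketch. It is not the DBS implementation: the class LanguageModule, its method names, the dictionary layout of a lexical entry, and the rule name "DET+N" are hypothetical and serve only to make the idea concrete. In particular, the Word Bank is reduced here to a plain list.

```python
from dataclasses import dataclass, field


def _empty_rule_component():
    # One (initially empty) rule list per grammar named in the text.
    return {"LA-Morph": [], "LA-hear": [], "LA-think": [], "LA-speak": []}


@dataclass
class LanguageModule:
    """Hypothetical container holding no language-dependent data at first."""

    lexicon: dict = field(default_factory=dict)
    rules: dict = field(default_factory=_empty_rule_component)
    word_bank: list = field(default_factory=list)  # simplification: a plain list

    def add_word(self, surface, core, category):
        # Assumed entry layout: surface form -> core value and category.
        self.lexicon[surface] = {"core": core, "cat": category}

    def add_rule(self, grammar, rule_name):
        # Only the four grammars of the rule component are accepted.
        if grammar not in self.rules:
            raise KeyError(f"unknown grammar: {grammar}")
        self.rules[grammar].append(rule_name)

    def add_content(self, content):
        # Culture-dependent content goes into the Word Bank.
        self.word_bank.append(content)

    def is_empty(self):
        # True as long as no language-dependent data has been added.
        has_rules = any(self.rules.values())
        return not self.lexicon and not has_rules and not self.word_bank


module = LanguageModule()            # off-the-shelf state
module.add_word("dogs", "dog", "noun")
module.add_rule("LA-hear", "DET+N")  # illustrative rule name
```

Because the container itself contains nothing specific to one language, the same class may be instantiated once per language under analysis.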
Storing the analysis of a natural language directly in the DBS robot makes
the analysis available right away for computational testing by the scientists
and for computational applications by the users. This works not only for the
hear mode, as in testing on a corpus, but for the full cycle of natural language
communication. The testing is designed (i) to automatically enhance the robot's
performance by learning, and (ii) to provide the scientists with insights for
improving the robot's learning abilities.
For long-term linguistic research, there is no lack of renewable language
data, namely (i) the natural changes year to year within the domains of a given
language and (ii) a wide, constantly extending range of applications in
human-machine communication. In addition, there is (iii) the great number of natural
languages not yet charted, or not yet charted completely (including English, in
any theory). The harvesting of each of these kinds of data will be of interest to
its own group of users.
Charting a new natural language is a standard procedure, but it has to deal
with relatively large amounts of data. As more and more languages are
analyzed, however, charting is accelerated because software constructs may be
reused, based on similarities in lexicalization, in productive syntactic-semantic
structures, in collocations, constructions, and idioms, and in inferencing. To
better support day-to-day research, these standardized software constructs
and their declarative specifications may be stored in system libraries,
organized for families of languages.
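How a system library organized by language family might support reuse can be sketched as follows. The sketch rests on assumptions: the class ConstructLibrary, its methods, and the construct name and specification string are hypothetical placeholders, not part of any existing DBS software.

```python
from collections import defaultdict


class ConstructLibrary:
    """Hypothetical library of reusable constructs, keyed by language family."""

    def __init__(self):
        # family name -> {construct name -> declarative specification}
        self._by_family = defaultdict(dict)

    def register(self, family, name, spec):
        # Store a construct developed while charting some language.
        self._by_family[family][name] = spec

    def starting_kit(self, family):
        # Return a copy, so that adapting constructs for a new language
        # does not alter the shared library.
        return dict(self._by_family.get(family, {}))


library = ConstructLibrary()
library.register(
    "Romance",
    "noun_adjective_agreement",
    "declarative specification (placeholder)",
)

# A newly charted language of the same family starts from the stored
# constructs; a family without entries starts from an empty kit.
kit = library.starting_kit("Romance")
```

Returning a copy is a design choice of this sketch: language-specific adaptations remain local until they are registered in the library deliberately.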