replaces markup by hand, ensuring the quality of standardization necessary
for meaningful comparisons, and saving the labor of instructing and the cost
of remunerating large groups of markup personnel. 7
A succession of monitor corpora allows a detailed view of how the language
and the culture are developing, in different domains and over many decades.
Statistical analysis will show, for example, how politics and natural disasters
cause a temporary frequency increase of certain words in certain domains.
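As an illustration of the kind of statistical analysis meant here, the following minimal sketch compares the relative frequency of a word across a succession of corpus slices and flags the slices in which it rises well above its level elsewhere. The function names, the per-million normalization, and the threshold `factor` are assumptions made for this example; they are not part of the RDM corpus design.

```python
from collections import Counter


def relative_frequency(tokens, word):
    """Occurrences of `word` per million tokens in one corpus slice."""
    if not tokens:
        return 0.0
    return Counter(tokens)[word] / len(tokens) * 1_000_000


def frequency_series(slices, word):
    """Map each slice label (e.g. a year) to the word's relative frequency."""
    return {label: relative_frequency(tokens, word)
            for label, tokens in slices.items()}


def spikes(series, factor=2.0):
    """Labels whose frequency exceeds `factor` times the mean of the other slices.

    `factor` is an illustrative threshold, not an established standard.
    """
    flagged = []
    for label, value in series.items():
        others = [v for k, v in series.items() if k != label]
        if not others or value <= 0:
            continue
        baseline = sum(others) / len(others)
        if value > factor * baseline:
            flagged.append(label)
    return flagged


# Toy monitor corpora: three tiny slices of one domain.
slices = {
    "2001": "the river was calm the town was quiet".split(),
    "2002": "flood flood warning the flood hit the town".split(),
    "2003": "the town rebuilt after the flood".split(),
}
series = frequency_series(slices, "flood")
flagged = spikes(series)
```

With real monitor corpora, the slices would be far larger and separated by domain, so that a temporary increase in one domain can be distinguished from a lasting change in the language as a whole.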
A carefully built long-term RDM corpus is in the interest of the whole language
community and should be entrusted to the care of a national academy.
This would secure the necessary long-term funding, though much of the cost
could be recovered from commercial use of the continuously upgraded RDM
corpus of the natural language in question (Sect. 12.6).
In DBS, the routine of analyzing a corpus begins with running the corpus
through automatic word form recognition. 8 The result is a set of proplets, called
a content, which is stored in the content-addressable database of a Word Bank. Next,
semantic relations are established between proplets by means of syntactic-
semantic parsing. Finally, LA-think, inferences, and LA-speak are added.
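The first two steps of this routine, word form recognition and storage in a content-addressable Word Bank, may be sketched as follows. This is a strongly simplified illustration: the three-entry `LEXICON`, the reduced `Proplet` attributes, and the `WordBank` class with its method names are hypothetical, and the later steps (syntactic-semantic parsing, LA-think, inferences, LA-speak) are left out.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical toy lexicon: surface form -> (kind of core attribute, core value).
LEXICON = {
    "julia": ("noun", "Julia"),
    "knows": ("verb", "know"),
    "john": ("noun", "John"),
}


@dataclass
class Proplet:
    sur: str    # surface form as found in the text
    kind: str   # noun, verb, ...
    core: str   # core value, used as the storage key
    prn: int    # proposition number: which content the proplet belongs to
    relations: dict = field(default_factory=dict)  # to be filled in by parsing


def recognize(text, prn):
    """Automatic word form recognition, reduced to lexicon lookup."""
    proplets = []
    for token in text.lower().split():
        token = token.strip(".,!?")
        if not token:
            continue
        if token not in LEXICON:
            raise KeyError(f"unknown word form: {token}")
        kind, core = LEXICON[token]
        proplets.append(Proplet(sur=token, kind=kind, core=core, prn=prn))
    return proplets


class WordBank:
    """Content-addressable store: proplets are retrieved by their core value."""

    def __init__(self):
        self.token_lines = defaultdict(list)

    def store(self, proplets):
        for proplet in proplets:
            # Proplets with the same core value are kept in order of arrival.
            self.token_lines[proplet.core].append(proplet)

    def lookup(self, core):
        return list(self.token_lines.get(core, []))

    def size(self):
        return sum(len(line) for line in self.token_lines.values())


bank = WordBank()
bank.store(recognize("Julia knows John.", prn=1))
bank.store(recognize("John knows Julia.", prn=2))
```

Looking up `know` returns the verb proplets of both contents without any search through the stored text, which is the point of addressing proplets by content.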
Just as there is no limit to the amount of content stored in a Word Bank, at
least in principle, there is no limit to the amount of information that can be
added to the content. The information is integrated as a system of footnotes
and subfootnotes. The “footnotes” are realized as interpreted pointers to other
contents in the Word Bank. This does not increase the number of proplets in
the Word Bank, only the number of addresses connecting them.
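The idea of footnotes and subfootnotes as pointers can be illustrated with a small sketch in which a proplet is addressed by its core value and proposition number, and an annotation is a labeled pointer from one address to another. The class `AnnotatedBank`, its method names, and the relation labels are assumptions made for this example only.

```python
from collections import defaultdict


class AnnotatedBank:
    """Toy store in which added information consists only of addresses."""

    def __init__(self):
        self.proplets = {}              # address (core, prn) -> proplet data
        self.notes = defaultdict(list)  # source address -> [(label, target address)]

    def store(self, core, prn, **attrs):
        self.proplets[(core, prn)] = {"core": core, "prn": prn, **attrs}

    def annotate(self, source, label, target):
        """Add a 'footnote': an interpreted pointer between existing contents."""
        if source not in self.proplets or target not in self.proplets:
            raise KeyError("both addresses must already be stored")
        self.notes[source].append((label, target))

    def follow(self, source, depth=2, _seen=None):
        """Collect footnotes and, up to `depth`, their subfootnotes."""
        seen = set() if _seen is None else _seen
        seen.add(source)
        result = []
        if depth == 0:
            return result
        for label, target in self.notes.get(source, []):
            if target in seen:  # do not follow circular pointers
                continue
            result.append((label, target))
            result.extend(self.follow(target, depth - 1, seen))
        return result


bank = AnnotatedBank()
bank.store("flood", 1)
bank.store("river", 2)
bank.store("dam", 3)
bank.annotate(("flood", 1), "cause", ("river", 2))
bank.annotate(("river", 2), "located_at", ("dam", 3))
```

After the two calls to `annotate`, the store still holds three proplets; only the number of addresses connecting them has grown.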
The user may query the Word Bank content in natural language (provided
that the language software is available). Once an LA-think and an LA-speak
grammar have been added, the answers may be in the natural language of the
query. This method, though developed with carefully constructed input sentences,
may eventually be applied to free text such as pages on the Internet.
One benefit would be a quality of recall and precision unachievable by a statistical
approach (cf. FoCL'99, Sects. 15.4, 15.5) or by manual markup.
12.3 Evolution
A computational model of natural language communication, defined at a level
of abstraction which applies to natural and artificial agents alike, need not
necessarily include the dimension of evolution. Instead, the software machine
could be built as a purely “synchronic” framework of computational function.
7 See Sect. 8.5 for the use of a corpus for the purpose of search space reduction in DBS.
8 The usual preprocessing assumed.