front, using the same, simple, transparent procedure (4.1.1) to ensure historical
accuracy and to conform to the structure of a content-addressable memory.
For correcting stored content, a comment is written to the now front, like
a diary entry noting a change in temperature (cf. 4.4.2 for an analogous example).
The comment refers to the content by means of addresses, providing
instant access. When subactivation lights up a content, any and all addresses
pointing at it are activated as well. When subactivation lights up an address,
the original and all the other addresses pointing at it are also subactivated. In
short, originals and their addresses are systematically co-subactivated in DBS.
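The co-subactivation just described can be illustrated with a minimal sketch. It is not the DBS implementation: the class `ContentStore`, its method names, and the string identifiers are hypothetical, and the sketch assumes that each address points at exactly one original.

```python
class ContentStore:
    """Toy store in which originals and their addresses are co-subactivated."""

    def __init__(self):
        self.originals = {}   # original id -> stored content
        self.target = {}      # address id -> id of the original it points at
        self.pointers = {}    # original id -> set of address ids pointing at it

    def add_original(self, oid, content):
        self.originals[oid] = content
        self.pointers.setdefault(oid, set())

    def add_address(self, aid, oid):
        # An address may only point at content that is already stored.
        if oid not in self.originals:
            raise KeyError(f"unknown original: {oid}")
        self.target[aid] = oid
        self.pointers[oid].add(aid)

    def subactivate(self, item):
        """Return the original plus every address pointing at it.

        `item` may be the id of an original or the id of an address;
        both cases yield the same set.
        """
        oid = self.target.get(item, item)
        if oid not in self.originals:
            raise KeyError(f"unknown item: {item}")
        return {oid} | self.pointers[oid]


store = ContentStore()
store.add_original("temp-entry", "temperature reading")
store.add_address("comment-1", "temp-entry")
store.add_address("comment-2", "temp-entry")
```

In this sketch, calling `store.subactivate("temp-entry")` and `store.subactivate("comment-1")` return the same set, which mirrors the statement that originals and their addresses light up together.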
12.2 RMD Corpus
While the core values, the semantic relations, and the levels of abstraction are
agent-internal constructs of cognition, the language data are agent-external
objects, collected, for example, as a corpus. Like any contemporary linguistic
theory, DBS requires corpora to obtain frequency distributions of various kinds
in a standardized framework. For example, when expanding automatic word
form recognition, recognition rates will improve best if the most frequent word
forms are integrated into the software first, and similarly for parsing syntactic-
semantic constructions, and so on.
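Why frequency order matters for recognition rates can be made concrete with a small sketch. It ranks word forms by frequency and reports the cumulative share of running tokens covered after each form is added. The function name `coverage_by_rank` and the sample sentence are illustrative assumptions, and whitespace splitting stands in for real tokenization.

```python
from collections import Counter


def coverage_by_rank(tokens):
    """List (word_form, count, cumulative_coverage), most frequent first.

    cumulative_coverage is the fraction of all tokens recognized once
    this word form and all more frequent ones have been integrated.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    covered = 0
    ranking = []
    for form, count in counts.most_common():
        covered += count
        ranking.append((form, count, covered / total))
    return ranking


# Toy data only; a real distribution would come from the corpus.
sample = "the cat saw the dog and the cat".split()
ranking = coverage_by_rank(sample)
```

Reading the ranking from the top shows how quickly coverage grows when the most frequent forms are handled first, which is the rationale given above for prioritizing them.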
The frequency information should be obtained from a standardized RMD
corpus, i.e., a Reference Monitor corpus structured into Domains. The reference
corpus consists of a subcorpus for everyday language, complemented by
subcorpora for different domains such as law, medicine, physics, sport, politics
(cf. von der Grün 1998), including fiction, e.g., movie scripts. Their sizes
may be determined by methods evolved from those used for the Brown corpus
(Kucera and Francis 1967, Francis and Kucera 1982).
The reference corpus is continued with monitor corpora following every year
(Sinclair 1991, pp. 24-26). The annual monitor corpora resemble the reference
corpus in every way: overall size, choice of domains, domain sizes, etc. The
reference corpus and the monitor corpora use texts from a carefully selected
set of renewable language data: newspapers for everyday language, established
journals for specific domains, and a selection of fiction which appeared in the
year in question.
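The requirement that each monitor corpus match the reference corpus in choice of domains and domain sizes can be sketched as a simple check. Everything here is an assumption made for illustration: the function name `layout_mismatches`, the representation of a corpus as a mapping from domain name to size in tokens, and the 5% tolerance are not taken from the text.

```python
def layout_mismatches(reference, monitor, tolerance=0.05):
    """Return the domains in which a monitor corpus deviates from the reference.

    Both arguments map a domain name to its size in tokens. A domain is
    reported if it is missing on either side, or if its size differs from
    the reference size by more than `tolerance` (relative to the reference).
    """
    problems = []
    for domain in sorted(set(reference) | set(monitor)):
        ref_size = reference.get(domain)
        mon_size = monitor.get(domain)
        if ref_size is None or mon_size is None:
            problems.append(domain)   # domain choice differs
        elif abs(mon_size - ref_size) > tolerance * ref_size:
            problems.append(domain)   # domain size differs too much
    return problems
```

An empty result would indicate that the monitor corpus follows the layout of the reference corpus within the chosen tolerance, so that frequency distributions from different years remain comparable.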
Most of the corpus building and analysis may be done completely automatically.
This holds (i) for the collecting of texts for the monitor corpora once the
initial set of sources has been settled on, (ii) for the statistical analysis once a
useful routine has been established, and (iii) for automatic word form recognition
as well as syntactic-semantic parsing. Such automatic corpus processing