6.6.2 TURNING CONTEXT PROPLETS INTO LANGUAGE PROPLETS
proplet shell        language proplets

⎡sur: α'+x     ⎤     ⎡sur: dog+s    ⎤  ⎡sur: book+s   ⎤  ⎡sur: child+ren⎤  ⎡sur: apple+s  ⎤
⎢noun: α       ⎥     ⎢noun: dog     ⎥  ⎢noun: book    ⎥  ⎢noun: child   ⎥  ⎢noun: apple   ⎥
⎢cat: pn       ⎥     ⎢cat: pn       ⎥  ⎢cat: pn       ⎥  ⎢cat: pn       ⎥  ⎢cat: pn       ⎥
⎢sem: count pl ⎥  ⇒  ⎢sem: count pl ⎥  ⎢sem: count pl ⎥  ⎢sem: count pl ⎥  ⎢sem: count pl ⎥
⎢fnc:          ⎥     ⎢fnc:          ⎥  ⎢fnc:          ⎥  ⎢fnc:          ⎥  ⎢fnc:          ⎥
⎢mdr:          ⎥     ⎢mdr:          ⎥  ⎢mdr:          ⎥  ⎢mdr:          ⎥  ⎢mdr:          ⎥
⎣prn:          ⎦     ⎣prn:          ⎦  ⎣prn:          ⎦  ⎣prn:          ⎦  ⎣prn:          ⎦
Assuming that the context proplets in 6.6.1 have been acquired already, learning the associated language proplets involves only a single value, namely that of the sur attribute, again facilitating learning.
Once the proplets have been acquired for one language, they may be reused for other languages, provided the lexicalization is similar.²⁵ The following example shows the proplets for the concept dog with English, French, German, and Italian surfaces:
6.6.3 TAKING SUR VALUES FROM DIFFERENT LANGUAGES
proplet shell        language proplets

⎡sur: α'       ⎤     ⎡sur: dog      ⎤  ⎡sur: chien    ⎤  ⎡sur: Hund     ⎤  ⎡sur: cane     ⎤
⎢noun: α       ⎥     ⎢noun: dog     ⎥  ⎢noun: dog     ⎥  ⎢noun: dog     ⎥  ⎢noun: dog     ⎥
⎢cat: sn       ⎥     ⎢cat: sn       ⎥  ⎢cat: sn       ⎥  ⎢cat: sn       ⎥  ⎢cat: sn       ⎥
⎢sem: count sg ⎥  ⇒  ⎢sem: count sg ⎥  ⎢sem: count sg ⎥  ⎢sem: count sg ⎥  ⎢sem: count sg ⎥
⎢fnc:          ⎥     ⎢fnc:          ⎥  ⎢fnc:          ⎥  ⎢fnc:          ⎥  ⎢fnc:          ⎥
⎢mdr:          ⎥     ⎢mdr:          ⎥  ⎢mdr:          ⎥  ⎢mdr:          ⎥  ⎢mdr:          ⎥
⎣prn:          ⎦     ⎣prn:          ⎦  ⎣prn:          ⎦  ⎣prn:          ⎦  ⎣prn:          ⎦
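The step from a context proplet to language proplets can be sketched in Python. This is only an illustration under stated assumptions: the class name Proplet and the function to_language_proplet are hypothetical and do not come from the text; the attribute names and values are those shown in 6.6.3. The point of the sketch is that only the sur value changes, while the core value and the remaining attributes are carried over unchanged.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Proplet:
    """Hypothetical container for the seven attributes shown in 6.6.3."""
    sur: str = ""   # surface; left empty in a context proplet
    noun: str = ""  # core value (concept)
    cat: str = ""
    sem: str = ""
    fnc: str = ""
    mdr: str = ""
    prn: str = ""


def to_language_proplet(context: Proplet, surface: str) -> Proplet:
    """Copy the context proplet, filling in only the sur attribute."""
    return replace(context, sur=surface)


# Context proplet for the concept dog (no surface yet).
dog_context = Proplet(noun="dog", cat="sn", sem="count sg")

# Surfaces taken from the example in 6.6.3.
surfaces = {"English": "dog", "French": "chien", "German": "Hund", "Italian": "cane"}

language_proplets = {
    language: to_language_proplet(dog_context, surface)
    for language, surface in surfaces.items()
}
```

Because the dataclass is frozen, each language proplet is a separate copy, so reusing the context proplet for a further language does not disturb the proplets already built.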
For syntactic-semantic parsing, the French, German, and Italian proplet versions will have to be complemented with the additional cat value m (for the grammatical gender masculine). This language-dependent information may be obtained from the traditional dictionaries for these languages. In addition, corpus-based information, such as domain-dependent frequency, LA-hear predecessors and successors ordered according to frequency (n-grams), semantic relations, etc., may be added to the owner proplets (Sect. 8.5).²⁶
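Complementing the cat value with gender information might look as follows. This is a minimal sketch, not the method of the text: the names GENDER_LEXICON and add_gender are hypothetical, the small lookup table stands in for a traditional dictionary, and writing the result as "sn m" is an assumption about how the additional cat value is notated.

```python
# Toy stand-in for dictionary lookup: surface -> grammatical gender.
GENDER_LEXICON = {"chien": "m", "Hund": "m", "cane": "m"}

proplets = [
    {"sur": "dog", "noun": "dog", "cat": "sn", "sem": "count sg"},
    {"sur": "chien", "noun": "dog", "cat": "sn", "sem": "count sg"},
    {"sur": "Hund", "noun": "dog", "cat": "sn", "sem": "count sg"},
    {"sur": "cane", "noun": "dog", "cat": "sn", "sem": "count sg"},
]


def add_gender(proplet: dict, lexicon: dict) -> dict:
    """Return a copy of the proplet with the gender appended to its cat value.

    Surfaces without an entry in the lexicon (here: English dog) are
    returned unchanged.
    """
    gender = lexicon.get(proplet["sur"])
    if gender is None:
        return dict(proplet)
    return {**proplet, "cat": f'{proplet["cat"]} {gender}'}


enriched = [add_gender(p, GENDER_LEXICON) for p in proplets]
```

Corpus-based information such as frequency could be attached in the same way, as further entries in each proplet copy.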
²⁵ Cf. 3.6.1; other examples of different lexicalizations are (i) German Traumreise (literally dream journey), which has been translated into American English as dream vacation and into French as voyage des rêves, (ii) English horseshoe, which translates into German as Hufeisen (literally hoof iron) and into French as fer à cheval (literally iron for horse), and (iii) French ralenti, which translates into English as slow motion and into German as Zeitlupe.
²⁶ Automatic word form recognition (based on a lexicon and rules) provides a more accurate frequency analysis of a corpus, for example, than part-of-speech tagging (based on statistical transition likelihoods from one word form to the next in a corpus). Unlike automatic word form recognition, part-of-speech tagging does not relate surface values such as learn, learns, learned, learning, and swim, swims, swam, swum, swimming, to their base forms (core values), i.e., learn and swim, respectively. Therefore, the rule-based and the statistical approach lead to substantially different frequency distribution results. For an evaluation of the CLAWS4 tagging analysis of the British National Corpus (BNC), see FoCL'99, Sect. 5.5.
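The difference between counting surfaces and counting base forms can be illustrated with a toy example. This is only a sketch: the table BASE_FORMS is a hypothetical stand-in for lexicon-and-rule-based word form recognition, and the token list is invented for illustration rather than taken from any corpus.

```python
from collections import Counter

# Hypothetical lookup table standing in for automatic word form recognition:
# each surface is related to its base form (core value).
BASE_FORMS = {
    "learn": "learn", "learns": "learn", "learned": "learn", "learning": "learn",
    "swim": "swim", "swims": "swim", "swam": "swim", "swum": "swim", "swimming": "swim",
}

# Invented sample of word form tokens.
tokens = ["learns", "swam", "learning", "swim", "learned", "swum"]

# Counting surfaces treats every word form as a separate item.
surface_counts = Counter(tokens)

# Counting base forms groups the word forms under learn and swim.
base_counts = Counter(BASE_FORMS.get(token, token) for token in tokens)
```

In this sample each surface occurs once, whereas the base forms learn and swim each occur three times, which is the kind of divergence in frequency distribution the footnote describes.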