Table 3. Inter-annotator agreement comparison between studies dealing with relations between
composite nouns of noun compounds

Study   Agreement Index   No. of Relations
[26]    0.57 - 0.67 κ     43
[9]     0.61 κ            22
[23]    0.68 κ            6
[14]    52.31 %           20
[10]    0.58 κ            21
consistent misclassifications by the author; however, no such patterns were found. Some
of the classifications seemed to use the FROM type as a "fallback" category.
To compare our inter-annotator agreement with that of similar studies, we also
computed Fleiss' κ. The κ for the overall annotation task was 0.64, and the value
with HAVE, MAKE and USE conflated was 0.86. The overall κ of 0.64 compares well
with the inter-annotator figures from other annotation experiments dealing with the
identification of relations. For comparison, some of these results are summarized
in table 3.
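As a rough illustration of how a Fleiss' κ value such as those reported above is obtained, the following Python sketch computes κ from a table of per-item category counts. The function name fleiss_kappa and the toy counts are hypothetical and are not taken from the annotation experiment. The sketch assumes that every item was labelled by the same number of annotators.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for a table where counts[i][j] is the number of
    annotators who assigned item i to category j."""
    if not counts:
        raise ValueError("need at least one item")
    n_items = len(counts)
    n_categories = len(counts[0])
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two annotators per item")
    for row in counts:
        if len(row) != n_categories or sum(row) != n_raters:
            raise ValueError("every item needs the same categories and annotator count")

    # Share of all assignments that went to each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]

    # Observed agreement: fraction of agreeing annotator pairs, averaged over items.
    p_obs = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_items

    # Agreement expected by chance, given the category shares.
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1.0:
        raise ValueError("kappa is undefined when all labels fall in one category")

    return (p_obs - p_exp) / (1.0 - p_exp)


# Made-up toy data: 4 items, 3 annotators, 3 relation categories.
toy = [
    [3, 0, 0],
    [0, 3, 0],
    [2, 1, 0],
    [0, 1, 2],
]
print(round(fleiss_kappa(toy), 3))
```

A value of 1 indicates perfect agreement, while a value of 0 indicates agreement no better than chance. Conflating categories, as done above for HAVE, MAKE and USE, corresponds to summing the respective columns before calling the function.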
5 Prevalence of Anaphoric Relations in News Articles
To gauge the prevalence of the anaphoric relations in naturally occurring discourse,
we took 30 of the 120 newspaper articles used for the annotation experiment and
analyzed them in detail to determine the existence and the distribution of the
proposed relations. The set of 30 articles consisted of 352 sentences and 2323 nouns.
The 30 randomly chosen articles were analyzed by the author for the existence of the
10 relations from table 1. In addition, two further relation types were used. The
first was the OTHER relation, for relations that could not be categorized into any
of the 10 types from table 1. The second was the BE-OCCR relation, which represents
identity or co-reference; it was considered trivial and hence was not tested in the
annotation experiment.
Out of a total of 2323 nouns, 1324, or 57%, were found to be used anaphorically.
More than half of the nouns were thus anaphoric, which highlights the importance of
being able to resolve them for discourse interpretation. Note that in our framework
an anaphor can have more than one antecedent, where the antecedents are related by
different relations. The 1324 nouns used anaphorically had a total of 1588 relations
between them, an average of 1.2 relations per anaphor. The detailed distribution of
the relation types is shown in figure 1. The figure shows first that the largest
share (524 of 1588) of the relations are of type BE-OCCR, which are identity
relations expressed by both pronouns and noun phrases. The rest of the relations
were fairly evenly distributed, ranging from 64 to 175. Only a small number, 32 or
2% of the relations, were found to be outside the range of the relations in the
framework. Aside from the BE-OCCR and OTHER relation types, there were 1032 bridging
relations from the list BE-INST, CAUSE, HAVE, MAKE, USE, IN, FOR, FROM, ABOUT and
ACTION. This means that a substantial proportion (65%) of the relations were
bridging, highlighting their prevalence in newspaper articles. We are in the process of implementing
 