4.2 Annotation Data
Our base input data for content analysis of all aspects of NP usage consisted
of 120 articles (of mixed genre) from The New Zealand Herald , The Dominion Post
and The Press , three major online newspapers from three different cities in
New Zealand. The choice of articles was not completely random. The corpus was
developed to serve as the input data for the anaphora resolution system which is the
parent project of this study. Hence, the corpus was built from articles that
were not too short (more than 20 sentences), exhibited a variety of anaphoric
uses (including pronominal anaphora) and had been written by different writers.
An inherent challenge in most NLP tasks is what is referred to as data sparseness .
The term describes the situation where a single chosen corpus cannot be used
for consistent empirical validation of all aspects of a theory, because the preva-
lence of the different characteristics of an NLP theory can be unevenly distributed in a
fixed corpus. Hence, we searched an extended corpus in order to meet a lower thresh-
old of 15 relations from each category. For this we used The Corpus of Contemporary
American English [5]. This freely available corpus consists of some 410 million
words from a variety of genres and has an online web interface that supports
fairly complex searches for words and phrases, which makes it an excellent resource for
manual content analysis for NLP tasks.
We excluded the BE-OCCR relation from validation since it is an unambiguous co-
reference relation.
For the annotation experiment we used 3 streams of approximately 30 students, giving
us a total of 90 different annotators. Each annotator took 4 different tasks, one per
week over a period of 4 weeks. Each task consisted of 25 antecedent-anaphor pairs and
was annotated by 2 streams, i.e. approximately 60 annotators. We randomly discarded some
annotation task sheets in order to have a consistent number of annotations for each
pair, resulting in 25 annotators for each task. Each relation type from (CAUSE, HAVE,
MAKE, USE, BE-INST, IN, FOR, FROM, ABOUT, ACTION), as classified by the
author, was represented by 15 anaphor-antecedent pairs. The pairs from each of the 10
relation types were randomly selected to make up 6 task sheets, each consisting of 25
pairs. The total number of classifications for all relations amounted to 3750, with 375
classifications for each relation type (15 different anaphor-antecedent pairs, each judged by 25 annotators).
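As a sanity check on these figures, the following minimal Python sketch recomputes the counts from the design parameters described above (the variable names are ours, not part of the original study):

```python
# Recompute the annotation-experiment counts from the design parameters.
RELATION_TYPES = ["CAUSE", "HAVE", "MAKE", "USE", "BE-INST",
                  "IN", "FOR", "FROM", "ABOUT", "ACTION"]
PAIRS_PER_RELATION = 15      # anaphor-antecedent pairs per relation type
PAIRS_PER_SHEET = 25         # pairs on one task sheet
ANNOTATORS_PER_PAIR = 25     # annotators retained per task after discarding

total_pairs = len(RELATION_TYPES) * PAIRS_PER_RELATION      # 150 pairs
task_sheets = total_pairs // PAIRS_PER_SHEET                 # 6 sheets
total_classifications = total_pairs * ANNOTATORS_PER_PAIR    # 3750
per_relation = PAIRS_PER_RELATION * ANNOTATORS_PER_PAIR      # 375

print(total_pairs, task_sheets, total_classifications, per_relation)
# -> 150 6 3750 375
```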
Each of the streams was given basic training on semantic interpretation of the rela-
tion types using the examples in section 3. These examples were also provided as a separate
sheet with each annotation task. Each task sheet consisted of anaphor-antecedent pairs
and a tick box for each of the relations. The annotators were asked to choose the rela-
tion which best describes the anaphor-antecedent pair. Two additional options, OTHER
and NONE, were also given. OTHER was to be used if the annotator thought that a
relation does exist but is not present in the given list, and NONE if the
annotator thought that the pair were not related at all.
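The task sheet can be thought of as one record per pair with a single ticked label. The sketch below is one possible representation of an annotator's choice, assuming the relation list above plus the two extra options; the class and field names are illustrative, not taken from the study:

```python
from dataclasses import dataclass

RELATION_TYPES = ["CAUSE", "HAVE", "MAKE", "USE", "BE-INST",
                  "IN", "FOR", "FROM", "ABOUT", "ACTION"]
EXTRA_OPTIONS = ["OTHER", "NONE"]   # relation not listed / pair unrelated

@dataclass
class Annotation:
    anaphor: str        # e.g. "the engine" (illustrative example)
    antecedent: str     # e.g. "the car"
    annotator_id: str
    label: str          # exactly one ticked box per pair

    def __post_init__(self):
        if self.label not in RELATION_TYPES + EXTRA_OPTIONS:
            raise ValueError(f"unknown label: {self.label}")
```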
Table 1 shows the confusion matrix of the relation types as identified by the anno-
tators against the author's classification. Table 2 shows the corresponding confusion
indices between the relation types. The confusion indices indicate the likelihood of
one relation type being interpreted as another.
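The paper does not spell out how the confusion indices were computed; one natural reading is that they are row-normalised confusion counts. A minimal sketch under that assumption, reusing the Annotation records from the sketch above:

```python
from collections import defaultdict

def confusion_matrix(annotations, author_label):
    """Count annotator choices against the author's classification.

    annotations  : iterable of Annotation records (see sketch above)
    author_label : dict mapping (anaphor, antecedent) -> author's relation type
    """
    counts = defaultdict(lambda: defaultdict(int))
    for a in annotations:
        gold = author_label[(a.anaphor, a.antecedent)]
        counts[gold][a.label] += 1
    return counts

def confusion_indices(counts):
    """Row-normalise the counts so each cell estimates the likelihood that
    the author's relation type (row) is interpreted by annotators as the
    column's relation type."""
    indices = {}
    for gold, row in counts.items():
        total = sum(row.values())
        indices[gold] = {label: n / total for label, n in row.items()}
    return indices
```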
 