Fig. 1.4 The annotation tool used for the creation of the quotation extraction corpus
We asked the annotators to identify all quotations in a news article and advised
them to mark, for each quotation, the quoted text, the quotation holder, and, if
available, a reporting verb. A screenshot of the annotation tool is shown in Fig. 1.4.
For quotation holders referenced not by their proper name but, e.g., by a personal
pronoun or only by their last name, the annotators were asked to assign the full
proper name if possible. If a quotation or a reporting verb was composed of several
parts, the annotators were asked to mark all parts (*teilte der Sprecher mit*, "the
spokesman said"). They were also advised to indicate explicitly if a news article
contained no quotes at all.
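The annotation scheme described above could be modeled as follows. This is a minimal sketch under stated assumptions: the original tool's data model is not specified, so all field names are illustrative, and token spans are assumed to be (start, end) index pairs.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Span = Tuple[int, int]  # (start_token, end_token); assumed representation

@dataclass
class QuotationAnnotation:
    """Hypothetical record for one annotated quotation."""
    quote_spans: List[Span]                 # quoted text, possibly several parts
    holder_span: Span                       # surface mention of the quotation holder
    holder_name: Optional[str] = None       # resolved full proper name, if possible
    reporting_verb_spans: List[Span] = field(default_factory=list)  # may be empty

# An article without any quotes is annotated explicitly as an empty list.
article_annotations: List[QuotationAnnotation] = []
```

Representing spans as token index pairs makes it straightforward to compare annotations from different annotators token by token.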
In total, we annotated 714 news articles: 339 of them were annotated twice, 27
three times, and 2 even four times; the remaining 347 news articles were annotated
by a single annotator. The annotators agreed exactly on the quotations in 287 news
articles. We speak of exact agreement if the boundaries of the quoted text, the
quotation holder, and the reporting verb match exactly when comparing the annotated
tokens. The resulting corpus of 287 news articles contains 383 quotations, of which
256 are direct, 98 indirect, and 29 mixed (containing at least one direct and one
indirect part). A news article contains 1.3 quotations on average, and 87% of the
quotations are attributed with a reporting verb. A quotation holder was annotated
for every quotation; for 202 quotation holders we could resolve the reference and
assign a proper name.
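The exact-agreement criterion can be sketched as a comparison of token boundaries between two annotators' annotations of the same article. This is an illustrative reconstruction, not the authors' implementation; the dictionary keys and span representation are assumptions.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start_token, end_token); assumed representation

def exact_agreement(a: List[Dict], b: List[Dict]) -> bool:
    """Two annotators agree exactly on an article iff the token boundaries
    of the quoted text, the reporting verb, and the quotation holder match
    for every annotated quotation (order-independent comparison)."""
    def key(q: Dict):
        # Sort multi-part spans so discontinuous quotes/verbs compare reliably.
        return (tuple(sorted(q["quote"])),
                tuple(sorted(q["verb"])),
                q["holder"])
    return sorted(map(key, a)) == sorted(map(key, b))

# Toy annotations of one article by three annotators.
ann1 = [{"quote": [(5, 12)], "verb": [(3, 3)], "holder": (0, 1)}]
ann2 = [{"quote": [(5, 12)], "verb": [(3, 3)], "holder": (0, 1)}]
ann3 = [{"quote": [(5, 13)], "verb": [(3, 3)], "holder": (0, 1)}]

assert exact_agreement(ann1, ann2)      # identical boundaries
assert not exact_agreement(ann1, ann3)  # quote span differs by one token
```

Under this criterion, a single diverging token boundary in any of the three fields is enough to exclude an article from the 287-article agreement corpus.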
1.3.5 Evaluation
We evaluate our quotation extraction approach using a human-annotated corpus of
287 news articles where at least two annotators exactly agreed upon the contained