Fig. 1.2 The structure of quotations. A quotation is composed of a reporting and a reported clause.
The reporting clause introduces the quotation. It includes the quotation speaker and an optional
reporting verb. The reported clause encompasses the quoted content.
also contain a reporting verb such as “sagte” (said) or “berichtete” (reported) and
other circumstantial information such as the addressee or diverse other descriptive
text. The reported clause encompasses the actual content that has been said.
Direct speech repeats a speaker's words verbatim, without any modification.
The repeated text is enclosed in quotation marks. In contrast to
direct speech, indirect speech reports statements by modifying them grammatically
or even rephrasing them. The grammatical change indicates that the expression was
uttered not by the author but by the original speaker. Indirect speech is composed
of a main (reporting) and a subordinate (reported) clause. In German, the reported
clause of indirect speech is often introduced by the conjunction “dass” and uses
the subjunctive mood for verbs. Quotations may consist of several reported clauses,
and we define a quotation as mixed if it is built from both quoted and unquoted
reported clauses.
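The distinction between direct, indirect, and mixed quotations can be illustrated with a minimal Python sketch. It assumes that each reported clause has already been identified and flagged as quoted or unquoted. The names `ReportedClause` and `quotation_type` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReportedClause:
    """One reported clause of a quotation (hypothetical helper type)."""
    text: str
    quoted: bool  # True if the clause is enclosed in quotation marks


def quotation_type(clauses: List[ReportedClause]) -> str:
    """Return "direct", "indirect", or "mixed" for a list of reported clauses."""
    if not clauses:
        raise ValueError("a quotation needs at least one reported clause")
    has_quoted = any(c.quoted for c in clauses)
    has_unquoted = any(not c.quoted for c in clauses)
    if has_quoted and has_unquoted:
        # Both quoted and unquoted reported clauses build the quotation.
        return "mixed"
    return "direct" if has_quoted else "indirect"
```

A quotation with one quoted clause and one clause introduced by “dass” would be classified as mixed under this definition.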
Our Contribution. In our work we process German news articles and extract
direct and indirect quotations along with a quotation speaker. We propose a
rule-based approach that exploits linguistic information. Modeled as a processing
pipeline, our quotation extraction component first enriches news articles with
linguistic annotations, which are then used to mine the complete quotation. We
detect units of direct quotes by applying a pattern that takes into consideration
different types of quotation marks. We exploit the presence of reporting verbs and
other common phrases indicating quotations to underpin direct quotation candidates
found by our pattern and to locate potential indirect quotations. In order to
assign each quotation a speaker, we make use of the output generated by a named
entity recognizer and a part-of-speech tagger. We compile a list of candidate
speakers and then apply rules that consider the type of the candidates and their
proximity to the reporting verb to determine the quotation speaker. To evaluate
our approach, we manually created a quotation corpus from a set of German news
articles. For each quotation, the corpus provides its boundaries, the quotation
speaker, a reporting verb or phrase, and the type of the quotation (direct,
indirect or mixed). The corpus is available upon request, subject to signing an
agreement.
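The steps described above can be sketched roughly as follows. This is a simplified illustration, not the actual implementation: the set of quotation mark pairs, the verb list (only the two verbs mentioned earlier), the candidate types, and their priority order are all assumptions. The names `find_direct_quotes`, `has_reporting_verb`, `Candidate`, and `choose_speaker` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed opening/closing quotation mark pairs found in German news text.
QUOTE_PAIRS = [("„", "“"), ("»", "«"), ("«", "»"), ('"', '"'), ("“", "”")]

# Only the two reporting verbs named in the text; a real list would be longer.
REPORTING_VERBS = {"sagte", "berichtete"}

# Assumed ranking of candidate types; lower values are preferred.
TYPE_PRIORITY = {"PERSON": 0, "ORGANIZATION": 1, "NOUN": 2, "PRONOUN": 3}


def find_direct_quotes(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, content) for every span enclosed by a known pair."""
    spans = []
    for open_mark, close_mark in QUOTE_PAIRS:
        o, c = re.escape(open_mark), re.escape(close_mark)
        # Content may not contain either mark, so spans do not nest or overlap.
        pattern = re.compile(o + "([^" + o + c + "]+)" + c)
        for match in pattern.finditer(text):
            spans.append((match.start(1), match.end(1), match.group(1)))
    return sorted(spans)


def has_reporting_verb(sentence: str) -> bool:
    """Check whether the sentence contains one of the known reporting verbs."""
    tokens = re.findall(r"\w+", sentence.lower())
    return any(token in REPORTING_VERBS for token in tokens)


@dataclass
class Candidate:
    """A candidate speaker, e.g. from a named entity recognizer or POS tagger."""
    text: str
    kind: str      # e.g. "PERSON" or "PRONOUN"
    position: int  # token index within the sentence


def choose_speaker(candidates: List[Candidate], verb_position: int) -> Optional[Candidate]:
    """Prefer better-ranked types, then the candidate closest to the reporting verb."""
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda cand: (
            TYPE_PRIORITY.get(cand.kind, len(TYPE_PRIORITY)),
            abs(cand.position - verb_position),
        ),
    )
```

For the sentence „Wir sind zufrieden“, sagte Angela Merkel am Montag., `find_direct_quotes` yields the span containing “Wir sind zufrieden”, and `has_reporting_verb` confirms the candidate through the verb “sagte”. Ranking by type first and distance second is only one possible way to combine the two criteria.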