Fig. 1.5 The web demonstrator for visualizing automatically extracted quotations. Users can insert
arbitrary text into the upper field, which is then analyzed by our quotation extraction component. The
results are highlighted in the area below. Users can choose which information the demonstrator
should display and which should be hidden.
1.3.6 Conclusion
We presented an approach to quotation extraction that includes the extraction of
direct and indirect quotations and the assignment of a speaker to each quotation.
Our approach is rule-based and relies on a handcrafted list of reporting verbs. The
implemented rules are manually created as well. As valid speakers we allow text
spans covering pronouns (she and he), noun phrases, and named entities. We resolve
pronouns to appropriate named entities mentioned earlier in the text. The results
achieved with our unsupervised approach compare favorably with those of other approaches,
and the approach does not rely on the availability of training data. In particular, the extraction of
direct quotations and their attribution to a speaker already works very satisfactorily.
For indirect quotations, finding the boundaries of the reported clause and the
correct quotation holder is more challenging, and for mixed quotations even more so.
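The rule-based idea summarized above can be illustrated with a minimal sketch. It is not our implementation: the verb list, the single pattern, the function name extract_direct_quotations, and the naive pronoun resolution (reusing the most recently seen named speaker) are hypothetical simplifications chosen only to show how a handcrafted list of reporting verbs can drive the attribution of direct quotations.

```python
import re

# Hypothetical, tiny stand-in for a handcrafted list of reporting verbs.
REPORTING_VERBS = ("said", "says", "stated", "claimed", "announced")

_VERBS = "|".join(REPORTING_VERBS)
# Assumed speaker shapes: capitalized name sequences or the pronouns she/he.
_SPEAKER = r"(?:[A-Z][a-z]+(?: [A-Z][a-z]+)*|she|he)"

# One illustrative rule: a quoted span followed by either
# "<speaker> <reporting verb>" or "<reporting verb> <speaker>".
PATTERN = re.compile(
    r'"(?P<quote>[^"]+)"\s*,?\s*'
    r"(?:(?P<s1>" + _SPEAKER + r")\s+(?:" + _VERBS + r")"
    r"|(?:" + _VERBS + r")\s+(?P<s2>" + _SPEAKER + r"))"
)


def extract_direct_quotations(text):
    """Return (speaker, quotation) pairs found by the single rule above."""
    results = []
    last_named_speaker = None
    for match in PATTERN.finditer(text):
        speaker = match.group("s1") or match.group("s2")
        if speaker in ("she", "he"):
            # Naive assumption: a pronoun refers to the last named speaker.
            speaker = last_named_speaker or speaker
        else:
            last_named_speaker = speaker
        # Drop punctuation that sits just inside the closing quotation mark.
        quote = match.group("quote").rstrip(",. ")
        results.append((speaker, quote))
    return results


if __name__ == "__main__":
    sample = '"We will publish the data," said Anna Schmidt. "It is too early," she claimed.'
    for speaker, quote in extract_direct_quotations(sample):
        print(speaker, "->", quote)
```

Because the rule only fires next to a listed verb, a quotation introduced by any verb outside the list is silently missed in this sketch.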
Since our approach to indirect quotation extraction relies on a list of reporting verbs
and clues, potential results are limited to those parts near such predefined reporting
verbs or clues. Thus, our future work includes, among other things, the extension of our
approach by an automatic reporting verb recognizer [38]. We plan to detect previously
unseen reporting verbs as well as to disambiguate verbs. For example,
the ambiguous verb “to add” may lead to a mistake by regarding it as a reporting