the similarity to all existing centroids is lower than a predefined threshold, then the
incoming news article starts a new topic. The similarity measure combines the cosine
similarity between two vectors with a time-dependent penalty. Including this penalty
favors the assignment of news articles to new topics and, at the same time, prevents
existing topics from growing indefinitely [ 20 ].
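The assignment step can be illustrated with a minimal single-pass clustering sketch. The text does not say how the cosine similarity and the time-dependent penalty are combined, so this sketch assumes, purely for illustration, that the cosine similarity is multiplied by an exponential decay exp(-decay · Δt), where Δt is the time since the topic last received an article. The names `Topic`, `assign` and `combined_similarity`, the `decay` parameter, the dense vector representation and timestamps measured in hours are all hypothetical.

```python
import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Topic:
    centroid: list[float]
    last_update: float  # timestamp (hours) of the topic's most recent article
    size: int = 1
    members: list[int] = field(default_factory=list)


def combined_similarity(vec: list[float], t: float, topic: Topic, decay: float) -> float:
    # ASSUMPTION: cosine similarity damped by exp(-decay * elapsed time);
    # the source only states that the two components are combined.
    elapsed = max(0.0, t - topic.last_update)
    return cosine(vec, topic.centroid) * math.exp(-decay * elapsed)


def assign(articles: list[tuple[list[float], float]],
           threshold: float = 0.5, decay: float = 0.01) -> tuple[list[int], list[Topic]]:
    """Single-pass clustering of (vector, timestamp) pairs in arrival order."""
    topics: list[Topic] = []
    labels: list[int] = []
    for idx, (vec, t) in enumerate(articles):
        scores = [combined_similarity(vec, t, tp, decay) for tp in topics]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is None or scores[best] < threshold:
            # Below the threshold for every centroid: start a new topic.
            topics.append(Topic(list(vec), t, 1, [idx]))
            labels.append(len(topics) - 1)
        else:
            tp = topics[best]
            # Running mean keeps the centroid equal to the average member vector.
            tp.centroid = [(c * tp.size + v) / (tp.size + 1)
                           for c, v in zip(tp.centroid, vec)]
            tp.size += 1
            tp.last_update = t
            tp.members.append(idx)
            labels.append(best)
    return labels, topics
```

For example, with the default parameters, the articles ([1, 0, 0], 0), ([0.9, 0.1, 0], 1), ([0, 1, 0], 2) and ([1, 0, 0], 500) receive the labels 0, 0, 1, 2: the last article has exactly the same vector as the first, but after 500 hours its decayed similarity to the first topic (about 0.007) falls below the threshold, so it starts a new topic.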
1.2.3 Dynamic Topic Hierarchy
Classical news aggregators organize news articles, and the topics they belong to, into
top-level categories like “Politics”, “Economy”, or “Sports”. To make related news
stories easier to navigate and to track their development, they may be organized in
a hierarchy. Arranging topics in a hierarchy links topics that are not directly related
and helps readers recognize relationships between them. This is especially helpful
when searching news archives and when recommending related news articles for
further reading. Since news stories evolve over time and new events happen
continuously, we do not sort the news articles into predefined hierarchies such as the
Metadata Taxonomies for News by the International Press Telecommunications
Council (IPTC)⁸. Instead, we propose the creation of a dynamic topic hierarchy arising
from the current news situation. Based on previously detected topics (Sect. 1.2.2), we
build thematically connected meta-topics and assign labels to them: as the label of a
meta-topic, we select the most probable headline from the set of news articles
belonging to it.
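The text does not describe how meta-topics are formed or how the "most probable" headline is determined, so the following Python sketch makes two illustrative substitutions: it groups topic centroids by average-linkage agglomerative clustering with a similarity threshold, and it labels each meta-topic with the headline of the article whose vector is closest to the meta-topic centroid, as a stand-in for the most probable headline. The functions `build_meta_topics` and `label_meta_topic`, the `threshold` value and the `articles` mapping are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors: list[list[float]]) -> list[float]:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def build_meta_topics(centroids: list[list[float]], threshold: float = 0.6) -> list[list[int]]:
    """Average-linkage agglomerative grouping of topic centroids.

    Repeatedly merges the two groups with the highest average pairwise cosine
    similarity, stopping once no pair reaches `threshold`. Each returned group
    is a meta-topic given as a list of topic indices.
    """
    groups = [[i] for i in range(len(centroids))]
    while len(groups) > 1:
        best_pair, best_sim = None, threshold
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                sims = [cosine(centroids[a], centroids[b])
                        for a in groups[i] for b in groups[j]]
                avg = sum(sims) / len(sims)
                if avg >= best_sim:
                    best_pair, best_sim = (i, j), avg
        if best_pair is None:
            break
        i, j = best_pair
        groups[i].extend(groups[j])
        del groups[j]  # j > i, so index i stays valid
    return groups


def label_meta_topic(group: list[int], centroids: list[list[float]],
                     articles: dict[int, list[tuple[str, list[float]]]]) -> str:
    """Return the headline of the article closest to the meta-topic centroid.

    `articles` maps a topic index to its (headline, vector) pairs.
    """
    center = mean_vector([centroids[t] for t in group])
    candidates = [a for t in group for a in articles[t]]
    headline, _ = max(candidates, key=lambda hv: cosine(hv[1], center))
    return headline
```

Applying `build_meta_topics` again to the centroids of the resulting meta-topics would yield further levels of the hierarchy; the threshold controls how broad each meta-topic becomes.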
1.2.4 Quotation Extraction
Quotations are a common stylistic device to clarify and strengthen a statement.
A basic distinction is made between direct and reported speech. In news articles,
quotations underline reported facts and may express the positions or views of the
cited persons or organizations. By placing quotations at specific points in an article,
the author highlights statements that are especially significant and worth citing.
In addition, quotations may be a suitable source for identifying subjective passages
of a news article. In our system, we apply a rule-based approach to quotation
extraction. We address the extraction of both direct and reported speech and assign
a speaker to each identified quotation. Our solution normalizes quotation marks and
uses linguistic annotations to detect the reporting verbs or phrases that introduce
quotations, the boundaries of direct and reported quotation parts, and finally the
speaker, whom we also refer to as the quotation holder in the following. We describe
our approach to quotation extraction in detail in Sect. 1.3.
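The rule-based idea can be illustrated with a strongly simplified Python sketch. Where the system described above relies on linguistic annotations, this sketch substitutes regular expressions: a small, hypothetical list of reporting verbs and a naive heuristic that treats one to three capitalized tokens as the speaker. It first normalizes typographic double quotation marks to the ASCII `"`, then matches a few common patterns of direct speech and of reported speech introduced by "that". The names `extract_quotations`, `Quotation` and `REPORTING_VERBS` are illustrative and not taken from the original system.

```python
import re
from dataclasses import dataclass

# Typographic double quotation marks mapped to the ASCII double quote.
_QUOTE_MARKS = str.maketrans({c: '"' for c in "\u201c\u201d\u201e\u201f\u00ab\u00bb"})

# HYPOTHETICAL, deliberately tiny list of reporting verbs.
REPORTING_VERBS = ["said", "says", "stated", "claimed", "added", "explained", "announced"]
_VERB = "(?:" + "|".join(REPORTING_VERBS) + ")"
# Naive speaker heuristic: one to three capitalized tokens, e.g. "Maria Lopez".
_SPEAKER = r"(?P<speaker>[A-Z][\w-]*(?:\s+[A-Z][\w-]*){0,2})"
# Direct quote ending in a comma, placed inside or outside the closing mark.
_QUOTE_COMMA = r'"(?P<quote>[^"]+?)(?:,"|",)\s*'

# Ordered patterns; a quote span claimed by an earlier pattern is not reused.
_PATTERNS = [
    # Speaker said: "..."
    ("direct", re.compile(_SPEAKER + r"\s+" + _VERB + r'\s*[:,]?\s*"(?P<quote>[^"]+)"')),
    # "...," said Speaker
    ("direct", re.compile(_QUOTE_COMMA + _VERB + r"\s+" + _SPEAKER)),
    # "...," Speaker said
    ("direct", re.compile(_QUOTE_COMMA + _SPEAKER + r"\s+" + _VERB)),
    # Speaker said that ...
    ("reported", re.compile(_SPEAKER + r"\s+" + _VERB + r'\s+that\s+(?P<quote>[^."]+)')),
]


@dataclass
class Quotation:
    text: str
    speaker: str  # the quotation holder
    kind: str     # "direct" or "reported"


def normalize_quotes(text: str) -> str:
    return text.translate(_QUOTE_MARKS)


def extract_quotations(text: str) -> list[Quotation]:
    text = normalize_quotes(text)
    seen: set[tuple[int, int]] = set()
    found: list[tuple[int, Quotation]] = []
    for kind, pattern in _PATTERNS:
        for m in pattern.finditer(text):
            span = m.span("quote")
            if span in seen:
                continue
            seen.add(span)
            quote = m.group("quote").strip(" ,.")
            found.append((span[0], Quotation(quote, m.group("speaker"), kind)))
    # Return quotations in document order.
    return [q for _, q in sorted(found, key=lambda item: item[0])]
```

These heuristics are deliberately narrow: a direct quote followed by its speaker must end with a comma, a sentence-initial capitalized word such as "Yesterday" may be absorbed into the speaker, and pronoun speakers such as "He" are returned unresolved. These are the cases where linguistic annotations would help.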
⁸ http://www.iptc.org/site/NewsCodes/