articles but also examines the content of the articles in order to grasp their meaning.
Besides TDT at the topic level, the proposed system recognizes more abstract topics and
performs quotation extraction and sentiment analysis based on the identified quotations.
The system offers deep insight into single events and topics
by highlighting named entities along with direct and indirect quotations. Users
can learn about the entities involved, compare their comments, and see
how the entities and topics are perceived in the media landscape.
The proposed system was developed in close collaboration with Neofonie GmbH. 6
It is modeled with a processing pipeline as its central component. The system's
structure is outlined schematically in Fig. 1.1. As documents pass through the processing
pipeline, they are enriched with more and more information. Each
crawled news article first undergoes linguistic preprocessing: the articles are
split into tokens and sentences and annotated with part-of-speech tags, named entities,
and lemmas (Sect. 1.2.1). Subsequently, the news articles are mined. On the
Fig. 1.1 System overview. Processing pipeline: News Crawling, News Deduplication, Preprocessing (Tokenization, Sentence Splitting, Lemmatizing, POS Tagging, NER), News Mining (Quotation Extraction, Dynamic Topic Hierarchy, Topic Detection & Tracking, Opinion Mining), Storage, Presentation.
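The enrichment idea described in the text, where each stage adds annotations to a document as it moves through the pipeline, can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Document` class, the stage functions, and `run_pipeline` are hypothetical names, and the regex-based sentence splitter and tokenizer are naive stand-ins, not the components used in the described system.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """A news article plus the annotations added by each pipeline stage (hypothetical type)."""
    text: str
    annotations: Dict[str, object] = field(default_factory=dict)


# A stage takes a document and returns it with extra annotations attached.
Stage = Callable[[Document], Document]


def split_sentences(doc: Document) -> Document:
    # Naive stand-in: split at whitespace that follows '.', '!' or '?'.
    parts = re.split(r"(?<=[.!?])\s+", doc.text.strip())
    doc.annotations["sentences"] = [p for p in parts if p]
    return doc


def tokenize(doc: Document) -> Document:
    # Naive stand-in: runs of word characters, or single punctuation marks.
    doc.annotations["tokens"] = re.findall(r"\w+|[^\w\s]", doc.text)
    return doc


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    # Apply the stages in order; each one enriches the same document.
    for stage in stages:
        doc = stage(doc)
    return doc


doc = run_pipeline(
    Document("The mayor spoke. Critics disagreed!"),
    [split_sentences, tokenize],
)
print(doc.annotations["sentences"])
print(doc.annotations["tokens"])
```

Modeling each stage as a function with the same signature keeps the pipeline open to further steps, such as POS tagging or quotation extraction, which would simply be appended to the stage list.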
6 http://www.neofonie.de/
 