7.2 The SERUM Architecture
The SERUM architecture consists of four building blocks:
the news crawler,
the named-entity recognition and disambiguation component (NER/NED),
the user modeling component, and
the semantic recommender.
The news crawler component, provided by Neofonie GmbH, 3 collects around
60,000 news articles from German and English news sites every day. The NER/NED
component [ 25 ] identifies and extracts named entities from these news texts and links
them to a dataset collected from Freebase. 4
Freebase is a semantic encyclopedic
data collection, comparable to DBpedia. 5
The dataset consists of
400,000 artists,
1,700,000 tracks and albums, and
2,000 genres, connected by
1.9 million edges.
These data are interlinked with the news corpus through the entities detected in news
articles using the NER/NED component. The NER/NED component associates a Freebase entry
with every entity found in an article by linking a Freebase URL to the entity. The news
corpus currently contains over 7,200,000 news articles and grows daily with the newly
crawled articles; together with the Freebase data, it forms the knowledge base
for the recommender. The recommendation algorithm itself is explained in detail in
Sect. 7.4 and in [ 22 ].
The user modeling component implicitly collects the users' reading behavior to
build a user model containing the users' interest in topics or entities. Figure 7.1
shows the user interface of SERUM with the personalized news stream. Under each
news article, all entities detected in the article are displayed. Each user
interaction with an article or an entity is tracked and incorporated in the user model.
In the current system, we focus on four user interactions that can be tracked (Fig. 7.2 ):
User clicks on an article: The news article and all related entities are marked as interesting.
User clicks on an article in a list: The clicked article and all related entities are
marked as interesting for the user, while all other surrounding articles are marked
as less interesting.
User clicks on recognized entities in an article and
Triggered mouse-over events: Entities clicked by the user or marked by the mouse
pointer are given a higher interest rating.
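The four interactions above can be read as update rules on a per-user interest table. The following is a minimal sketch of that reading, not the SERUM implementation: the class and method names, the example URLs, and all numeric weights are assumptions chosen for illustration, since the text does not state how strongly each interaction counts.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical weights; the source gives no numeric values.
ARTICLE_CLICK = 1.0   # clicked article and its entities: "interesting"
LIST_SKIP = -0.25     # surrounding articles in a list: "less interesting"
ENTITY_CLICK = 2.0    # explicit click on a recognized entity
ENTITY_HOVER = 1.5    # mouse-over on a recognized entity


@dataclass
class Article:
    url: str
    entities: List[str]  # entity URLs attached by the NER/NED component


@dataclass
class UserModel:
    # Maps an article or entity URL to an accumulated interest score.
    interest: Dict[str, float] = field(default_factory=dict)

    def _add(self, key: str, weight: float) -> None:
        self.interest[key] = self.interest.get(key, 0.0) + weight

    def click_article(self, article: Article) -> None:
        """Interaction 1: the article and all its entities gain interest."""
        self._add(article.url, ARTICLE_CLICK)
        for entity in article.entities:
            self._add(entity, ARTICLE_CLICK)

    def click_article_in_list(self, clicked: Article, shown: List[Article]) -> None:
        """Interaction 2: as interaction 1, and the other listed articles lose interest."""
        self.click_article(clicked)
        for other in shown:
            if other.url != clicked.url:
                self._add(other.url, LIST_SKIP)

    def click_entity(self, entity: str) -> None:
        """Interaction 3: a clicked entity gets a higher interest rating."""
        self._add(entity, ENTITY_CLICK)

    def hover_entity(self, entity: str) -> None:
        """Interaction 4: a mouse-over on an entity also raises its rating."""
        self._add(entity, ENTITY_HOVER)


# Example: a click in a two-article list, followed by a mouse-over.
a = Article("news/a", ["freebase/artist-1"])
b = Article("news/b", ["freebase/artist-2"])
model = UserModel()
model.click_article_in_list(a, [a, b])
model.hover_entity("freebase/artist-1")
```

In this sketch only the surrounding articles are down-weighted, not their entities, which follows the wording of the second interaction. A real system would probably also normalize or decay the scores over time, which the text does not describe.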
This user feedback is collected using the semantic user behavior tracker described
in [ 26 ], which is part of the web application. The data are stored on the server-side
in an RDF store using the User Behavior Ontology (UBO), described in Fig. 7.7. We
build on the idea presented in [ 35 ] of using a distinct behavior model, but we use a more
comprehensive model that tracks not only events but also semantic relations
between entities on a webpage as presented in [ 31 ]. The UBO describes all events
3 http://www.neofonie.de/
4 http://freebase.com/
5 http://dbpedia.org/