Personalized Information Access Using Semantic Knowledge - Smart Information Systems: Computational Intelligence for Real-Life Applications

7.2 The SERUM Architecture

The SERUM architecture consists of four building blocks:

• the news crawler,
• the named-entity recognition and disambiguation component (NER/NED),
• the user modeling component, and
• the semantic recommender.

The news crawler component, provided by Neofonie GmbH,³ collects around 60,000 news articles from German and English news sites every day. The NER/NED component [25] identifies and extracts named entities from these news texts and links them to a dataset collected from Freebase.⁴ Freebase is a semantic encyclopedic data collection, comparable to DBpedia.⁵ The dataset consists of ≈400,000 artists, ≈1,700,000 tracks and albums, and ≈2,000 genres, connected by ≈1.9 million edges.

These data are interlinked with the news corpus through the entities that the NER/NED component detects in news articles. The NER/NED associates a Freebase entry with every entity found in an article by linking a Freebase URL to the entity. The news corpus currently contains over 7,200,000 news articles and grows daily with the newly crawled articles; together with the Freebase data, it forms the knowledge base for the recommender. The recommendation algorithm itself is explained in detail in Sect. 7.4 and in [22].

The user modeling component implicitly collects the users' reading behavior to build a user model containing the users' interest in topics or entities. Figure 7.1 shows the user interface of SERUM with the personalized news stream. Under each news article, all entities detected in that article are displayed. Each user interaction with an article or an entity is tracked and incorporated in the user model. In the current system, we focus on four user interactions that can be tracked (Fig. 7.2):

• User clicks on an article: The news article and all related entities are marked as interesting.
• User clicks on an article in a list: The clicked article and all related entities are marked as interesting for the user, while all other surrounding articles are marked as less interesting.
• User clicks on recognized entities in an article, and
• Triggered mouse-over events: Entities clicked by the user or marked by the mouse pointer are given a higher interest rating.
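The four tracked interactions can be illustrated with a minimal Python sketch. It is not the SERUM implementation: the class names, the `track` method, the placeholder identifiers standing in for Freebase URLs, and all numeric weights are assumptions made for illustration, since the text only states which items are marked as more or less interesting.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Sequence


class Interaction(Enum):
    ARTICLE_CLICK = "article_click"              # click on an article
    ARTICLE_CLICK_IN_LIST = "article_in_list"    # click on one article in a list
    ENTITY_CLICK = "entity_click"                # click on a recognized entity
    ENTITY_MOUSE_OVER = "entity_mouse_over"      # mouse pointer marks an entity


# Hypothetical weights; the chapter gives no numeric values.
ARTICLE_WEIGHT = 1.0        # "marked as interesting"
SURROUNDING_WEIGHT = -0.25  # "marked as less interesting"
ENTITY_CLICK_WEIGHT = 2.0   # "given a higher interest rating"
MOUSE_OVER_WEIGHT = 1.5     # "given a higher interest rating"


@dataclass
class Article:
    article_id: str
    entity_urls: List[str]  # entity links attached by NER/NED (placeholders here)


@dataclass
class UserModel:
    # Maps article IDs and entity URLs to an accumulated interest score.
    interest: Dict[str, float] = field(default_factory=lambda: defaultdict(float))

    def _mark(self, article: Article, weight: float) -> None:
        """Apply a weight to an article and to every entity linked to it."""
        self.interest[article.article_id] += weight
        for url in article.entity_urls:
            self.interest[url] += weight

    def track(
        self,
        kind: Interaction,
        article: Optional[Article] = None,
        entity_url: Optional[str] = None,
        surrounding: Sequence[Article] = (),
    ) -> None:
        """Incorporate one tracked interaction into the user model."""
        if kind in (Interaction.ARTICLE_CLICK, Interaction.ARTICLE_CLICK_IN_LIST):
            if article is None:
                raise ValueError("article interactions need an article")
            self._mark(article, ARTICLE_WEIGHT)
            if kind is Interaction.ARTICLE_CLICK_IN_LIST:
                # The other articles shown in the list count as less interesting.
                for other in surrounding:
                    self._mark(other, SURROUNDING_WEIGHT)
        else:
            if entity_url is None:
                raise ValueError("entity interactions need an entity URL")
            weight = (
                ENTITY_CLICK_WEIGHT
                if kind is Interaction.ENTITY_CLICK
                else MOUSE_OVER_WEIGHT
            )
            self.interest[entity_url] += weight


# Usage with placeholder identifiers in place of real Freebase URLs.
clicked = Article("article-1", ["freebase:artist_a"])
skipped = Article("article-2", ["freebase:artist_b"])
model = UserModel()
model.track(Interaction.ARTICLE_CLICK_IN_LIST, clicked, surrounding=[skipped])
model.track(Interaction.ENTITY_MOUSE_OVER, entity_url="freebase:artist_a")
```

Keeping articles and entities in one score table is a simplification chosen for brevity; the system described here stores the tracked behavior in an RDF store instead.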

This user feedback is collected using the semantic user behavior tracker described in [26], which is part of the web application. The data are stored on the server side in an RDF store using the User Behavior Ontology (UBO), described in Fig. 7.7. We build on the idea presented in [35] of using a distinct behavior model, but we use a more comprehensive model that tracks not only events but also semantic relations between entities on a webpage, as presented in [31]. The UBO describes all events

³ http://www.neofonie.de/
⁴ http://freebase.com/
⁵ http://dbpedia.org/
