[Figure: graph with the node labels Movies, Users, Tags, Actors, Directors, Genres, Locations, Countries]
Fig. 5.2 Our semantic movie dataset consists of six bipartite relationship sets providing knowledge
about movies. The Movie-User relationship set describes the user preferences.
the integration and the processing of these data. Unfortunately, the freely available
IMDb data lack personalized rating and usage information.
We obtain personalized movie preferences from the MovieLens dataset [13].
MovieLens is a recommender system and virtual community website that allows
users to create profiles and subsequently obtain movie recommendations. The
MovieLens dataset provides rating data including timestamps. Since the MovieLens
and the IMDb datasets have a large overlap in the set of considered movies, the
two datasets can be combined, aggregating encyclopedic and rating-based knowledge.
The mapping is performed by matching concordant properties (e.g., title, elapsed
time, genres). A frequently used dataset combining data from MovieLens and IMDb
is the HetRec dataset. The dataset was created for the International Workshop on
Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011) 4
and can be retrieved from GroupLens. 5
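The mapping by concordant properties can be illustrated with a minimal sketch. It is not the procedure used to build the HetRec dataset. The record fields (`id`, `title`, `runtime`, `genres`), the function names, and the runtime tolerance are hypothetical, and the sketch assumes that "elapsed time" refers to the running time in minutes.

```python
import re


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def is_concordant(a, b, runtime_tolerance=2):
    """True if two records agree on title, runtime, and at least one genre."""
    same_title = normalize_title(a["title"]) == normalize_title(b["title"])
    close_runtime = abs(a["runtime"] - b["runtime"]) <= runtime_tolerance
    shared_genre = bool(set(a["genres"]) & set(b["genres"]))
    return same_title and close_runtime and shared_genre


def map_movies(movielens, imdb):
    """Map MovieLens ids to IMDb ids for unambiguous concordant pairs."""
    # Index the IMDb records by normalized title to avoid a full pairwise scan.
    by_title = {}
    for rec in imdb:
        by_title.setdefault(normalize_title(rec["title"]), []).append(rec)

    mapping = {}
    for rec in movielens:
        candidates = [
            c for c in by_title.get(normalize_title(rec["title"]), [])
            if is_concordant(rec, c)
        ]
        if len(candidates) == 1:  # assumption: ambiguous matches are skipped
            mapping[rec["id"]] = candidates[0]["id"]
    return mapping
```

Requiring agreement on several properties at once reduces false matches between remakes or films that share a title.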
We use the aggregation of the IMDb and the MovieLens dataset for creating
a semantic movie recommender system. The structure of the dataset is shown in
Fig. 5.2. The central entity type is Movie. The entity type Movie is directly connected
with the entity types Actors, Directors, Genres, Tags, Locations, and
Countries. In general, the dataset can be seen as a multi-graph, supporting several
different edges between two nodes of the graph.
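One way to picture this structure is as a collection of named relationship sets, each linking movies to one other entity type. The following is a minimal sketch under that assumption. The class name, method names, and identifiers are hypothetical and not taken from the original system.

```python
from collections import defaultdict


class BipartiteRelationshipSet:
    """Edges between movies and the entities of one other type."""

    def __init__(self, name):
        self.name = name
        self.movie_to_entities = defaultdict(set)
        self.entity_to_movies = defaultdict(set)

    def add_edge(self, movie, entity):
        # Store both directions so the edge can be traversed either way.
        self.movie_to_entities[movie].add(entity)
        self.entity_to_movies[entity].add(movie)

    def movies_sharing_entity(self, movie):
        """Movies reachable from `movie` over a path of length two."""
        related = set()
        for entity in self.movie_to_entities.get(movie, ()):
            related |= self.entity_to_movies[entity]
        related.discard(movie)
        return related


# Keeping one set per relationship type allows several different edges
# between the same two nodes, e.g. a person who both acts in and directs
# a movie (assuming identifiers are shared across the sets).
dataset = {
    "Movie-Actor": BipartiteRelationshipSet("Movie-Actor"),
    "Movie-Director": BipartiteRelationshipSet("Movie-Director"),
}
dataset["Movie-Actor"].add_edge("movie_1", "person_1")
dataset["Movie-Director"].add_edge("movie_1", "person_1")
```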
In addition to the content-based movie descriptions, the Movies-Users
relationship provides user ratings for movies. The user ratings are used for optimizing
and benchmarking the learned recommender strategies. For the evaluation of our
approach, we split the user profiles (obtained from MovieLens) based on a global
timestamp into a training set and a test set. We filter out user profiles having fewer
than ten entries in the training set or the test set. We handle the dataset as a collection
of bipartite relationship sets, each consisting of undirected, equally weighted edges.
The size of the entity sets and edge sets used in the evaluation is shown in Table 5.1.
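The split and filtering step can be sketched as follows. This is an illustration only: the tuple layout, the function name, and the choice to place ratings at exactly the split timestamp into the test set are assumptions, since the text does not specify them.

```python
from collections import defaultdict


def split_and_filter(ratings, split_timestamp, min_entries=10):
    """Split ratings at a global timestamp and drop sparse user profiles.

    `ratings` is an iterable of (user, movie, rating, timestamp) tuples.
    Returns two dicts (train, test) mapping each kept user to a list of
    (movie, rating, timestamp) entries.
    """
    train, test = defaultdict(list), defaultdict(list)
    for user, movie, rating, timestamp in ratings:
        # Assumption: ratings before the split go to training, the rest to test.
        target = train if timestamp < split_timestamp else test
        target[user].append((movie, rating, timestamp))

    # Keep only users with at least `min_entries` entries in BOTH sets.
    kept = {
        user for user in train
        if len(train[user]) >= min_entries
        and len(test.get(user, ())) >= min_entries
    }
    return (
        {user: train[user] for user in kept},
        {user: test[user] for user in kept},
    )
```

Splitting on one global timestamp, rather than per user, ensures that no rating in the training set is newer than any rating in the test set.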
4 http://ir.ii.uam.es/hetrec2011/
5 http://www.grouplens.org/node/462/