Table 5.1 The table shows the size of the entity and relationship sets of our movie dataset

Entity type   Number of entities   Relationship to movies   Number of edges   Graph density
Actors        95,321               Actors                   231,742           1.80E-05
Directors     4,060                Directors                10,155            4.24E-06
Genres        20                   Genres                   20,809            9.80E-06
Tags          5,297                Tags                     51,795            2.09E-05
Locations     187                  Locations                49,167            2.30E-05
Countries     72                   Countries                10,197            4.80E-06
Users         760                  Users (train)            525,318           2.42E-04
Movies        65,133               Users (test)             330,280           1.52E-04

Discussion: In this section we described our semantic dataset for the movie
recommendation domain. We presented the dataset we use for learning a semantic
movie recommender and discussed its properties. The dataset comprises
encyclopedic knowledge as well as rating knowledge describing the users'
preferences. Data collected from different sources are combined in one large graph
and stored in a uniform semantic data format. All nodes (entities) in the created
graph are identified by unique uniform resource identifiers (URIs). The dataset
statistics (Table 5.1) show that the entity sets and the relationship sets differ
widely, both in the number of elements and in the density of the relationship
sets. Thus, the dataset allows us to analyze how our approach handles the
heterogeneity of large semantic datasets.
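
The text does not state how the graph density column of Table 5.1 is computed. The reported values are, however, consistent with the standard density of a simple undirected graph, 2E / (n (n - 1)), where E is the number of edges of a relationship set and n is the number of movie nodes plus the number of nodes of the related entity type. The following minimal Python sketch recomputes the column under that assumption. The names NUM_MOVIES, RELATIONSHIP_SETS, graph_density, and relationship_density are illustrative and do not come from the original work.

```python
# Counts are copied from Table 5.1. The density formula is an assumption
# inferred from the reported values, not a definition given in the text.
NUM_MOVIES = 65_133

# relationship set -> (number of non-movie entities, number of edges)
RELATIONSHIP_SETS = {
    "Actors": (95_321, 231_742),
    "Directors": (4_060, 10_155),
    "Genres": (20, 20_809),
    "Tags": (5_297, 51_795),
    "Locations": (187, 49_167),
    "Countries": (72, 10_197),
    "Users (train)": (760, 525_318),
    "Users (test)": (760, 330_280),
}


def graph_density(num_nodes: int, num_edges: int) -> float:
    """Density of a simple undirected graph: 2E / (n * (n - 1))."""
    if num_nodes < 2:
        return 0.0
    return 2.0 * num_edges / (num_nodes * (num_nodes - 1))


def relationship_density(name: str) -> float:
    """Density of one sub-graph: all movie nodes plus one other entity set."""
    num_entities, num_edges = RELATIONSHIP_SETS[name]
    return graph_density(NUM_MOVIES + num_entities, num_edges)


if __name__ == "__main__":
    for name in RELATIONSHIP_SETS:
        print(f"{name:<14} {relationship_density(name):.2E}")
```

Under this reading, the rating sub-graphs (users to movies) are roughly one to two orders of magnitude denser than the encyclopedic sub-graphs, which illustrates the heterogeneity noted in the discussion.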
5.3.4 Challenges and Requirements for Learning Recommenders
Having created one large semantic knowledge graph that aggregates data from
heterogeneous sources, we define an approach for learning a recommendation strategy.
The challenges in creating a powerful semantic recommender are (1) the
heterogeneity of the aggregated sub-graphs with respect to the number of nodes and
edges, (2) the sparsity of the sub-graphs, and (3) the diversity of noise in the
aggregated sub-datasets. In addition, the sub-graphs may use different edge types
and schemes for assigning edge labels, which requires a domain-specific model that
reflects the heterogeneity of the aggregated graphs. In the following paragraphs we
discuss these challenges in detail and explain approaches for processing
heterogeneous semantic data.
 