Semantic Movie Recommendations - Smart Information Systems: Computational Intelligence for Real-Life Applications

Information Technology Reference

In-Depth Information

in small relatedness scores. If two nodes are not directly connected by an edge, we

assign a relatedness score of 0, resulting in a distance of infinity. The basic rules

for calculating the distance between two nodes in a semantic graph are visualized in

Table 5.2 . The characteristics of the different edge algebras are discussed in the next

paragraphs.

Shortest Path : We define the distance of two nodes in the graph based on the

shortest path between two nodes. The shortest path edge algebra assigns the minimal

distance of all paths between two nodes if several parallel paths exist. The distance

values are summed up for a sequence of edges.

The shortest path distance has the advantage that it can be efficiently implemented

(e.g., based on depth-bounded search). Additionally, the shortest path principle is

well-understood by many users and widely accepted. Unfortunately, the shortest

path approach does not take into account the number of parallel paths between two

nodes. Thus, this algebra should not be used if the number of parallel paths is the

dominant criteria for computing the relatedness of two nodes.

The Resistance Distance : In contrast to the shortest path algebra, the resis-

tance distance [ 29 ] considers all parallel paths between two nodes. The resistance

distance can be computed based on the Moore-Penrose pseudo-inverse of the

Laplacian matrix of the graph. The Resistance Distance is more complex than the

shortest path algebra, but it fits well with the proposed criteria. In many application

scenarios the resistance distance is computable with an acceptable effort. Unfortu-

nately, the resistance distance is difficult to understand for most users, making it hard

to generate good explanations.

The Weighted Path Algebra : The weighted path algebra defines an very efficient

approach for computing the similarity of two nodes. The algebra is induced by a

standard dot product of the adjacency matrix of a graph. It assumes that the edge

scores are between 0 and 1 ensuring that the score for a long path is lower than the

score of each edge in the path. The advantage of this algebra is, that the underlying

assumptions are well-understood by most users. Additionally, it can be efficiently

computed based on matrixes. A disadvantage is that the weight of a complex path

can be above the upper bound of 1.

5.3.4.5 Complexity and Noise

Real-world datasets are often huge, sparse and noisy. Since Linked-Open-Data is

often generated by volunteers in their spare time and not by professional experts,

Linked-Open datasets might contain inconsistencies and errors [ 7 ]. In user-generated

entertainment datasets, the amount of information might differ from domain to

domain. User-generated descriptions might contain spelling mistakes or invalid char-

acters. Thus a recommender component should be aware of these challenges and

provide robust algorithms able to cope with noisy, heterogeneous data.

The dataset complexity as well as the differences in sparsity and in the noise

between the aggregated datasets must be taken into account by the recommender

framework. The recommender system should provide strategies for reducing the

Search WWH ::

Custom Search

Home