Information Technology Reference
In-Depth Information
in small relatedness scores. If two nodes are not directly connected by an edge, we
assign a relatedness score of 0, resulting in a distance of infinity. The basic rules
for calculating the distance between two nodes in a semantic graph are visualized in
Table 5.2 . The characteristics of the different edge algebras are discussed in the next
paragraphs.
Shortest Path : We define the distance of two nodes in the graph based on the
shortest path between two nodes. The shortest path edge algebra assigns the minimal
distance of all paths between two nodes if several parallel paths exist. The distance
values are summed up for a sequence of edges.
The shortest path distance has the advantage that it can be efficiently implemented
(e.g., based on depth-bounded search). Additionally, the shortest path principle is
well-understood by many users and widely accepted. Unfortunately, the shortest
path approach does not take into account the number of parallel paths between two
nodes. Thus, this algebra should not be used if the number of parallel paths is the
dominant criteria for computing the relatedness of two nodes.
The Resistance Distance : In contrast to the shortest path algebra, the resis-
tance distance [ 29 ] considers all parallel paths between two nodes. The resistance
distance can be computed based on the Moore-Penrose pseudo-inverse of the
Laplacian matrix of the graph. The Resistance Distance is more complex than the
shortest path algebra, but it fits well with the proposed criteria. In many application
scenarios the resistance distance is computable with an acceptable effort. Unfortu-
nately, the resistance distance is difficult to understand for most users, making it hard
to generate good explanations.
The Weighted Path Algebra : The weighted path algebra defines an very efficient
approach for computing the similarity of two nodes. The algebra is induced by a
standard dot product of the adjacency matrix of a graph. It assumes that the edge
scores are between 0 and 1 ensuring that the score for a long path is lower than the
score of each edge in the path. The advantage of this algebra is, that the underlying
assumptions are well-understood by most users. Additionally, it can be efficiently
computed based on matrixes. A disadvantage is that the weight of a complex path
can be above the upper bound of 1.
5.3.4.5 Complexity and Noise
Real-world datasets are often huge, sparse and noisy. Since Linked-Open-Data is
often generated by volunteers in their spare time and not by professional experts,
Linked-Open datasets might contain inconsistencies and errors [ 7 ]. In user-generated
entertainment datasets, the amount of information might differ from domain to
domain. User-generated descriptions might contain spelling mistakes or invalid char-
acters. Thus a recommender component should be aware of these challenges and
provide robust algorithms able to cope with noisy, heterogeneous data.
The dataset complexity as well as the differences in sparsity and in the noise
between the aggregated datasets must be taken into account by the recommender
framework. The recommender system should provide strategies for reducing the
Search WWH ::




Custom Search