$$A = U S V^{T} \approx U_k S_k V_k^{T}$$
The adjacency matrix A is decomposed into the matrices U, S, and V. S is a
diagonal matrix containing the singular values of A in descending order; the
columns of U and V are the corresponding left-singular and right-singular vectors
of A. The low-rank approximation of A keeps only the k largest singular values of
A and the respective singular vectors (U_k, V_k).
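The truncation described above can be sketched in a few lines. This is a minimal illustration, assuming NumPy is available; the function name low_rank_approximation and the 4-node example graph are hypothetical and not taken from the text.

```python
import numpy as np


def low_rank_approximation(A, k):
    """Return the rank-k approximation U_k S_k V_k^T of the matrix A."""
    # np.linalg.svd returns the singular values already sorted in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k largest singular values and their singular vectors.
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


# Hypothetical adjacency matrix of a small undirected graph (illustration only):
# nodes 0, 1, 2 form a triangle and node 3 is attached to node 2.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

A_2 = low_rank_approximation(A, 2)
# Frobenius norm of the part of A that the rank-2 model does not capture.
error = np.linalg.norm(A - A_2)
```

The residual `error` depends only on the discarded singular values, so k trades model size against approximation quality.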
The SVD-based approach allows an efficient reduction of the adjacency matrix A.
It has been shown that the low-rank approximation is a good model for large
sparse matrices [18]. The low-rank approximation of the adjacency matrix also
makes it possible to consider long paths in the graph (by computing powers of the
matrix A).
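The link between matrix powers and paths can be made concrete: entry (i, j) of the n-th power of an adjacency matrix counts the walks of length n from node i to node j. The following sketch applies this to the unreduced matrix of a hypothetical 4-node graph; the helper names mat_mul and mat_pow are illustrative assumptions, not part of the described approach.

```python
def mat_mul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def mat_pow(A, n):
    """Return the n-th power of A by repeated multiplication (n >= 0)."""
    size = len(A)
    # Start from the identity matrix, which is A to the power 0.
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, A)
    return result


# Hypothetical undirected graph: nodes 0, 1, 2 form a triangle, node 3 hangs off node 2.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]

walks_2 = mat_pow(A, 2)  # walks_2[i][j]: number of walks of length 2 from i to j
walks_3 = mat_pow(A, 3)  # walks_3[i][j]: number of walks of length 3 from i to j
```

Nodes 0 and 3 are not adjacent, yet `walks_2[0][3]` is non-zero because the walk 0-2-3 connects them. This is the kind of indirect relation that longer paths expose.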
The disadvantages of the SVD-based low-rank approximation are that the approach
is resource-demanding and depends strongly on the scaling applied to the matrix
A. In general, dataset updates require a re-calculation of the model. Moreover,
SVD-based approaches use non-reversible projections, which makes it difficult to
provide human-understandable explanations.
Cluster-based Models: Clustering is an alternative approach for reducing the
dataset complexity. It is based on the assumption that similar entities should be
aggregated in order to reduce the number of distinct entities. A cluster captures
the characteristic properties that its member objects have in common and
abstracts from noise. Clustering is a very flexible approach, since the
similarity measure can be chosen from a wide variety of distance functions.
Depending on the dataset, different clustering algorithms (e.g., K-Means
clustering, hierarchical clustering [30]) can be applied. The concept of
clustering is well understood by many users. This enables the generation of
human-readable explanations based on clusters.
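As an illustration of how clustering aggregates similar entities, the following is a minimal K-Means sketch using Euclidean distance. The two-dimensional feature vectors and the choice of initial centroids are hypothetical, and a real recommender would substitute the distance function that fits its data.

```python
import math


def k_means(points, initial_centroids, iterations=10):
    """Plain K-Means with Euclidean distance; returns (labels, centroids)."""
    centroids = list(initial_centroids)  # copy so the caller's list is not modified
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its cluster members.
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


# Hypothetical entity feature vectors forming two visibly separate groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels, centroids = k_means(points, initial_centroids=[points[0], points[3]])
```

After clustering, the six entities are represented by two centroids, and the statement "these items belong to the same cluster" can serve as the basis for an explanation.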
In summary, clustering is a flexible, well-accepted approach for reducing the
complexity of a dataset. Depending on the clustering algorithm and the similarity
measure, the degree to which entities are aggregated can be controlled. In
general, the definition of adequate clustering strategies and similarity measures
requires expert knowledge in order to match the specific characteristics of the
recommendation scenario.
Models for Text-based Recommenders: Semantic recommender approaches focus on
entities explicitly connected by labeled edges. In many real-world scenarios,
comprehensive textual meta-data for entities exists. For example, in the movie
domain, plot descriptions and reviews are available. The textual descriptions can
be used as an additional knowledge source when analyzing the semantic relation
between entities. The similarity between two texts can be computed based on the
number of common words or by counting the number of common entities (using Named
Entity Recognition and Named Entity Disambiguation algorithms [20]).
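A word-overlap similarity of this kind might look as follows. The sketch normalizes the number of common words by the size of the combined vocabulary (the Jaccard coefficient). That normalization is an assumption made here so that long and short texts are comparable; the text itself only speaks of counting common words. The function names and the two plot snippets are hypothetical.

```python
import re


def tokenize(text):
    """Lower-case the text and return its set of distinct words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def common_word_similarity(text_a, text_b):
    """Share of distinct words that both texts have in common (Jaccard coefficient)."""
    words_a, words_b = tokenize(text_a), tokenize(text_b)
    if not words_a or not words_b:
        return 0.0  # an empty text shares nothing with any other text
    return len(words_a & words_b) / len(words_a | words_b)


# Hypothetical plot descriptions of two movies.
plot_a = "A detective hunts a killer in the city"
plot_b = "A killer hides in the city"

similarity = common_word_similarity(plot_a, plot_b)
```

Most of the overlap between the two snippets comes from words such as "a", "in", and "the", which say little about the content.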
Since textual descriptions contain not only keywords but also function words
(such as articles and conjunctions) that have only a very small impact on the
content, texts should be preprocessed before computing the relatedness between
two texts. Common preprocessing techniques for natural language texts are
stopword removal and stemming, which efficiently reduce the vector space spanned
by the words of a set of given texts. In addition, these techniques improve the
quality of the similarity computation, because words that carry no semantic
meaning are ignored.
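The two preprocessing steps can be sketched as follows. Both the stopword list and the suffix-stripping stemmer are deliberately tiny, hypothetical stand-ins; a production system would more likely use an established stopword list and a standard stemmer such as the Porter algorithm.

```python
import re

# Tiny illustrative stopword list; real lists contain a few hundred entries.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "or", "the", "to"}

# Suffixes removed by the toy stemmer, tried in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word):
    """Strip one common English suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Tokenize, drop stopwords, and stem the remaining words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(token) for token in tokens if token not in STOPWORDS]


terms = preprocess("The detective hunted the killers and is hunting in the city")
```

In this example, "hunted" and "hunting" collapse to the same stem, so they occupy a single dimension of the vector space, and the stopwords no longer contribute to the similarity of two texts.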