Semantic Movie Recommendations - Smart Information Systems: Computational Intelligence for Real-Life Applications


\[ A = U S V^{T} \approx U_k S_k V_k^{T} \]

The adjacency matrix A is decomposed into the matrices U, S, and V. S is a diagonal matrix containing the singular values of A in descending order; the columns of U and V are the left-singular and right-singular vectors of A. The low-rank approximation of A keeps only the k largest singular values of A and the corresponding singular vectors (U_k, V_k).
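The truncation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the toy matrix, the choice k = 2, and the helper name `low_rank_approximation` are assumptions made for the example.

```python
import numpy as np

def low_rank_approximation(A, k):
    """Return the factors U_k, s_k, Vt_k of the rank-k approximation of A."""
    # full_matrices=False yields the compact SVD; NumPy returns the
    # singular values already sorted in descending order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

# Hypothetical toy adjacency matrix (4 users x 5 movies), 1 = edge present.
A = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

U_k, s_k, Vt_k = low_rank_approximation(A, k=2)

# Dense reconstruction U_k S_k V_k^T, shown only for inspection; in a
# large sparse setting one would normally keep the three factors instead.
A_k = U_k @ np.diag(s_k) @ Vt_k

# Frobenius-norm error of the approximation.
error = np.linalg.norm(A - A_k)
```

For a large sparse matrix, computing the full decomposition first and truncating afterwards would be wasteful; an iterative solver that computes only the top k singular triplets would be the more likely choice in practice.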

The SVD-based approach allows an efficient reduction of the adjacency matrix A. It has been shown that the low-rank approximation is a good model for large sparse matrices [18]. The low-rank approximation of the adjacency matrix also allows us to consider long paths in the graph (by computing the powers of the matrix A).
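One way to see why the low-rank form helps with matrix powers: for a square A, (U_k S_k V_k^T)^n = U_k (S_k V_k^T U_k)^(n-1) S_k V_k^T, so the repeated multiplication only touches a small k x k matrix. The sketch below illustrates this identity under stated assumptions; the 5-node graph, k = 3, and the function name `low_rank_power` are hypothetical and not taken from the text.

```python
import numpy as np

def low_rank_power(U_k, s_k, Vt_k, n):
    """Compute (U_k S_k V_k^T)^n using a single k x k matrix power."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # The k x k core S_k V_k^T U_k carries all of the repeated multiplication.
    core = np.diag(s_k) @ (Vt_k @ U_k)
    core_power = np.linalg.matrix_power(core, n - 1)
    return U_k @ core_power @ np.diag(s_k) @ Vt_k

# Hypothetical undirected graph: path 0-1-2-3-4 plus the edge 0-2.
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

U, s, Vt = np.linalg.svd(A)
k = 3
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
A_k = U_k @ np.diag(s_k) @ Vt_k

# Entry (i, j) of A^3 counts the walks of length 3 from node i to node j;
# the low-rank version approximates these counts.
exact_walks = np.linalg.matrix_power(A, 3)
approx_walks = low_rank_power(U_k, s_k, Vt_k, 3)
```

With k equal to the full rank the result matches A^n up to rounding; with smaller k it equals the n-th power of the approximation A_k, which is the trade-off between cost and accuracy.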

The disadvantages of the SVD-based low-rank approximation are that the approach is resource-demanding and depends strongly on the scaling applied to the matrix A. In general, dataset updates require a recalculation of the model. Moreover, SVD-based approaches use non-reversible projections, which makes it difficult to provide human-understandable explanations.

Cluster-based Models: Clustering is an alternative approach for reducing the dataset complexity. It is based on the assumption that similar entities should be aggregated in order to reduce the number of distinct entities. The clusters focus on the characteristic properties that the objects aggregated in a cluster have in common, and abstract from noise. Clustering is a very flexible approach, since the similarity measure can be chosen from a wide variety of distance functions. Depending on the dataset, different clustering algorithms (e.g., K-Means clustering, hierarchical clustering [30]) can be applied. The concept of clustering is well understood by many users. This enables the generation of human-readable explanations based on clusters.
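As an illustration of how clustering aggregates similar entities, the following is a minimal K-Means sketch in plain Python. The two-dimensional movie feature vectors, the Euclidean distance, and the deterministic initialization from the first k points are assumptions chosen to keep the example small and reproducible; they are not prescribed by the text.

```python
import math

def kmeans(points, k, iterations=20):
    """Basic K-Means: alternate assignment and centroid update steps."""
    # Simplifying assumption: initialize the centroids with the first k points.
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical movie features: (action score, romance score).
movies = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.2), (0.2, 0.8), (0.85, 0.15)]
assignment, centroids = kmeans(movies, k=2)
```

Swapping `math.dist` for another distance function changes which entities are treated as similar, which is the flexibility the paragraph refers to. The resulting centroids can also serve as the basis for an explanation of the form "recommended because it belongs to the same group as ...".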

In summary, clustering is a flexible, well-accepted approach for reducing the complexity of a dataset. The degree to which entities are aggregated can be controlled through the choice of clustering algorithm and similarity measure. In general, the definition of adequate clustering strategies and similarity measures requires expert knowledge in order to match the specific characteristics of the recommendation scenario.

Models for Text-based Recommenders: Semantic recommender approaches focus on entities explicitly connected by labeled edges. In many real-world scenarios, comprehensive textual metadata for entities exists. For example, in the movie domain, plot descriptions and reviews are available. These textual descriptions can be used as an additional knowledge source when analyzing the semantic relation between entities. The similarity between two texts can be computed based on the number of common words or by counting the number of common entities (using Named Entity Recognition and Named Entity Disambiguation algorithms [20]).
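A word-overlap similarity of this kind might look as follows. The text only mentions counting common words; normalizing the count by the size of the combined vocabulary (the Jaccard coefficient) is an assumption added here so that long and short texts are comparable. The function names and the two plot snippets are hypothetical.

```python
def common_words(text_a, text_b):
    """Return the set of distinct words that occur in both texts."""
    return set(text_a.lower().split()) & set(text_b.lower().split())

def jaccard_similarity(text_a, text_b):
    """Number of common words divided by the number of distinct words overall."""
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

# Hypothetical plot snippets.
plot_a = "a space crew fights an alien"
plot_b = "an alien hunts a space crew"

shared = common_words(plot_a, plot_b)
similarity = jaccard_similarity(plot_a, plot_b)
```

Note that "a" and "an" count as shared words in this sketch even though they say little about the plots. The same scheme applies to entity-based similarity if the word sets are replaced by sets of recognized entities.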

Textual descriptions contain not only keywords but also grammatical function words (such as articles and conjunctions) that have very little impact on the content, so texts should be preprocessed before computing the relatedness between two texts. Common preprocessing techniques for natural language texts are stopword removal and stemming, which efficiently reduce the vector space spanned by the words of a set of given texts. In addition, these techniques improve the quality of the similarity computation because words that carry no semantic meaning are ignored.
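The two preprocessing steps can be sketched as below. Both the tiny stopword list and the suffix-stripping rule are deliberately naive stand-ins assumed for illustration; a real system would more likely use an established stopword list and a proper stemming algorithm such as Porter's.

```python
import re

# Hypothetical, deliberately small stopword list.
STOPWORDS = {"a", "an", "the", "and", "or", "but", "of", "in", "on", "to", "is", "are"}

# Naive suffix list, checked in order; this is NOT the Porter stemmer.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    """Strip the first matching suffix if at least 3 characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, tokenize on letters, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]

terms = preprocess("The aliens are hunting the crew and the pilots")
```

Here nine tokens shrink to four terms, and "aliens" and "hunting" are mapped to forms that would also match "alien" and "hunt" in another text, which is how the preprocessing both reduces the vocabulary and improves the overlap-based similarity.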
