Item recommendations
Item recommendations answer the following question: for a given item, which are the items most similar to it? Here, the precise definition of similarity depends on the model involved. In most cases, similarity is computed by comparing the vector representations of two items using some similarity measure. Common similarity measures include the Pearson correlation and cosine similarity for real-valued vectors, and the Jaccard similarity for binary vectors.
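To make these measures concrete, here is a minimal, self-contained sketch (not part of the chapter's own code) that computes cosine similarity for two small real-valued vectors and Jaccard similarity for two small binary vectors in plain Scala; the toy values are purely illustrative:
// toy real-valued vectors for cosine similarity
val x = Array(1.0, 2.0, 3.0)
val y = Array(2.0, 3.0, 4.0)
// cosine similarity: dot product divided by the product of the vector lengths
val dot = x.zip(y).map { case (a, b) => a * b }.sum
val normX = math.sqrt(x.map(a => a * a).sum)
val normY = math.sqrt(y.map(a => a * a).sum)
val cosine = dot / (normX * normY)
// toy binary vectors, represented as the sets of indices that are set to 1
val u = Set(1, 3, 5)
val v = Set(1, 2, 3)
// Jaccard similarity: size of the intersection over size of the union
val jaccard = u.intersect(v).size.toDouble / u.union(v).size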
Generating similar movies for the MovieLens 100k dataset
The current MatrixFactorizationModel API does not directly support item-to-item
similarity computations. Therefore, we will need to create our own code to do this.
We will use the cosine similarity metric, together with the jblas linear algebra library (a dependency of MLlib), to compute the required vector dot products. This is similar to how the existing predict and recommendProducts methods work, except that we will use cosine similarity as opposed to just the dot product.
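As a rough sketch of the starting point, assuming a trained MatrixFactorizationModel called model is available from earlier in the chapter, the factor vector for a single item can be retrieved from the model's productFeatures RDD (the item ID below is purely illustrative):
// assumes a trained MatrixFactorizationModel named model from earlier in the chapter
val itemId = 567  // hypothetical item ID, used only for illustration
// productFeatures is an RDD of (item ID, factor vector) pairs
val itemFactor = model.productFeatures.lookup(itemId).head  // Array[Double]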
We would like to compare the factor vector of our chosen item with the factor vector of each of the other items, using our similarity metric. In order to perform linear algebra computations, we will first need to create a vector object out of the factor vectors, which are in the form of an Array[Double]. The jblas class DoubleMatrix takes an Array[Double] as its constructor argument, as follows:
import org.jblas.DoubleMatrix
// create a jblas vector (a one-dimensional DoubleMatrix) from an Array[Double]
val aMatrix = new DoubleMatrix(Array(1.0, 2.0, 3.0))
Here is the output of the preceding code:
aMatrix: org.jblas.DoubleMatrix = [1.000000; 2.000000; 3.000000]
Tip
Note that in jblas, vectors are represented as a one-dimensional DoubleMatrix class, while matrices are represented as a two-dimensional DoubleMatrix class.
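For instance, a two-dimensional matrix can be built by passing in an Array of row arrays; this short example is only meant to illustrate the distinction and is not part of the chapter's code:
// a 2 x 2 matrix built from an Array of row arrays
val bMatrix = new DoubleMatrix(Array(Array(1.0, 2.0), Array(3.0, 4.0)))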
We will need a method to compute the cosine similarity between two vectors. Cosine similarity is a measure of the angle between two vectors in an n-dimensional space. It is computed by taking the dot product of the two vectors and dividing the result by the product of their norms (that is, their lengths).
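One possible implementation, sketched here using the dot and norm2 operations provided by jblas (the function name is our own choice):
def cosineSimilarity(vec1: DoubleMatrix, vec2: DoubleMatrix): Double = {
  // dot product of the two vectors divided by the product of their L2 norms
  vec1.dot(vec2) / (vec1.norm2() * vec2.norm2())
}
A cosine similarity of 1.0 means the two vectors point in the same direction, 0.0 means they are orthogonal, and -1.0 means they point in opposite directions.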