Building a Recommendation Engine with Spark - Machine Learning with Spark


User recommendations

In this case, we would like to generate recommended items for a given user. This usually takes the form of a top-K list, that is, the K items that our model predicts the user is most likely to like. We obtain this list by computing the predicted score for each item and ranking the items by this score.
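To make the ranking step concrete, the following is a minimal sketch in plain Scala (no Spark required). The names TopKSketch, topK, candidates, and score are hypothetical and chosen for illustration only, and the scores in main are made-up values rather than model output:

```scala
object TopKSketch {
  /** Scores every candidate item with `score` and keeps the k highest-scoring ones. */
  def topK(candidates: Seq[Int], k: Int)(score: Int => Double): Seq[(Int, Double)] =
    candidates
      .map(item => (item, score(item))) // predicted score for each item
      .sortBy { case (_, s) => -s }     // rank by descending score
      .take(k)                          // keep only the top K

  def main(args: Array[String]): Unit = {
    // Hypothetical predicted scores keyed by item ID (illustrative values only).
    val scores = Map(1 -> 3.5, 2 -> 4.8, 3 -> 1.2, 4 -> 4.1)
    // A Scala Map can be passed wherever a function Int => Double is expected.
    println(topK(Seq(1, 2, 3, 4), 2)(scores)) // List((2,4.8), (4,4.1))
  }
}
```

Sorting all candidates is the simplest option for a small catalog; with many items, a bounded priority queue would avoid the full sort.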

The exact method to perform this computation depends on the model involved. For example, in user-based approaches, the ratings of similar users on items are used to compute the recommendations for a user, while in an item-based approach, the computation is based on the similarity of items the user has rated to the candidate items.
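As a rough illustration of the item-based case, the sketch below scores a candidate item as a similarity-weighted average of the ratings the user has already given. It makes several assumptions that are not part of the text above: cosine similarity is used as the similarity measure, each item is described by a small feature vector, and the names ItemBasedSketch, cosine, predict, userRatings, and itemVectors are hypothetical:

```scala
object ItemBasedSketch {
  /** Cosine similarity of two equal-length vectors; 0.0 if either has zero norm. */
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val norms = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
    if (norms == 0.0) 0.0 else dot / norms
  }

  /** Similarity-weighted average of the user's existing ratings (assumed scheme). */
  def predict(
      candidate: Int,
      userRatings: Map[Int, Double],
      itemVectors: Map[Int, Array[Double]]): Double = {
    // For each item the user has rated, weight its rating by its similarity
    // to the candidate item.
    val weighted = userRatings.toSeq.map { case (item, rating) =>
      val sim = cosine(itemVectors(candidate), itemVectors(item))
      (sim * rating, math.abs(sim))
    }
    val totalSim = weighted.map(_._2).sum
    // Fall back to 0.0 when the candidate is not similar to any rated item.
    if (totalSim == 0.0) 0.0 else weighted.map(_._1).sum / totalSim
  }
}
```

A user-based variant would follow the same pattern, but weight other users' ratings of the candidate item by their similarity to the target user.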

In matrix factorization, because we are modeling the ratings matrix directly, the predicted score can be computed as the vector dot product between a user-factor vector and an item-factor vector.
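The dot product itself is straightforward to write out. The following sketch uses plain Scala arrays and hypothetical factor values with three latent factors; DotProductSketch and dot are illustrative names, not part of any library:

```scala
object DotProductSketch {
  /** Predicted score as the dot product of a user-factor and an item-factor vector. */
  def dot(userFactors: Array[Double], itemFactors: Array[Double]): Double = {
    require(userFactors.length == itemFactors.length,
      "factor vectors must have the same number of latent factors")
    userFactors.zip(itemFactors).map { case (u, i) => u * i }.sum
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical factor vectors (illustrative values, not learned by a model).
    val userFactors = Array(1.0, 2.0, 0.5)
    val itemFactors = Array(2.0, 0.5, 4.0)
    println(dot(userFactors, itemFactors)) // 1.0*2.0 + 2.0*0.5 + 0.5*4.0 = 5.0
  }
}
```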

Generating movie recommendations from the MovieLens 100k dataset

As MLlib's recommendation model is based on matrix factorization, we can use the factor matrices computed by our model to compute predicted scores (or ratings) for a user. We will focus on the explicit rating case using MovieLens data; however, the approach is the same when using the implicit model.

The MatrixFactorizationModel class has a convenient predict method that will compute a predicted score for a given user and item combination:

val predictedRating = model.predict(789, 123)

The output is as follows:

14/03/30 16:10:10 INFO SparkContext: Starting job: lookup at MatrixFactorizationModel.scala:45
14/03/30 16:10:10 INFO DAGScheduler: Got job 30 (lookup at MatrixFactorizationModel.scala:45) with 1 output partitions (allowLocal=false)
...
14/03/30 16:10:10 INFO SparkContext: Job finished: lookup at MatrixFactorizationModel.scala:46, took 0.023077 s
predictedRating: Double = 3.128545693368485
