val apk10 = avgPrecisionK(actualMovies, predictedMovies, 10)
The preceding code will print:
apk10: Double = 0.0
In this case, we can see that our model is not doing a very good job of predicting relevant
movies for this user as the APK score is 0.
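For reference, the `avgPrecisionK` function called above follows the standard average-precision-at-K formula. A typical implementation with the same signature can be sketched as follows (this is a sketch of the usual definition, not necessarily identical to the version defined earlier):

```scala
// Average precision at K: for each relevant item found in the top-K
// predictions, accumulate precision at that rank, then normalize by
// the smaller of K and the number of actual relevant items.
def avgPrecisionK(actual: Seq[Int], predicted: Seq[Int], k: Int): Double = {
  val predK = predicted.take(k)
  var score = 0.0
  var numHits = 0.0
  for ((p, i) <- predK.zipWithIndex) {
    if (actual.contains(p)) {
      numHits += 1.0
      score += numHits / (i.toDouble + 1.0)
    }
  }
  if (actual.isEmpty) 1.0
  else score / scala.math.min(actual.size, k).toDouble
}
```

If none of the top-K predictions appear in the actual set, the score is 0, which is exactly what we observed above.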
In order to compute the APK for each user and average them to compute the overall
MAPK, we will need to generate the list of recommendations for each user in our dataset.
While this can be fairly intensive on a large scale, we can distribute the computation using
our Spark functionality. However, one limitation is that each worker must have the full
item-factor matrix available so that it can compute the dot product between the relevant
user vector and all item vectors. This can be a problem when the number of items is extremely high, as the item matrix must fit in the memory of one machine.
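A rough back-of-the-envelope calculation shows why this is not a concern for the dataset used here, but becomes one at larger scale (the dimensions below match this dataset; the extrapolation to millions of items is illustrative):

```scala
// Approximate memory for a dense item-factor matrix:
// numItems * numFactors * 8 bytes per Double.
val numItems = 1682
val numFactors = 50
val bytes = numItems.toLong * numFactors * 8
println(f"${bytes / 1024.0}%.1f KB") // well under 1 MB here
// With, say, 10 million items at the same factor dimension,
// the matrix would grow to roughly 4 GB.
```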
Tip
There is no easy way around this limitation. One possible approach is to compute recommendations for only a subset of items from the total item set, using approximate techniques such as Locality Sensitive Hashing (http://en.wikipedia.org/wiki/Locality-sensitive_hashing).
We will now see how to go about this. First, we will collect the item factors and form a DoubleMatrix object from them:
val itemFactors = model.productFeatures.map { case (id, factor) => factor }.collect()
val itemMatrix = new DoubleMatrix(itemFactors)
println(itemMatrix.rows, itemMatrix.columns)
The output of the preceding code is as follows:
(1682,50)
This gives us a matrix with 1682 rows and 50 columns, as we would expect from 1682 movies with a factor dimension of 50. Next, we will distribute the item matrix as a broadcast variable so that it is available on each worker node:
val imBroadcast = sc.broadcast(itemMatrix)
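With the item matrix broadcast, each worker can score all items for a given user by computing the dot product of the user's factor vector with every item's factor vector, then ranking items by score. The ranking step can be sketched in plain Scala (using arrays rather than jblas for self-containment; the assumption that item IDs are 1-based row indices is hypothetical and depends on how the matrix was built):

```scala
// Score every item for one user and return the top-k item IDs,
// assuming row i of itemMatrix holds the factors for item ID i + 1.
def recommendForUser(userFactor: Array[Double],
                     itemMatrix: Array[Array[Double]],
                     k: Int): Seq[Int] = {
  // Dot product of the user vector with each item's factor vector.
  val scores = itemMatrix.map { itemFactor =>
    itemFactor.zip(userFactor).map { case (a, b) => a * b }.sum
  }
  // Rank items by descending score and keep the top k IDs.
  scores.zipWithIndex
    .sortBy { case (score, _) => -score }
    .take(k)
    .map { case (_, idx) => idx + 1 }
    .toSeq
}
```

In the distributed version, this logic runs inside a map over the user factors, with `imBroadcast.value` supplying the item matrix on each worker.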