val apk10 = avgPrecisionK(actualMovies, predictedMovies, 10)
The preceding code will print:
apk10: Double = 0.0
In this case, we can see that our model is not doing a very good job of predicting relevant
movies for this user as the APK score is 0.
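For reference, the `avgPrecisionK` function called above follows the standard average-precision-at-K formula. A typical implementation with the same signature can be sketched as follows (this is a sketch of the usual definition, not necessarily identical to the version defined earlier):

```scala
// Average precision at K: for each relevant item found in the top-K
// predictions, accumulate precision at that rank, then normalize by
// the smaller of K and the number of actual relevant items.
def avgPrecisionK(actual: Seq[Int], predicted: Seq[Int], k: Int): Double = {
  val predK = predicted.take(k)
  var score = 0.0
  var numHits = 0.0
  for ((p, i) <- predK.zipWithIndex) {
    if (actual.contains(p)) {
      numHits += 1.0
      score += numHits / (i.toDouble + 1.0)
    }
  }
  if (actual.isEmpty) 1.0
  else score / scala.math.min(actual.size, k).toDouble
}
```

If none of the top-K predictions appear in the actual set, the score is 0, which is exactly what we observed above.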
In order to compute the APK for each user and average them to compute the overall
MAPK, we will need to generate the list of recommendations for each user in our dataset.
While this can be fairly intensive on a large scale, we can distribute the computation using
our Spark functionality. However, one limitation is that each worker must have the full
item-factor matrix available so that it can compute the dot product between the relevant
user vector and all item vectors. This can be a problem when the number of items is extremely high, as the item matrix must fit in the memory of one machine.
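A rough back-of-the-envelope calculation shows why this is not a concern for the dataset used here, but becomes one at larger scale (the dimensions below match this dataset; the extrapolation to millions of items is illustrative):

```scala
// Approximate memory for a dense item-factor matrix:
// numItems * numFactors * 8 bytes per Double.
val numItems = 1682
val numFactors = 50
val bytes = numItems.toLong * numFactors * 8
println(f"${bytes / 1024.0}%.1f KB") // well under 1 MB here
// With, say, 10 million items at the same factor dimension,
// the matrix would grow to roughly 4 GB.
```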
Tip
There is no easy way around this limitation. One possible approach is to compute recommendations for only a subset of items from the total item set, using approximate techniques such as Locality Sensitive Hashing (http://en.wikipedia.org/wiki/Locality-sensitive_hashing).
We will now see how to go about this. First, we will collect the item factors and form a DoubleMatrix object from them:
val itemFactors = model.productFeatures.map { case (id, factor) => factor }.collect()
val itemMatrix = new DoubleMatrix(itemFactors)
println(itemMatrix.rows, itemMatrix.columns)
The output of the preceding code is as follows:
(1682,50)
This gives us a matrix with 1682 rows and 50 columns, as we would expect from 1682 movies with a factor dimension of 50. Next, we will distribute the item matrix as a broadcast variable so that it is available on each worker node:
val imBroadcast = sc.broadcast(itemMatrix)
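With the item matrix broadcast, each worker can score all items for a given user by computing the dot product of the user's factor vector with every item's factor vector, then ranking items by score. The ranking step can be sketched in plain Scala (using arrays rather than jblas for self-containment; the assumption that item IDs are 1-based row indices is hypothetical and depends on how the matrix was built):

```scala
// Score every item for one user and return the top-k item IDs,
// assuming row i of itemMatrix holds the factors for item ID i + 1.
def recommendForUser(userFactor: Array[Double],
                     itemMatrix: Array[Array[Double]],
                     k: Int): Seq[Int] = {
  // Dot product of the user vector with each item's factor vector.
  val scores = itemMatrix.map { itemFactor =>
    itemFactor.zip(userFactor).map { case (a, b) => a * b }.sum
  }
  // Rank items by descending score and keep the top k IDs.
  scores.zipWithIndex
    .sortBy { case (score, _) => -score }
    .take(k)
    .map { case (_, idx) => idx + 1 }
    .toSeq
}
```

In the distributed version, this logic runs inside a map over the user factors, with `imBroadcast.value` supplying the item matrix on each worker.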