val apk10 = avgPrecisionK(actualMovies, predictedMovies, 10)
The preceding code will print:
apk10: Double = 0.0
In this case, we can see that our model is not doing a very good job of predicting relevant
movies for this user as the APK score is 0.
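The avgPrecisionK function used above was defined earlier in the chapter. As a reminder of what it computes, here is a minimal sketch of an average-precision-at-K implementation; the exact definition used earlier may differ slightly, so treat this as illustrative:
// Illustrative sketch of average precision at K for a single user.
def avgPrecisionK(actual: Seq[Int], predicted: Seq[Int], k: Int): Double = {
  val predK = predicted.take(k)
  var score = 0.0
  var numHits = 0.0
  for ((p, i) <- predK.zipWithIndex) {
    if (actual.contains(p)) {
      numHits += 1.0
      // precision at this cut-off, accumulated only when the item is a hit
      score += numHits / (i.toDouble + 1.0)
    }
  }
  if (actual.isEmpty) 1.0
  else score / scala.math.min(actual.size, k).toDouble
}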
In order to compute the APK for each user and average them to compute the overall
MAPK, we will need to generate the list of recommendations for each user in our dataset.
While this can be fairly intensive on a large scale, we can distribute the computation using
our Spark functionality. However, one limitation is that each worker must have the full
item-factor matrix available so that it can compute the dot product between the relevant
user vector and all item vectors. This can be a problem when the number of items is extremely high, as the item matrix must fit in the memory of one machine.
Tip
There is actually no easy way around this limitation. One possible approach is to only
compute recommendations for a subset of items from the total item set, using approximate
techniques such as Locality Sensitive Hashing (http://en.wikipedia.org/wiki/Locality-sensitive_hashing).
We will now see how to go about this. First, we will collect the item factors and form a
DoubleMatrix object from them:
// DoubleMatrix comes from the jblas linear algebra library
import org.jblas.DoubleMatrix
val itemFactors = model.productFeatures.map { case (id, factor) => factor }.collect()
val itemMatrix = new DoubleMatrix(itemFactors)
println(itemMatrix.rows, itemMatrix.columns)
The output of the preceding code is as follows:
(1682,50)
This gives us a matrix with 1682 rows and 50 columns, as we would expect from 1682
movies with a factor dimension of 50. Next, we will distribute the item matrix as a
broadcast variable so that it is available on each worker node:
val imBroadcast = sc.broadcast(itemMatrix)
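To give a sense of how the broadcast matrix is then used, the following is a sketch of the per-user scoring step; the variable names (allRecs, userVector) and the assumption that item IDs are 1-based and aligned with the row indices of itemMatrix are ours. For each user, we multiply the broadcast item matrix by the user's factor vector to score every item, then sort by score to produce a ranked list of item IDs:
// Sketch: score all items for each user using the broadcast item matrix.
// Assumes item IDs are 1-based and correspond to the row indices of itemMatrix.
val allRecs = model.userFeatures.map { case (userId, array) =>
  val userVector = new DoubleMatrix(array)
  // (numItems x k) * (k x 1) = a numItems x 1 vector of predicted scores
  val scores = imBroadcast.value.mmul(userVector)
  // sort scores in descending order, keeping the original row index
  val sortedWithId = scores.data.zipWithIndex.sortBy(-_._1)
  val recommendedIds = sortedWithId.map(_._2 + 1).toSeq
  (userId, recommendedIds)
}
The resulting RDD of (userId, ranked item IDs) pairs can then be joined against each user's actual rated items to compute APK per user and averaged to give MAPK.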