We also need the list of movie IDs for each user to pass into our APK function as the actual argument. We already have the ratings RDD ready, so we can extract just the user and movie IDs from it.
If we use Spark's groupBy operator, we will get an RDD that contains a list of
(userid, movieid) pairs for each user ID (as the user ID is the key on which we
perform the groupBy operation):
val userMovies = ratings.map { case Rating(user, product, rating) =>
  (user, product)
}.groupBy(_._1)
The output of the preceding code is as follows:
userMovies: org.apache.spark.rdd.RDD[(Int, Seq[(Int, Int)])] = MapPartitionsRDD[277] at groupBy at <console>:21
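As a quick sanity check (this snippet is not from the original text, but uses the variables defined above), we can inspect a single grouped record to confirm its structure:

val (sampleUser, sampleMovies) = userMovies.first
// Each record pairs a user ID with the collection of
// (user, movie) tuples for that user
println("User " + sampleUser + " has rated " + sampleMovies.size + " movies")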
Finally, we can use Spark's join operator to combine these two RDDs on the user ID key. Then, for each user, we have the list of actual and predicted movie IDs that we can pass to our APK function. In a manner similar to how we computed MSE, we will sum each of these APK scores using a reduce action and divide by the number of users (that is, the count of the allRecs RDD):
val K = 10
val MAPK = allRecs.join(userMovies).map { case (userId, (predicted, actualWithIds)) =>
  val actual = actualWithIds.map(_._2).toSeq
  avgPrecisionK(actual, predicted, K)
}.reduce(_ + _) / allRecs.count
println("Mean Average Precision at K = " + MAPK)
The preceding code will print the mean average precision at K as follows:
Mean Average Precision at K = 0.030486963254725705
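For reference, the shape of an APK implementation consistent with how it is called above is shown below. This is a sketch of the standard average-precision-at-K formulation, not necessarily identical to the avgPrecisionK function defined earlier in the chapter:

def avgPrecisionK(actual: Seq[Int], predicted: Seq[Int], k: Int): Double = {
  // Only the top k predictions count toward the score
  val predK = predicted.take(k)
  var score = 0.0
  var numHits = 0.0
  for ((p, i) <- predK.zipWithIndex) {
    if (actual.contains(p)) {
      // Accumulate precision at each rank where a hit occurs
      numHits += 1.0
      score += numHits / (i.toDouble + 1.0)
    }
  }
  // Normalize by the best achievable number of hits; an empty
  // actual list scores zero in this sketch
  if (actual.isEmpty) 0.0
  else score / math.min(actual.size, k).toDouble
}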
Our model achieves a fairly low MAPK. However, note that typical values for recommendation tasks are usually relatively low, especially if the item set is extremely large.

Try out a few parameter settings for lambda and rank (and alpha, if you are using the implicit version of ALS) and see whether you can find a model that performs better based on the RMSE and MAPK evaluation metrics.
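A simple way to explore these settings is a small grid search. The following sketch assumes the explicit ALS trainer with illustrative candidate values; evaluateMapk is a hypothetical helper standing in for a rerun of the MAPK computation above with the given model:

import org.apache.spark.mllib.recommendation.ALS

// Illustrative candidate values; tune these for your data
val ranks = Seq(10, 50)
val lambdas = Seq(0.01, 0.1, 1.0)
val numIterations = 10

for (rank <- ranks; lambda <- lambdas) {
  val model = ALS.train(ratings, rank, numIterations, lambda)
  // evaluateMapk is a hypothetical helper that repeats the MAPK
  // computation shown above for this model's top-K recommendations
  val mapk = evaluateMapk(model)
  println("rank=" + rank + ", lambda=" + lambda + ", MAPK=" + mapk)
}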