We also need the list of movie IDs for each user to pass into our APK function as the actual argument. We already have the ratings RDD ready, so we can extract just the user and movie IDs from it.
If we use Spark's groupBy operator, we will get an RDD that contains a list of
(userid, movieid) pairs for each user ID (as the user ID is the key on which we
perform the groupBy operation):
val userMovies = ratings.map { case Rating(user, product, rating) =>
  (user, product)
}.groupBy(_._1)
The output of the preceding code is as follows:
userMovies: org.apache.spark.rdd.RDD[(Int, Seq[(Int, Int)])] = MapPartitionsRDD[277] at groupBy at <console>:21
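As a quick sanity check (this snippet is not from the original text, but uses the variables defined above), we can inspect a single grouped record to confirm its structure:

val (sampleUser, sampleMovies) = userMovies.first
// Each record pairs a user ID with the collection of
// (user, movie) tuples for that user
println("User " + sampleUser + " has rated " + sampleMovies.size + " movies")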
Finally, we can use Spark's join operator to combine these two RDDs on the user ID key. Then, for each user, we have the list of actual and predicted movie IDs that we can pass to our APK function. In a manner similar to how we computed MSE, we will sum each of these APK scores using a reduce action and divide by the number of users (that is, the count of the allRecs RDD):
val K = 10
val MAPK = allRecs.join(userMovies).map { case (userId, (predicted, actualWithIds)) =>
  val actual = actualWithIds.map(_._2).toSeq
  avgPrecisionK(actual, predicted, K)
}.reduce(_ + _) / allRecs.count
println("Mean Average Precision at K = " + MAPK)
The preceding code will print the mean average precision at K as follows:
Mean Average Precision at K = 0.030486963254725705
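For reference, the shape of an APK implementation consistent with how it is called above is shown below. This is a sketch of the standard average-precision-at-K formulation, not necessarily identical to the avgPrecisionK function defined earlier in the chapter:

def avgPrecisionK(actual: Seq[Int], predicted: Seq[Int], k: Int): Double = {
  // Only the top k predictions count toward the score
  val predK = predicted.take(k)
  var score = 0.0
  var numHits = 0.0
  for ((p, i) <- predK.zipWithIndex) {
    if (actual.contains(p)) {
      // Accumulate precision at each rank where a hit occurs
      numHits += 1.0
      score += numHits / (i.toDouble + 1.0)
    }
  }
  // Normalize by the best achievable number of hits; an empty
  // actual list scores zero in this sketch
  if (actual.isEmpty) 0.0
  else score / math.min(actual.size, k).toDouble
}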
Our model achieves a fairly low MAPK. However, note that typical values for recommendation tasks are usually relatively low, especially if the item set is extremely large.

Try out a few parameter settings for lambda and rank (and alpha, if you are using the implicit version of ALS) and see whether you can find a model that performs better based on the RMSE and MAPK evaluation metrics.
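A simple way to explore these settings is a small grid search. The following sketch assumes the explicit ALS trainer with illustrative candidate values; evaluateMapk is a hypothetical helper standing in for a rerun of the MAPK computation above with the given model:

import org.apache.spark.mllib.recommendation.ALS

// Illustrative candidate values; tune these for your data
val ranks = Seq(10, 50)
val lambdas = Seq(0.01, 0.1, 1.0)
val numIterations = 10

for (rank <- ranks; lambda <- lambdas) {
  val model = ALS.train(ratings, rank, numIterations, lambda)
  // evaluateMapk is a hypothetical helper that repeats the MAPK
  // computation shown above for this model's top-K recommendations
  val mapk = evaluateMapk(model)
  println("rank=" + rank + ", lambda=" + lambda + ", MAPK=" + mapk)
}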