The preceding code will output the squared error:
squaredError: Double = 1.010777282523947E-6
So, in order to compute the overall MSE for the dataset, we need to compute this squared error for each (user, movie, actual rating, predicted rating) entry, sum these squared errors up, and divide the sum by the number of ratings. We will do this in the following code snippet.
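As a quick sanity check of the definition itself, the same computation can be sketched on plain Scala collections before we move to the RDD version (the rating values here are hypothetical, not taken from the dataset):

```scala
// (actual rating, predicted rating) pairs -- hypothetical example values
val pairs = Seq((4.0, 3.5), (3.0, 3.2), (5.0, 4.6))

// MSE: the mean of the squared differences between actual and predicted
val mse = pairs.map { case (actual, predicted) =>
  math.pow(actual - predicted, 2)
}.sum / pairs.size
```

The Spark code that follows performs exactly this computation, only distributed across an RDD of rating records.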
Tip
Note that the following code is adapted from the Apache Spark programming guide for ALS at
First, we will extract the user and product IDs from the ratings RDD and make predictions for each user-item pair using model.predict. We will use the user-item pair as the key and the predicted rating as the value:
val usersProducts = ratings.map { case Rating(user, product, rating) =>
  (user, product)
}
val predictions = model.predict(usersProducts).map { case Rating(user, product, rating) =>
  ((user, product), rating)
}
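To see what this keying step produces without a running Spark cluster, here is a minimal local sketch. The Rating case class mirrors org.apache.spark.mllib.recommendation.Rating, but the predict function is a hypothetical stand-in: a real MatrixFactorizationModel would compute the scores from its learned factors.

```scala
// Mirrors org.apache.spark.mllib.recommendation.Rating
case class Rating(user: Int, product: Int, rating: Double)

// Hypothetical stand-in for model.predict; real predictions come from the
// trained ALS model, not a constant
def predict(userProducts: Seq[(Int, Int)]): Seq[Rating] =
  userProducts.map { case (user, product) => Rating(user, product, 3.5) }

// Hypothetical example ratings
val ratings = Seq(Rating(789, 123, 4.0), Rating(789, 456, 3.0))
val usersProducts = ratings.map { case Rating(user, product, _) => (user, product) }

// Key each predicted rating by its (user, product) pair
val predictions = predict(usersProducts)
  .map { case Rating(user, product, rating) => ((user, product), rating) }
  .toMap
```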
Next, we extract the actual ratings and also map the ratings RDD so that the user-item pair is the key and the actual rating is the value. Now that we have two RDDs with the same form of key, we can join them together to create a new RDD with the actual and predicted ratings for each user-item combination:
val ratingsAndPredictions = ratings.map { case Rating(user, product, rating) =>
  ((user, product), rating)
}.join(predictions)
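The effect of the join can be sketched locally with Scala Maps standing in for the two keyed RDDs (the values are hypothetical):

```scala
// Actual and predicted ratings, keyed by (user, product) -- hypothetical values
val actualByKey    = Map((789, 123) -> 4.0, (789, 456) -> 3.0)
val predictedByKey = Map((789, 123) -> 3.8, (789, 456) -> 3.1)

// Inner join on the key, as RDD.join does: keep only keys present in both
// sides, pairing each actual rating with its prediction
val ratingsAndPreds = for {
  (key, actual) <- actualByKey
  predicted     <- predictedByKey.get(key)
} yield key -> (actual, predicted)
```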
Finally, we will compute the MSE by summing up the squared errors using reduce and dividing this sum by the number of records returned by the count method:
val MSE = ratingsAndPredictions.map { case ((user, product), (actual, predicted)) =>
  math.pow(actual - predicted, 2)
}.reduce(_ + _) / ratingsAndPredictions.count