The preceding code will output the squared error:
squaredError: Double = 1.010777282523947E-6
So, in order to compute the overall MSE for the dataset, we need to compute this squared error for each (user, movie, actual rating, predicted rating) entry, sum these errors, and divide the sum by the number of ratings. We will do this in the following code snippet.
Tip
Note the following code is adapted from the Apache Spark programming guide for ALS at
http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html .
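Before moving to the RDD-based implementation, the calculation itself can be illustrated on plain Scala collections. The following sketch uses three made-up (actual, predicted) rating pairs, not data from the book:

```scala
// Toy illustration (hypothetical values, not from the book's dataset):
// MSE is the mean of the squared differences between actual and
// predicted ratings.
val pairs = Seq((3.0, 2.5), (5.0, 4.8), (4.0, 4.1))
val mse = pairs.map { case (actual, predicted) =>
  math.pow(actual - predicted, 2)
}.sum / pairs.size
```

The Spark version below performs exactly this computation, distributed over an RDD instead of a local sequence.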
First, we will extract the user and product IDs from the ratings RDD and make predictions for each user-item pair using model.predict. We will use the user-item pair as the key and the predicted rating as the value:
val usersProducts = ratings.map{
  case Rating(user, product, rating) => (user, product)
}
val predictions = model.predict(usersProducts).map{
  case Rating(user, product, rating) => ((user, product), rating)
}
Next, we extract the actual ratings and also map the ratings RDD so that the user-item pair is the key and the actual rating is the value. Now that we have two RDDs with the same form of key, we can join them together to create a new RDD with the actual and predicted ratings for each user-item combination:
val ratingsAndPredictions = ratings.map{
  case Rating(user, product, rating) => ((user, product), rating)
}.join(predictions)
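The join here is an inner join on the (user, product) key, pairing each actual rating with its prediction. A miniature of the same semantics with plain Scala maps (hypothetical user and product IDs, not from the book):

```scala
// Hypothetical miniature of the RDD join: keys are (user, product)
// pairs, values are ratings.
val actuals     = Map((1, 10) -> 4.0, (1, 20) -> 3.0)
val predicted   = Map((1, 10) -> 3.7, (1, 20) -> 3.2)
// An inner join keeps only keys present on both sides, pairing the
// values, which mirrors ratingsAndPredictions:
// ((user, product), (actual, predicted)).
val joined = actuals.flatMap { case (key, actual) =>
  predicted.get(key).map(pred => key -> (actual, pred))
}
```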
Finally, we will compute the MSE by summing the squared errors using reduce and dividing by the number of records obtained from the count method:
val MSE = ratingsAndPredictions.map{
  case ((user, product), (actual, predicted)) =>
    math.pow((actual - predicted), 2)
}.reduce(_ + _) / ratingsAndPredictions.count
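A metric commonly reported alongside the MSE is the root mean squared error (RMSE), which is simply its square root and puts the error back on the original rating scale. A minimal sketch, using a hypothetical MSE value rather than the book's result:

```scala
// Sketch (hypothetical MSE value): RMSE is the square root of the MSE.
val mseExample  = 0.09
val rmseExample = math.sqrt(mseExample)
```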