The preceding code will output the squared error:
squaredError: Double = 1.010777282523947E-6
So, in order to compute the overall MSE for the dataset, we need to compute this squared error for each (user, movie, actual rating, predicted rating) entry, sum these squared errors up, and divide the sum by the number of ratings. We will do this in the following code snippet.
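As a quick sanity check of the definition itself, the same computation can be sketched on plain Scala collections before we move to the RDD version (the rating values here are hypothetical, not taken from the dataset):

```scala
// (actual rating, predicted rating) pairs -- hypothetical example values
val pairs = Seq((4.0, 3.5), (3.0, 3.2), (5.0, 4.6))

// MSE: the mean of the squared differences between actual and predicted
val mse = pairs.map { case (actual, predicted) =>
  math.pow(actual - predicted, 2)
}.sum / pairs.size
```

The Spark code that follows performs exactly this computation, only distributed across an RDD of rating records.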
Tip
Note that the following code is adapted from the Apache Spark programming guide for ALS at
First, we will extract the user and product IDs from the ratings RDD and make predictions for each user-item pair using model.predict. We will use the user-item pair as the key and the predicted rating as the value:
val usersProducts = ratings.map { case Rating(user, product, rating) =>
  (user, product)
}
val predictions = model.predict(usersProducts).map { case Rating(user, product, rating) =>
  ((user, product), rating)
}
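To see what this keying step produces without a running Spark cluster, here is a minimal local sketch. The Rating case class mirrors org.apache.spark.mllib.recommendation.Rating, but the predict function is a hypothetical stand-in: a real MatrixFactorizationModel would compute the scores from its learned factors.

```scala
// Mirrors org.apache.spark.mllib.recommendation.Rating
case class Rating(user: Int, product: Int, rating: Double)

// Hypothetical stand-in for model.predict; real predictions come from the
// trained ALS model, not a constant
def predict(userProducts: Seq[(Int, Int)]): Seq[Rating] =
  userProducts.map { case (user, product) => Rating(user, product, 3.5) }

// Hypothetical example ratings
val ratings = Seq(Rating(789, 123, 4.0), Rating(789, 456, 3.0))
val usersProducts = ratings.map { case Rating(user, product, _) => (user, product) }

// Key each predicted rating by its (user, product) pair
val predictions = predict(usersProducts)
  .map { case Rating(user, product, rating) => ((user, product), rating) }
  .toMap
```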
Next, we extract the actual ratings and also map the ratings RDD so that the user-item pair is the key and the actual rating is the value. Now that we have two RDDs with the same form of key, we can join them together to create a new RDD with the actual and predicted ratings for each user-item combination:
val ratingsAndPredictions = ratings.map { case Rating(user, product, rating) =>
  ((user, product), rating)
}.join(predictions)
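The effect of the join can be sketched locally with Scala Maps standing in for the two keyed RDDs (the values are hypothetical):

```scala
// Actual and predicted ratings, keyed by (user, product) -- hypothetical values
val actualByKey    = Map((789, 123) -> 4.0, (789, 456) -> 3.0)
val predictedByKey = Map((789, 123) -> 3.8, (789, 456) -> 3.1)

// Inner join on the key, as RDD.join does: keep only keys present in both
// sides, pairing each actual rating with its prediction
val ratingsAndPreds = for {
  (key, actual) <- actualByKey
  predicted     <- predictedByKey.get(key)
} yield key -> (actual, predicted)
```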
Finally, we will compute the MSE by summing up the squared errors using reduce and dividing this sum by the number of records returned by the count method:
val MSE = ratingsAndPredictions.map { case ((user, product), (actual, predicted)) =>
  math.pow(actual - predicted, 2)
}.reduce(_ + _) / ratingsAndPredictions.count