Using MLlib's built-in evaluation functions
While we have computed MSE, RMSE, and MAPK from scratch, and it is a useful learning exercise to do so, MLlib provides convenience functions to do this for us in the RegressionMetrics and RankingMetrics classes.
RMSE and MSE
First, we will compute the MSE and RMSE metrics using RegressionMetrics. We will instantiate a RegressionMetrics instance by passing in an RDD of key-value pairs that represent the predicted and true values for each data point, as shown in the following code snippet. Here, we will again use the ratingsAndPredictions RDD we computed in our earlier example:
import org.apache.spark.mllib.evaluation.RegressionMetrics
val predictedAndTrue = ratingsAndPredictions.map { case ((user, product), (predicted, actual)) => (predicted, actual) }
val regressionMetrics = new RegressionMetrics(predictedAndTrue)
We can then access various metrics, including MSE and RMSE. We will print out these
metrics here:
println("Mean Squared Error = " +
regressionMetrics.meanSquaredError)
println("Root Mean Squared Error = " +
regressionMetrics.rootMeanSquaredError)
You will see that the output for MSE and RMSE is exactly the same as the metrics we computed earlier:
Mean Squared Error = 0.08231947642632852
Root Mean Squared Error = 0.2869137090247319
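Note that the RegressionMetrics instance exposes a few other standard regression metrics beyond MSE and RMSE, such as the mean absolute error and the R-squared coefficient. For example, reusing the regressionMetrics instance from above:
println("Mean Absolute Error = " + regressionMetrics.meanAbsoluteError)
println("R-squared = " + regressionMetrics.r2)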
MAP
As we did for MSE and RMSE, we can compute ranking-based evaluation metrics using MLlib's RankingMetrics class. Similar to our own average precision function, we need to pass in an RDD of key-value pairs, where the key is an array of predicted item IDs for a user and the value is an array of the actual item IDs.
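As an illustrative sketch (not the exact code from the earlier example), suppose we had an RDD named predictedAndTrueForRanking of type RDD[(Array[Int], Array[Int])], pairing each user's predicted item IDs with the item IDs they actually interacted with; the name is hypothetical and used here only for illustration. Computing MAP would then look like this:
import org.apache.spark.mllib.evaluation.RankingMetrics
// predictedAndTrueForRanking: RDD[(Array[Int], Array[Int])] of
// (predicted item IDs, actual item IDs) per user -- assumed to exist
val rankingMetrics = new RankingMetrics(predictedAndTrueForRanking)
println("Mean Average Precision = " + rankingMetrics.meanAveragePrecision)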