Building a Regression Model with Spark - Machine Learning with Spark

Mean Squared Error and Root Mean Squared Error

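MSE is the average of the squared errors, and it is the loss function used in least squares regression:

\[ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2 \]

Here n is the number of data points, \hat{y}_i is the predicted value for data point i, and y_i is its actual target value. In other words, MSE is the sum, over all the data points, of the square of the difference between the predicted and actual target values, divided by the number of data points.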

RMSE is the square root of MSE. MSE is measured in units that are the square of the target variable's units, while RMSE is measured in the same units as the target variable. Because each error is squared, MSE, like the squared loss function it derives from, penalizes large errors more heavily than small ones.
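As a small illustration of this penalty, the sketch below compares two hypothetical sets of predictions whose absolute errors add up to the same total. The helper names `mse` and `rmse` and all of the numbers are made up for this example; they are not part of Spark.

```python
import math


def mse(actuals, preds):
    # Mean of the squared differences between predictions and actual values
    return sum((p - a) ** 2 for a, p in zip(actuals, preds)) / len(actuals)


def rmse(actuals, preds):
    # Square root of MSE, so the result is in the units of the target variable
    return math.sqrt(mse(actuals, preds))


# Hypothetical target values and two hypothetical sets of predictions
actuals = [10.0, 20.0, 30.0, 40.0]
many_small_errors = [12.0, 22.0, 32.0, 42.0]  # every prediction is off by 2
one_large_error = [10.0, 20.0, 30.0, 48.0]    # a single prediction is off by 8

mse_small = mse(actuals, many_small_errors)
mse_large = mse(actuals, one_large_error)
rmse_small = rmse(actuals, many_small_errors)
rmse_large = rmse(actuals, one_large_error)

print(mse_small, rmse_small)
print(mse_large, rmse_large)
```

Both prediction sets have a total absolute error of 8, yet the set with the single large error has four times the MSE and twice the RMSE of the other.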

To evaluate our predictions using the mean of an error metric, we will first make a prediction for each input feature vector in an RDD of LabeledPoint instances. We will then compute the error for each record using a function that takes the prediction and the true target value as inputs. This returns an RDD of Double values that contains the errors. We can then find the average using the mean method, which is available on RDDs that contain Double values.

Let's define our squared error function as follows:

def squared_error(actual, pred):
    return (pred - actual)**2
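As a rough sketch of how this function might be applied, the example below uses a plain Python list of (actual, predicted) pairs in place of an RDD so that it runs without a Spark installation. The variable name `true_vs_predicted` and the values in it are hypothetical.

```python
def squared_error(actual, pred):
    return (pred - actual) ** 2


# Hypothetical (actual, predicted) pairs. In Spark these would be the records
# of an RDD built from the LabeledPoint labels and the model's predictions.
true_vs_predicted = [(3.0, 2.0), (5.0, 5.0), (8.0, 10.0), (1.0, 2.0)]

# Compute the error for each record, then average the errors. Assuming
# true_vs_predicted were an RDD, the equivalent would be along the lines of
#   true_vs_predicted.map(lambda tp: squared_error(tp[0], tp[1])).mean()
errors = [squared_error(t, p) for t, p in true_vs_predicted]
mse = sum(errors) / len(errors)

# RMSE is the square root of MSE
rmse = mse ** 0.5

print(mse, rmse)
```

Keeping the per-record error in its own function means the same map-then-mean pattern can be reused with other error functions.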
