Computing performance metrics on the bike sharing dataset
Given the functions we defined earlier, we can now compute the various evaluation metrics
on our bike sharing data.
Linear model
Our approach is to apply the relevant error function to each record in the RDD of (true, predicted) pairs we computed earlier, and then take the mean of the results. For our linear model, this RDD is true_vs_predicted:
# Each record is a (true, predicted) pair; it is indexed rather than unpacked
# in the lambda signature, which Python 3 does not allow.
mse = true_vs_predicted.map(lambda tp: squared_error(tp[0], tp[1])).mean()
mae = true_vs_predicted.map(lambda tp: abs_error(tp[0], tp[1])).mean()
rmsle = np.sqrt(true_vs_predicted.map(lambda tp: squared_log_error(tp[0], tp[1])).mean())
print("Linear Model - Mean Squared Error: %2.4f" % mse)
print("Linear Model - Mean Absolute Error: %2.4f" % mae)
print("Linear Model - Root Mean Squared Log Error: %2.4f" % rmsle)
This outputs the following metrics:
Linear Model - Mean Squared Error: 28166.3824
Linear Model - Mean Absolute Error: 129.4506
Linear Model - Root Mean Squared Log Error: 1.4974
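To check the map-then-mean pattern without a Spark cluster, the same computation can be sketched on a plain Python list. This is only an illustration: the three error functions below are assumed stand-ins for the ones defined earlier (their exact definitions are not repeated in this section), compute_metrics and mean are hypothetical helper names, and toy_pairs is made-up data, not part of the bike sharing dataset.

```python
import math

# Assumed stand-ins for the error functions defined earlier in the text.
def squared_error(actual, pred):
    return (pred - actual) ** 2

def abs_error(actual, pred):
    return abs(pred - actual)

def squared_log_error(actual, pred):
    # Adding 1 before taking the log keeps zero counts well defined.
    return (math.log(pred + 1) - math.log(actual + 1)) ** 2

def mean(values):
    values = list(values)
    return sum(values) / float(len(values))

# Hypothetical helper: mirrors rdd.map(error_function).mean() on a local list.
def compute_metrics(pairs):
    pairs = list(pairs)
    mse = mean(squared_error(t, p) for t, p in pairs)
    mae = mean(abs_error(t, p) for t, p in pairs)
    rmsle = math.sqrt(mean(squared_log_error(t, p) for t, p in pairs))
    return mse, mae, rmsle

# Made-up (true, predicted) pairs, used only to exercise the functions.
toy_pairs = [(10.0, 12.0), (3.0, 1.0), (0.0, 0.0)]
mse, mae, rmsle = compute_metrics(toy_pairs)
print("Toy data - Mean Squared Error: %2.4f" % mse)
print("Toy data - Mean Absolute Error: %2.4f" % mae)
print("Toy data - Root Mean Squared Log Error: %2.4f" % rmsle)
```

Because the helper only needs an iterable of pairs, the same three lines of arithmetic apply to any model's predictions; on an RDD, the list comprehension is replaced by map and the local mean by the RDD's mean method.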
Decision tree
We will use the same approach for the decision tree model, using the
true_vs_predicted_dt RDD:
mse_dt = true_vs_predicted_dt.map(lambda tp: squared_error(tp[0], tp[1])).mean()
mae_dt = true_vs_predicted_dt.map(lambda tp: abs_error(tp[0], tp[1])).mean()
rmsle_dt = np.sqrt(true_vs_predicted_dt.map(lambda tp: squared_log_error(tp[0], tp[1])).mean())