[(16.0, 119.30920003093594), (40.0, 116.95463511937378), (32.0, 116.57294610647752)]
Log-transformed predictions:
[(15.999999999999998, 45.860944832110015), (40.0, 43.255903592233274), (32.0, 42.311306147884252)]
Comparing these to the results on the raw target variable, we see that while the MSE and MAE did not improve, the RMSLE did.
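The error helper functions used throughout this section are defined earlier in the chapter. As a reminder, a minimal NumPy version consistent with how they are used here (the function names match the book's usage; the bodies are a sketch) can be checked against the first target/prediction pair shown above:

```python
import numpy as np

def squared_error(actual, pred):
    # Squared difference on the raw scale; averaged over all pairs this is the MSE
    return (pred - actual) ** 2

def abs_error(actual, pred):
    # Absolute difference; averaged over all pairs this is the MAE
    return np.abs(pred - actual)

def squared_log_error(actual, pred):
    # Squared difference of log(1 + x); averaged and square-rooted this is the RMSLE
    return (np.log(pred + 1) - np.log(actual + 1)) ** 2

# First (target, prediction) pair from the raw-target and log-trained models above
sle_raw = squared_log_error(16.0, 119.3092)
sle_log = squared_log_error(16.0, 45.8609)
print(sle_raw > sle_log)  # the log-trained model has the smaller log error
```

Evaluated on this pair, the log-trained model's prediction is closer on the log scale even though both overshoot the true count, which is exactly what the improved RMSLE reflects.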
We will perform the same analysis for the decision tree model:
data_dt_log = data_dt.map(lambda lp: LabeledPoint(np.log(lp.label), lp.features))
dt_model_log = DecisionTree.trainRegressor(data_dt_log, {})

preds_log = dt_model_log.predict(data_dt_log.map(lambda p: p.features))
actual_log = data_dt_log.map(lambda p: p.label)
# Map predictions and targets back to the original scale before computing metrics
true_vs_predicted_dt_log = actual_log.zip(preds_log).map(
    lambda t_p: (np.exp(t_p[0]), np.exp(t_p[1])))

mse_log_dt = true_vs_predicted_dt_log.map(
    lambda t_p: squared_error(t_p[0], t_p[1])).mean()
mae_log_dt = true_vs_predicted_dt_log.map(
    lambda t_p: abs_error(t_p[0], t_p[1])).mean()
rmsle_log_dt = np.sqrt(true_vs_predicted_dt_log.map(
    lambda t_p: squared_log_error(t_p[0], t_p[1])).mean())

print("Mean Squared Error: %2.4f" % mse_log_dt)
print("Mean Absolute Error: %2.4f" % mae_log_dt)
print("Root Mean Squared Log Error: %2.4f" % rmsle_log_dt)
print("Non log-transformed predictions:\n" + str(true_vs_predicted_dt.take(3)))
print("Log-transformed predictions:\n" + str(true_vs_predicted_dt_log.take(3)))
From the results here, we can see that we actually made our metrics slightly worse for the
decision tree:
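One way to build intuition for why training on log-transformed targets can help RMSLE yet hurt MSE is to consider the simplest possible model, a constant predictor (an illustrative sketch, not from the chapter): fitting the mean in log space and exponentiating back yields the geometric mean of the targets, which is never larger than the arithmetic mean that minimizes MSE.

```python
import numpy as np

# The three example targets from the predictions shown above
targets = np.array([16.0, 40.0, 32.0])

# A constant model fit on the raw targets predicts the arithmetic mean,
# which is the MSE-optimal constant ...
arith_mean = targets.mean()

# ... while fitting the mean of the log targets and exponentiating back
# yields the geometric mean, which favours relative (log-scale) accuracy
geo_mean = np.exp(np.log(targets).mean())

print(arith_mean, geo_mean)  # the geometric mean is the smaller of the two
```

The log-trained constant sacrifices raw-scale squared error to get closer in log space; whether that trade pays off for a full model depends on the model and the data, which is why the linear model improved here while the decision tree did not.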