Decision trees for regression
Just as using linear models for regression tasks involves changing the loss function, using decision trees for regression involves changing the measure of node impurity. The impurity metric is called *variance* and is defined in the same way as the squared loss for least-squares linear regression.
Note
See the *MLlib - Decision Tree* section in the Spark documentation at http://spark.apache.org/docs/latest/mllib-decision-tree.html for further details on the decision tree algorithm and impurity measure for regression.
Now, we will plot a simple example of a regression problem with only one input variable, shown on the *x* axis, and the target variable on the *y* axis. The linear model prediction function is shown by a red dashed line, while the decision tree prediction function is shown by a green dashed line. We can see that the decision tree allows a more complex, nonlinear model to be fitted to the data.
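Since the figure itself is not reproduced here, the same effect can be shown numerically with a toy pure-Python sketch (my own illustration, not the book's plotting code): a tiny regression tree grown greedily with the variance impurity fits hypothetical nonlinear data far more closely than a least-squares line.

```python
def variance(ys):
    """Variance impurity: mean squared deviation from the node mean."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)

def fit_tree(points, depth):
    """Recursively split on the threshold minimizing weighted child variance."""
    ys = [y for _, y in points]
    if depth == 0 or len({x for x, _ in points}) < 2:
        return ("leaf", sum(ys) / len(ys))
    best = None
    for s in sorted({x for x, _ in points})[1:]:
        left = [p for p in points if p[0] < s]
        right = [p for p in points if p[0] >= s]
        score = (len(left) * variance([y for _, y in left])
                 + len(right) * variance([y for _, y in right]))
        if best is None or score < best[0]:
            best = (score, s, left, right)
    _, s, left, right = best
    return ("split", s, fit_tree(left, depth - 1), fit_tree(right, depth - 1))

def predict(tree, x):
    if tree[0] == "leaf":
        return tree[1]
    _, s, left, right = tree
    return predict(left, x) if x < s else predict(right, x)

# Hypothetical nonlinear data: y = x^2 on one input variable.
points = [(float(x), float(x * x)) for x in range(5)]

# Least-squares line y = a + b*x via the normal equations.
n = len(points)
mx = sum(x for x, _ in points) / n
my = sum(y for _, y in points) / n
b = (sum((x - mx) * (y - my) for x, y in points)
     / sum((x - mx) ** 2 for x, _ in points))
a = my - b * mx

tree = fit_tree(points, depth=2)
linear_sse = sum((a + b * x - y) ** 2 for x, y in points)
tree_sse = sum((predict(tree, x) - y) ** 2 for x, y in points)
# The piecewise-constant tree tracks the curve much more closely
# than the single straight line can.
```

The depth-2 tree partitions the input range into four constant-valued segments, which is exactly the kind of more complex, nonlinear prediction function the plot illustrates.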