Environmental Engineering Reference
14.3 Decision Tree Induction
14.3.1 Types of Decision Trees
Decision trees (Breiman et al. 1984) are hierarchical structures, where each internal
node contains a test on an attribute, each branch corresponds to an outcome of the
test, and each leaf (terminal) node gives a prediction for the value of the class
variable. Depending on whether we are dealing with a classification or a regression
problem, the decision tree is called a classification or a regression tree, respectively.
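
The structure described above can be illustrated with a minimal Python sketch. The names Leaf, Node, predict and toy_tree, as well as the attributes, thresholds and class labels, are hypothetical and chosen only for illustration; they are not taken from the cited studies.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Union


@dataclass
class Leaf:
    # Prediction stored in a terminal node: a class label for a
    # classification tree, a number for a regression tree.
    prediction: object


@dataclass
class Node:
    attribute: str                  # attribute tested at this internal node
    test: Callable[[object], str]   # maps the attribute value to a test outcome
    # One branch (subtree) per possible outcome of the test.
    branches: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def predict(tree, example):
    """Follow the branch matching each test outcome until a leaf is reached."""
    while isinstance(tree, Node):
        outcome = tree.test(example[tree.attribute])
        tree = tree.branches[outcome]
    return tree.prediction


# Hypothetical toy classification tree (all values are made up).
toy_tree = Node(
    attribute="soil_moisture",
    test=lambda v: "low" if v < 0.3 else "high",
    branches={
        "low": Leaf("unsuitable"),
        "high": Node(
            attribute="land_use",
            test=lambda v: v,  # nominal attribute: the value is the outcome
            branches={"arable": Leaf("suitable"), "forest": Leaf("unsuitable")},
        ),
    },
)

print(predict(toy_tree, {"soil_moisture": 0.5, "land_use": "arable"}))
```

Replacing the nominal labels in the leaves with numbers would turn the same structure into a regression tree; the traversal in predict stays unchanged.
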
Classification trees predict the values of a discrete variable with a finite set of
nominal values. An example classification tree modelling the habitat of oilseed rape
by plant abundance is given in Fig. 14.5. The tree has been derived from real-world
data by using decision tree induction (Debeljak et al. 2008).
Regression tree leaves contain constant values as predictions for the class value.
They thus represent piece-wise constant functions. Model trees, a type of regression
tree where leaf nodes can contain linear models predicting the class value, represent
piece-wise linear functions. An example model tree that predicts the abundance of
anecic earthworms is given in Fig. 14.1 (Debeljak et al. 2007).
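
The difference between piece-wise constant and piece-wise linear predictions can be seen in a small Python sketch. Both functions below are hypothetical trees with a single split on one attribute x; the split point and all coefficients are made up for illustration and are unrelated to the earthworm model.

```python
def regression_tree(x):
    """Regression tree with one split: each leaf returns a constant."""
    if x < 5.0:
        return 2.0
    return 8.0


def model_tree(x):
    """Model tree with the same split: each leaf holds a linear model."""
    if x < 5.0:
        return 0.5 * x + 1.0
    return 1.5 * x - 2.0


for x in (1.0, 4.0, 6.0, 9.0):
    print(x, regression_tree(x), model_tree(x))
```

Within one leaf the regression tree gives the same value for every example, whereas the model tree prediction still varies with x.
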
Multi-target trees (Blockeel et al. 1998), sometimes also called multi-objective
trees (Struyf and Džeroski 2006), generalize decision trees to the prediction of several
target attributes simultaneously. The leaves of a multi-target tree store a vector of class
values, one for each target, instead of storing a single class value for one target.
Each component of this vector is a prediction for one of the target attributes.
Depending on whether the targets are all discrete-valued or real-valued, we can
talk about multi-target classification trees or multi-objective regression trees.
An example of a multi-objective regression tree, giving predictions for three
Fig. 14.1 Regression tree for predicting the abundance of anecic earthworms. The additional
information given in each node is the min/mean/max of earthworm biomass. In the leaves, this
information is extended with the number of examples and relative root mean square error
(Debeljak et al. 2007); upper right: epigeic earthworm Eisenia fetida (Lumbricidae) (Courtesy
of Paul Henning Krogh)
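
To make the idea of multi-target leaves concrete, the following Python sketch shows a hypothetical multi-target regression tree with a single split. The attribute name ph, the target names and all numbers are assumptions made for illustration only; each leaf stores one predicted value per target.

```python
def multi_target_tree(example):
    """One split; each leaf stores a vector with one prediction per target."""
    if example["ph"] < 6.0:
        # Leaf 1: vector of predictions, one component per target attribute.
        return {"target_a": 1.2, "target_b": 0.4, "target_c": 3.0}
    # Leaf 2: a different vector for the other outcome of the test.
    return {"target_a": 2.5, "target_b": 0.9, "target_c": 1.1}


prediction = multi_target_tree({"ph": 5.5})
print(prediction["target_a"], prediction["target_b"], prediction["target_c"])
```

A single traversal of the tree thus yields predictions for all targets at once, instead of requiring one tree per target.
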