real-valued targets, is given in Fig. 14.2 (Demšar et al. 2006). The tree predicts
three targets simultaneously: the abundance of Acari and Collembola, as well as
their biodiversity in soil.
14.3.2 Learning Decision Trees
Given a set of training examples, we want to find a decision tree that fits the data
well and is as small (and thus as understandable) as possible. Finding the smallest
decision tree that fits a given data set is known to be computationally hard (NP-hard).
Heuristic search techniques are thus employed to build decision trees, guided by
measures of impurity or dispersion of the target attribute. Greedy search, consider-
ing only one test/split at a time, is typically used.
The typical way to induce decision trees is the so-called Top-Down Induction of
Decision Trees (TDIDT, Quinlan 1986). Tree construction proceeds recursively start-
ing with the entire set of training examples (entire table). At each step, the algorithm
first checks if the stopping criterion is satisfied (e.g. all examples belong to the same
class): If not, an attribute (test) is selected as the root of the (sub-)tree, the current
training set is split into subsets according to the values of the selected attribute, and the
algorithm is called recursively on each of the subsets. The attribute/test is chosen so
that the resulting subsets have as homogeneous class values as possible.
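To make the recursive TDIDT procedure concrete, here is a minimal Python sketch. The names (build_tree, select_best_test, stopping_criterion, split_by_test) and the dict-based representation of examples are assumptions made for illustration; they are not taken from the chapter or from any particular library.

```python
# Minimal sketch of Top-Down Induction of Decision Trees (TDIDT).
# Assumes each example is a dict of attribute values plus a 'class' key,
# and that tests are of the form (attribute, threshold) on numeric attributes.

def build_tree(examples, attributes, select_best_test, stopping_criterion):
    """Recursively grow a decision tree over a list of training examples."""
    # Stopping criterion: e.g. all examples share the same class value,
    # or no informative test remains.
    if stopping_criterion(examples, attributes):
        # Create a leaf labelled with the majority class of the examples.
        classes = [ex['class'] for ex in examples]
        majority = max(set(classes), key=classes.count)
        return {'leaf': True, 'prediction': majority}

    # Greedy step: pick the single attribute/test that yields the most
    # homogeneous subsets according to the chosen impurity measure.
    test = select_best_test(examples, attributes)

    # Split the examples by the outcome of the test and recurse on each
    # subset to build the corresponding sub-tree.
    node = {'leaf': False, 'test': test, 'children': {}}
    for outcome, subset in split_by_test(examples, test).items():
        node['children'][outcome] = build_tree(
            subset, attributes, select_best_test, stopping_criterion)
    return node


def split_by_test(examples, test):
    """Group examples by the outcome of a (attribute, threshold) test."""
    attribute, threshold = test
    groups = {'<=': [], '>': []}
    for ex in examples:
        outcome = '>' if ex[attribute] > threshold else '<='
        groups[outcome].append(ex)
    return groups
```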
Consider for example the tree in Fig. 14.1 . At the root node, the algorithm
addresses each of the independent variables (including silt, clay, pH and time
since sowing) and selects one variable (clay) / test (clay > 7.8) that splits the entire
set of examples best, i.e. results in subsets with homogeneous values of the class
(as compared to other attributes/tests). The examples are then split into two
subsets (those with clay > 7.8 go down the right branch, the others to the left),
and the algorithm is started again twice, once for the left and once for the right
subset. In each of the two cases, only the examples in the respective branch are
used to build the respective sub-tree (examples going down the left/right branch
are used to construct the left/right sub-tree).
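As a toy illustration of this root split, the sketch below partitions a handful of hand-made rows on the test clay > 7.8. The numeric values and class labels are invented for demonstration only; they are not the data behind Fig. 14.1.

```python
# Toy illustration of the root split clay > 7.8 (values invented).
examples = [
    {'clay': 5.2,  'silt': 30.1, 'pH': 6.8, 'class': 'low'},
    {'clay': 9.4,  'silt': 25.7, 'pH': 7.1, 'class': 'high'},
    {'clay': 7.0,  'silt': 28.3, 'pH': 6.5, 'class': 'low'},
    {'clay': 11.3, 'silt': 22.0, 'pH': 7.4, 'class': 'high'},
]

threshold = 7.8  # the test chosen at the root: clay > 7.8

# Examples with clay > 7.8 go down the right branch, the others to the left.
right = [ex for ex in examples if ex['clay'] > threshold]
left = [ex for ex in examples if ex['clay'] <= threshold]

# Each subset is then used on its own to build the corresponding sub-tree.
print(len(left), 'examples in the left branch,', len(right), 'in the right branch')
```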
For discrete attributes, a branch of the tree is typically created for each possible
value of the attribute. For continuous attributes, a threshold is selected and two
branches are created based on that threshold. For the subsets of training examples in
each branch, the tree construction algorithm is called recursively. Tree construction
stops when the examples in a node are sufficiently pure (i.e. all are of the same
class) or if some other stopping criterion is satisfied (e.g. there is no good attribute/
test to add at that point). Such (terminal) nodes are called leaves and are labelled
with the corresponding values of the class.
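The sketch below shows one common convention for generating candidate thresholds for a continuous attribute (midpoints between consecutive distinct values) and a simple purity-based stopping check. The chapter does not prescribe a specific scheme, so treat both helpers as illustrative assumptions.

```python
def candidate_thresholds(examples, attribute):
    """Midpoints between consecutive distinct values of a continuous attribute."""
    values = sorted({ex[attribute] for ex in examples})
    return [(a + b) / 2.0 for a, b in zip(values, values[1:])]


def is_pure(examples):
    """True if all examples in the node carry the same class value (a leaf)."""
    return len({ex['class'] for ex in examples}) <= 1
```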
Different measures can be used to select an attribute in the attribute selection step.
Common to all of them is that they measure the homogeneity (or the opposite,
dispersion) of the values of the target and its increase (decrease) after selecting the
attribute/test for the current node. They differ for classification and regression trees
(Breiman et al. 1984) and a number of choices exist for each case. For classification,
commonly used measures include information gain and the Gini index.
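As an example of such a measure for classification, the sketch below computes the Gini impurity (the measure used by Breiman et al. 1984) and the impurity decrease achieved by a split; an entropy-based information gain could be substituted in the same way. The function names are illustrative.

```python
from collections import Counter

def gini(examples):
    """Gini impurity of the class distribution in a set of examples."""
    n = len(examples)
    if n == 0:
        return 0.0
    counts = Counter(ex['class'] for ex in examples)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def impurity_decrease(parent, subsets):
    """Impurity of the parent node minus the weighted impurity of its subsets."""
    n = len(parent)
    weighted = sum(len(s) / n * gini(s) for s in subsets)
    return gini(parent) - weighted
```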