10.2.3 Fitting Tree Models
So far we have discussed methods for visualizing tree models both on their own and together with the data they are applied to. There is, however, more information associated with each node that awaits visualization. To understand tree models better, we need to know more about the process of fitting them.
Although a tree model is straightforward to interpret and apply, its construction is not trivial. In theory, we would like to consider all possible tree models and pick the one that fits the given data best, based on some loss function. Unfortunately this proves infeasible save for trivial examples, because the computational cost increases exponentially with tree size.
Therefore several other approaches have been suggested for fitting tree models. The most commonly used algorithm, CART (Classification and Regression Trees), was introduced by Breiman et al. (1984). It performs a greedy local optimization as follows: for a given node, consider all possible splits and choose the one that reduces the impurity of the child nodes the most, relative to the parent node. This decrease of impurity (and hence increase of purity) is assessed using an impurity criterion. Such a locally optimal split is then used and the search is performed recursively in each child node.
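The following sketch shows this greedy search at a single node for a binary classification problem with one numeric predictor, using the Gini index as the impurity criterion; all names are illustrative and not part of any particular implementation.

```python
# A minimal sketch of the greedy split search at one node, assuming a
# single numeric predictor and the Gini index as impurity criterion.
import numpy as np

def gini(y):
    """Gini impurity of a vector of class labels."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    """Split point on x with the largest impurity decrease
    relative to the parent node."""
    parent = gini(y)
    n = len(y)
    best = (None, 0.0)                     # (split point, impurity decrease)
    for s in np.unique(x)[:-1]:            # all candidate split points
        left, right = y[x <= s], y[x > s]
        # impurity of the children, weighted by their relative sizes
        children = (len(left) * gini(left) + len(right) * gini(right)) / n
        if parent - children > best[1]:
            best = (s, parent - children)
    return best
```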
The growth is stopped if one of the stopping rules is met. The most common stopping rules are a minimal number of cases in a node and a minimal requirement on the impurity decrease. In practice it is common to relax the stopping rules and use pruning methods instead; a discussion of pruning methods is, however, beyond the scope of this chapter. Nevertheless, visualization can be useful for pruning, especially in an interactive context where pruning parameters can be changed on the fly and reflected in various displays.
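As a sketch of how the two stopping rules can be wired into the recursive growth, the following reuses the illustrative best_split helper from above; the parameter names min_cases and min_decrease are assumptions, not CART's canonical ones.

```python
# Recursive tree growth with the two common stopping rules:
# a minimal number of cases per node and a minimal impurity decrease.
def grow(x, y, min_cases=20, min_decrease=0.01):
    """Grow a tree on one predictor, returned as a nested dict."""
    split, decrease = best_split(x, y)
    # stopping rules: too few cases, no usable split, or too small a gain
    if len(y) < min_cases or split is None or decrease < min_decrease:
        return {"leaf": True, "n": len(y)}
    left, right = x <= split, x > split
    return {"leaf": False, "split": split,
            "left": grow(x[left], y[left], min_cases, min_decrease),
            "right": grow(x[right], y[right], min_cases, min_decrease)}
```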
Measures of impurity can in principle be arbitrary concave functions, but the commonly used measures are entropy and the Gini index, which have theoretical foundations (cf. Ripley). It is important to note that this search finds a local optimum only; it has no way to "look ahead" and consider multiple splits at once. Nevertheless, it is computationally inexpensive compared to a full search and performs well in practice.
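For reference, the entropy criterion written out as code, with both formulas stated in the comments; this is a sketch to complement the Gini helper above.

```python
# For class proportions p_k within a node, entropy is -sum(p_k log p_k)
# and the Gini index is 1 - sum(p_k^2); both are zero for a pure node
# and maximal for a uniform class mix.
import numpy as np

def entropy(y):
    """Entropy impurity of a vector of class labels."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))   # base-2 log: impurity in bits
```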
The consequence of committing to a local optimum at each split point is a potential instability of the model. Small changes in the training data can cause a different split to be chosen. Although the alternative split may lead to a very similar decrease of impurity, the resulting partition can be entirely different. This will have a big impact on all following splits and thus produce an entirely different tree model. We want to present a visualization technique that allows us to learn more about the decisions made at the node level during tree model fitting.
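This effect can be demonstrated directly. The following sketch assumes scikit-learn is available and uses synthetic data in which two predictors are equally informative, so that a small perturbation of the training set can change which of them is chosen at the root.

```python
# Split instability: removing a few cases can alter the root split,
# and with it the entire partition below.  Data and seed are arbitrary.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # both predictors matter equally

for Xs, ys in ((X, y), (X[5:], y[5:])):   # full data vs. five cases removed
    t = DecisionTreeClassifier(max_depth=1, random_state=0).fit(Xs, ys)
    print("root splits on x%d at %.3f"
          % (t.tree_.feature[0], t.tree_.threshold[0]))
```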
Mountain Plots
The basic idea of a mountain plot (Urbanek) is to visualize the decrease of impurity over the entire range of the split variable.
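The profile such a plot draws can be computed directly: the impurity decrease is evaluated at every candidate split point across the whole range of the variable, not only at the chosen optimum. A minimal sketch, assuming matplotlib is available and reusing the illustrative gini helper from above; the data are synthetic.

```python
# The quantity behind a mountain plot: impurity decrease at every
# candidate split point over the entire range of the split variable.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 300)
y = (x + rng.normal(scale=2.0, size=300) > 5).astype(int)  # binary response

parent = gini(y)
points = np.unique(x)[:-1]
decrease = [parent - (np.sum(x <= s) * gini(y[x <= s]) +
                      np.sum(x > s) * gini(y[x > s])) / len(y)
            for s in points]

plt.plot(points, decrease)           # the "mountain" profile
plt.xlabel("split point on x")
plt.ylabel("impurity decrease")
plt.show()
```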
This idea is illustrated on a binary classification problem in Fig. . In this particular example a binary response denotes