10.2.3 Fitting Tree Models
So far we have discussed methods for visualizing tree models both on their own and together with the data they are applied to. There is, however, more information associated with each node that awaits visualization. To understand tree models better, we need to know more about the process of fitting them.
Although a tree model is straightforward to interpret and apply, its construction is not trivial. In theory, we would like to consider all possible tree models and pick the one that fits the given data best, based on some loss function. Unfortunately this proves infeasible save for trivial examples, because the computational cost increases exponentially with tree size.
Therefore several other approaches have been suggested for fitting tree models. The most commonly used algorithm, CART (Classification and Regression Trees), was introduced by Breiman et al. (1984). It performs a greedy local optimization as follows: for a given node, consider all possible splits and choose the one that reduces the impurity of the child nodes the most, relative to the parent node. This decrease of impurity (and hence increase of purity) is assessed using an impurity criterion. Such a locally optimal split is then used and the search is performed recursively in each child node.
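The following sketch shows this greedy search at a single node for a binary classification problem with one numeric predictor, using the Gini index as the impurity criterion; all names are illustrative and not part of any particular implementation.

```python
# A minimal sketch of the greedy split search at one node, assuming a
# single numeric predictor and the Gini index as impurity criterion.
import numpy as np

def gini(y):
    """Gini impurity of a vector of class labels."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(x, y):
    """Split point on x with the largest impurity decrease
    relative to the parent node."""
    parent = gini(y)
    n = len(y)
    best = (None, 0.0)                     # (split point, impurity decrease)
    for s in np.unique(x)[:-1]:            # all candidate split points
        left, right = y[x <= s], y[x > s]
        # impurity of the children, weighted by their relative sizes
        children = (len(left) * gini(left) + len(right) * gini(right)) / n
        if parent - children > best[1]:
            best = (s, parent - children)
    return best
```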
The growth is stopped if one of the stopping rules is met. The most common stopping rules are a minimal number of cases in a node and a minimal requirement on the impurity decrease. In practice it is common to relax the stopping rules and use pruning methods instead; a discussion of pruning methods is, however, beyond the scope of this chapter. Nevertheless, visualization can be useful for pruning, especially in an interactive context where pruning parameters can be changed on the fly and reflected in various displays.
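As a sketch of how the two stopping rules can be wired into the recursive growth, the following reuses the illustrative best_split helper from above; the parameter names min_cases and min_decrease are assumptions, not CART's canonical ones.

```python
# Recursive tree growth with the two common stopping rules:
# a minimal number of cases per node and a minimal impurity decrease.
def grow(x, y, min_cases=20, min_decrease=0.01):
    """Grow a tree on one predictor, returned as a nested dict."""
    split, decrease = best_split(x, y)
    # stopping rules: too few cases, no usable split, or too small a gain
    if len(y) < min_cases or split is None or decrease < min_decrease:
        return {"leaf": True, "n": len(y)}
    left, right = x <= split, x > split
    return {"leaf": False, "split": split,
            "left": grow(x[left], y[left], min_cases, min_decrease),
            "right": grow(x[right], y[right], min_cases, min_decrease)}
```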
Measures of impurity can in principle be arbitrary concave functions, but the commonly used measures are entropy and the Gini index, which have theoretical foundations (cf. Ripley). It is important to note that this search finds a local optimum only; it has no way to "look ahead" and consider multiple splits at once. Nevertheless, it is computationally inexpensive compared to a full search and performs well in practice.
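For reference, the entropy criterion written out as code, with both formulas stated in the comments; this is a sketch to complement the Gini helper above.

```python
# For class proportions p_k within a node, entropy is -sum(p_k log p_k)
# and the Gini index is 1 - sum(p_k^2); both are zero for a pure node
# and maximal for a uniform class mix.
import numpy as np

def entropy(y):
    """Entropy impurity of a vector of class labels."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))   # base-2 log: impurity in bits
```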
The consequence of committing to a local optimum at each split point is a potential instability of the model. Small changes in the training data can cause a different split to be chosen. Although the alternative split may lead to a very similar decrease of impurity, the resulting partition can be entirely different. This will have a big impact on all following splits and thus produce an entirely different tree model. We want to present a visualization technique that allows us to learn more about the decisions made at the node level during tree model fitting.
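This effect can be demonstrated directly. The following sketch assumes scikit-learn is available and uses synthetic data in which two predictors are equally informative, so that a small perturbation of the training set can change which of them is chosen at the root.

```python
# Split instability: removing a few cases can alter the root split,
# and with it the entire partition below.  Data and seed are arbitrary.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # both predictors matter equally

for Xs, ys in ((X, y), (X[5:], y[5:])):   # full data vs. five cases removed
    t = DecisionTreeClassifier(max_depth=1, random_state=0).fit(Xs, ys)
    print("root splits on x%d at %.3f"
          % (t.tree_.feature[0], t.tree_.threshold[0]))
```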
Mountain Plots
The basic idea of a mountain plot (Urbanek) is to visualize the decrease of impurity over the entire range of the split variable.
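The profile such a plot draws can be computed directly: the impurity decrease is evaluated at every candidate split point across the whole range of the variable, not only at the chosen optimum. A minimal sketch, assuming matplotlib is available and reusing the illustrative gini helper from above; the data are synthetic.

```python
# The quantity behind a mountain plot: impurity decrease at every
# candidate split point over the entire range of the split variable.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 300)
y = (x + rng.normal(scale=2.0, size=300) > 5).astype(int)  # binary response

parent = gini(y)
points = np.unique(x)[:-1]
decrease = [parent - (np.sum(x <= s) * gini(y[x <= s]) +
                      np.sum(x > s) * gini(y[x > s])) / len(y)
            for s in points]

plt.plot(points, decrease)           # the "mountain" profile
plt.xlabel("split point on x")
plt.ylabel("impurity decrease")
plt.show()
```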
This idea is illustrated on a binary classification problem in Fig. . In this particular example a binary response denotes