Visualizing Trees and Forests - Data Visualization


In contrast to sectioned scatterplots for individual trees, we do not have the convenient ability of a drill-down, unless several models agree on the same subset. Therefore, the aim of the visualization technique described in the next section is to show all trees and their splits at a glance.

10.3.3 Trace Plot

The aim of a trace plot is to allow comparison of arbitrarily many trees with respect to their splits, cut points, and hierarchical structure. This is not possible with any of the visualization methods described so far.

The basis of the trace plot is a rectangular grid with split variables as columns and node depths as rows. Each cell in this grid represents a possible tree node. To show the actual split points, each cell contains a glyph representing the possible split points. For a continuous variable the glyph is a horizontal axis, and a split point is marked by a tick on it. A categorical variable is shown as boxes corresponding to the possible split combinations. Every two adjacent inner nodes (a parent and its child) are connected by an edge between their split points.
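The following Python sketch gives a rough illustration of the grid construction just described. It walks a tree and collects the elements a trace plot would draw. Each inner node becomes one tick mark, placed in the row given by its depth and the column given by its split variable. Each parent-child pair of inner nodes becomes one edge. The `Node` class, the `trace_elements` function, and the variable names are hypothetical. Only continuous splits are handled (categorical boxes are left out), and nothing is actually drawn.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical inner node: splits the continuous variable `var` at `cut`.
    A child of None stands for a terminal node, which the trace plot does not draw."""
    var: str
    cut: float
    left: Optional[Node] = None
    right: Optional[Node] = None


# A tick mark in the grid: (row = node depth, column = index of split variable, cut point)
Point = tuple[int, int, float]


def trace_elements(root: Node, columns: list[str]) -> tuple[list[Point], list[tuple[Point, Point]]]:
    """Collect the tick marks and connecting edges of a trace plot for one tree."""
    column_of = {name: i for i, name in enumerate(columns)}
    ticks: list[Point] = []
    edges: list[tuple[Point, Point]] = []

    def visit(node: Node, depth: int) -> Point:
        point = (depth, column_of[node.var], node.cut)
        ticks.append(point)
        for child in (node.left, node.right):
            if child is not None:  # only inner nodes appear in the trace plot
                edges.append((point, visit(child, depth + 1)))
        return point

    visit(root, 0)
    return ticks, edges


# Usage with made-up variables and cut points
tree = Node("x3", 0.5, left=Node("x1", 1.0), right=Node("x2", 2.0, left=Node("x1", 3.0)))
ticks, edges = trace_elements(tree, columns=["x1", "x2", "x3"])
```

Because rows and columns are fixed by depth and variable rather than by the tree's shape, any number of trees can be mapped onto the same grid this way.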

A classification tree and its trace plot are shown in Fig. . . The root node features a split on the variable palmitoleic, which is represented by the rightmost column. Its child nodes use splits on the variables linoleic and oleic, hence the two edges leading from the root node to the next row of splits. There are no further inner nodes among the children of the linoleic split; therefore, the branch ends there. Analogously, all inner nodes are drawn in the trace plot until terminal nodes are reached.

It is evident that all splits of the tree can be reconstructed from its representation in the trace plot, because every cut point is shown. Likewise, the hierarchical structure of the tree can be reconstructed from the edges in the trace plot.
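This hedged Python sketch makes the reconstruction argument concrete. It starts from nothing but the drawn elements, the tick marks and the edges, and recovers the root and each inner node's inner-node children. The tuple layout `(depth, variable, cut)` and the function name `rebuild_hierarchy` are illustrative assumptions. The sketch assumes that no two inner nodes share the same depth, variable, and cut point. It also does not record which side of a cut each child lies on.

```python
from __future__ import annotations

# A split point as drawn in the grid: (row = depth, column = split variable, cut point)
Point = tuple[int, str, float]


def rebuild_hierarchy(
    ticks: list[Point], edges: list[tuple[Point, Point]]
) -> tuple[Point, dict[Point, list[Point]]]:
    """Recover the root and the parent-to-children map from trace-plot elements."""
    roots = [p for p in ticks if p[0] == 0]  # the root is the only split in row 0
    if len(roots) != 1:
        raise ValueError("expected exactly one split point in the top row")
    children: dict[Point, list[Point]] = {p: [] for p in ticks}
    for parent, child in edges:  # every edge links a parent split to a child split
        children[parent].append(child)
    return roots[0], children


# Usage with made-up split points
root, children = rebuild_hierarchy(
    ticks=[(0, "x3", 0.5), (1, "x1", 1.0), (1, "x2", 2.0)],
    edges=[((0, "x3", 0.5), (1, "x1", 1.0)), ((0, "x3", 0.5), (1, "x2", 2.0))],
)
```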

Moreover, the trace plot removes an ambiguity known from hierarchical views. The order of the child nodes is irrelevant to the model, yet swapping left and right children in a hierarchical view produces quite different plots. In a trace plot, the order of the child nodes is defined by the grid and is therefore fixed for all trees in the plot.
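The order-invariance argument can be checked with a small Python sketch, assuming a hypothetical `Node` class with continuous splits. Trace-plot edges are keyed only by grid position (depth and split variable) and cut point. Swapping the children at every node therefore leaves the edge set unchanged. The left-to-right order of nodes in each row, which a hierarchical view would draw, is reversed. The helper names `trace_edges`, `hierarchical_rows`, and `mirror` are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical inner node with a continuous split; None children are terminal nodes."""
    var: str
    cut: float
    left: Optional[Node] = None
    right: Optional[Node] = None


def trace_edges(node: Node, depth: int = 0) -> frozenset:
    """Trace-plot edges: positions depend only on depth, variable and cut, not on child order."""
    here = (depth, node.var, node.cut)
    edges = set()
    for child in (node.left, node.right):
        if child is not None:
            edges.add((here, (depth + 1, child.var, child.cut)))
            edges |= trace_edges(child, depth + 1)
    return frozenset(edges)


def hierarchical_rows(node: Node, depth: int = 0, rows: Optional[dict] = None) -> dict:
    """Left-to-right order of split variables per depth, as a hierarchical view lays them out."""
    if rows is None:
        rows = {}
    rows.setdefault(depth, []).append(node.var)
    for child in (node.left, node.right):
        if child is not None:
            hierarchical_rows(child, depth + 1, rows)
    return rows


def mirror(node: Optional[Node]) -> Optional[Node]:
    """Swap the stored (and hence drawn) order of the children at every node."""
    if node is None:
        return None
    return Node(node.var, node.cut, left=mirror(node.right), right=mirror(node.left))


tree = Node("x3", 0.5, left=Node("x1", 1.0), right=Node("x2", 2.0))
swapped = mirror(tree)
print(trace_edges(tree) == trace_edges(swapped))                  # True
print(hierarchical_rows(tree)[1], hierarchical_rows(swapped)[1])  # ['x1', 'x2'] ['x2', 'x1']
```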

One important advantage of trace plots is the ability to display multiple tree models simultaneously by superimposing all of them on the same grid. A trace plot of bootstrapped classification trees is shown in Fig. . . This confirms the ability of bootstrapping to produce models that deviate from certain local optima.

To cope with overplotting, we use semitransparent edges. Consequently, often-used paths are more opaque than infrequently used ones. We can clearly see that the first split always uses the palmitoleic variable. In the next step, however, there are several alternative splits. Some patterns seem to repeat further down the tree, indicating a rather stable subgroup that can be reached in several different ways along the tree. In this particular example, we can recognize substructures that affirm the partial stability of the tree models.
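A short Python sketch can quantify the effect of semitransparent edges. Suppose every tree contributes one stroke of opacity `alpha`. Standard alpha compositing of `k` coincident strokes then yields an opacity of 1 - (1 - alpha)^k, so paths shared by many trees approach full opacity. The `Node` class, `edges_of`, `edge_opacity`, and the default `alpha` are assumptions for illustration. The sketch also treats only exactly coinciding edges as overlapping. Cut points of real bootstrapped trees usually differ slightly, so their edges overlap only visually.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Node:
    """Hypothetical inner node with a continuous split; None children are terminal nodes."""
    var: str
    cut: float
    left: Optional[Node] = None
    right: Optional[Node] = None


Edge = tuple[tuple[int, str, float], tuple[int, str, float]]


def edges_of(node: Node, depth: int = 0) -> Iterator[Edge]:
    """Yield the trace-plot edges of one tree; every tree uses the same grid coordinates."""
    here = (depth, node.var, node.cut)
    for child in (node.left, node.right):
        if child is not None:
            yield (here, (depth + 1, child.var, child.cut))
            yield from edges_of(child, depth + 1)


def edge_opacity(trees: Iterable[Node], alpha: float = 0.1) -> dict[Edge, float]:
    """Superimpose all trees and return the composited opacity of each distinct edge."""
    counts = Counter(edge for tree in trees for edge in edges_of(tree))
    # k strokes of opacity alpha drawn on top of each other cover 1 - (1 - alpha)**k
    return {edge: 1.0 - (1.0 - alpha) ** k for edge, k in counts.items()}


# Two made-up trees that share their first split and one child split
trees = [
    Node("x3", 0.5, left=Node("x1", 1.0)),
    Node("x3", 0.5, left=Node("x1", 1.0), right=Node("x2", 2.0)),
]
for edge, opacity in edge_opacity(trees, alpha=0.5).items():
    print(edge, round(opacity, 2))
```

With `alpha=0.5`, the edge used by both trees is more opaque than the edge used by only one. This mirrors the observation that frequently used paths stand out.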
