In contrast to sectioned scatterplots for individual trees, we do not have the
convenient ability of a drill-down, unless several models agree on the same subset.
Therefore the aim of the visualization technique described in the next section is
to show all trees and their splits at a glance.
10.3.3 Trace Plot
The aim of a trace plot is to provide a plot that allows comparison of arbitrarily
many trees with respect to splits, cut points, and the hierarchical structure. This
is not possible using any of the visualization methods described so far.
The basis of the trace plot is a rectangular grid consisting of split variables as
columns and node depths as rows. Each cell in this grid represents a possible tree
node. To distinguish actual split points, each cell contains a glyph representing
possible split points. For continuous variables it consists of a horizontal axis, and
a split point is represented by a tick mark. Categorical variables are shown as boxes
corresponding to possible split combinations. Every two adjacent inner nodes are
connected by an edge between their split points.
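
The grid construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the original implementation: the `Node` class, the `trace_elements` function, and all cut values are hypothetical, only continuous split variables are handled, and a child of `None` stands for a terminal node. Each inner node becomes a tick `(column, depth, cut)`, and each parent-child pair of inner nodes becomes an edge between two ticks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Inner node of a binary tree (hypothetical structure for illustration)."""
    var: str                        # split variable
    cut: float                      # cut point on that variable
    left: Optional["Node"] = None   # None means the child is a terminal node
    right: Optional["Node"] = None


def trace_elements(root, variables):
    """Return (ticks, edges) for one tree on a variables-by-depth grid.

    A tick is (column index, node depth, cut point); an edge joins the tick
    of a parent to the tick of one of its inner child nodes.
    """
    col = {v: i for i, v in enumerate(variables)}
    ticks, edges = [], []
    stack = [(root, 0, None)]
    while stack:
        node, depth, parent_tick = stack.pop()
        tick = (col[node.var], depth, node.cut)
        ticks.append(tick)
        if parent_tick is not None:
            edges.append((parent_tick, tick))
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, depth + 1, tick))
    return ticks, edges


# Column order is fixed once for the whole plot; cut values are made up.
variables = ["linoleic", "oleic", "palmitoleic"]
tree = Node("palmitoleic", 95.0,
            left=Node("linoleic", 1050.0),
            right=Node("oleic", 7700.0, left=Node("oleic", 7300.0)))
ticks, edges = trace_elements(tree, variables)
```

The resulting ticks and edges are plain coordinates, so they could be handed to any plotting library; the drawing step itself is left out of this sketch.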
A classification tree and its trace plot are shown in Fig. . . The root node features
a split on the variable palmitoleic, which is represented by the rightmost column. Its
child nodes use splits on the variables linoleic and oleic, hence the two edges leading
from the root node to the next row of splits. There are no further inner nodes as
children of the linoleic split; therefore the branch ends there. Analogously, all inner
nodes are drawn in the trace plot until terminal nodes are reached.
It is evident that all splits of the tree can be reconstructed from its representation
in the trace plot because every cut point is shown in the trace plot. Equally, it is
possible to reconstruct the hierarchical structure of the tree due to the presence of
edges in the trace plot.
Moreover, the trace plot removes an ambiguity known from hierarchical views:
the order of the child nodes is irrelevant for the model, whereas swapping left and
right children in the hierarchical view produces quite different hierarchical plots. In
a trace plot the order of the child nodes is defined by the grid and therefore fixed for
all trees in the plot.
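
The invariance to child order can be checked with a small sketch. The assumptions are all illustrative: a tree is represented as nested `(var, cut, left, right)` tuples with `None` for terminal nodes, the helper names `trace_edges` and `swap_children` are hypothetical, and the cut values are made up. Because each edge is positioned only by variable, depth, and cut point, mirroring the drawing order of every pair of children leaves the set of edges unchanged.

```python
def trace_edges(node, depth=0):
    """Set of edges ((var, depth, cut), (var, depth + 1, cut)) of one tree."""
    var, cut, left, right = node
    edges = set()
    for child in (left, right):
        if child is not None:
            edges.add(((var, depth, cut), (child[0], depth + 1, child[1])))
            edges |= trace_edges(child, depth + 1)
    return edges


def swap_children(node):
    """Mirror the tree by exchanging left and right children at every node."""
    if node is None:
        return None
    var, cut, left, right = node
    return (var, cut, swap_children(right), swap_children(left))


tree = ("palmitoleic", 95.0,
        ("linoleic", 1050.0, None, None),
        ("oleic", 7700.0, ("oleic", 7300.0, None, None), None))
mirrored = swap_children(tree)

# The hierarchical layout differs, but the trace plot content is identical.
same_trace = trace_edges(tree) == trace_edges(mirrored)
```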
One important advantage of trace plots is the ability to display multiple tree
models simultaneously, superimposing all models on the same grid. A trace plot of
bootstrapped classification trees is shown in Fig. . . This confirms the ability of
bootstrapping to produce models that deviate from certain local optima.
To prevent overplotting, we use semitransparent edges. Consequently, often used
paths are more opaque than infrequently used paths. We can clearly see that the first
split always uses the palmitoleic variable. In the next step, however, there are several
alternatives for the splits. Some patterns seem to be repeated further down the tree,
indicating a rather stable subgroup that can be reached in several different ways along
the tree. In this particular example we can recognize substructures that affirm the
partial stability of the tree models.
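
Why frequently used paths look darker can be illustrated with a short sketch. It assumes standard "over" alpha compositing, where n overlaid strokes of opacity alpha combine to an opacity of 1 - (1 - alpha)^n. The function name `edge_opacity`, the alpha value, and the three toy trees are hypothetical, and edges are simplified to (variable, depth) pairs that ignore the exact cut points.

```python
from collections import Counter


def edge_opacity(edge_lists, alpha=0.1):
    """Combined opacity of each edge when all trees share one grid.

    edge_lists holds one list of edges per tree; each edge is counted at
    most once per tree, and n overlaid strokes give 1 - (1 - alpha) ** n.
    """
    counts = Counter(e for edges in edge_lists for e in set(edges))
    return {e: 1 - (1 - alpha) ** n for e, n in counts.items()}


# Three made-up bootstrapped trees that all split on palmitoleic first.
root = ("palmitoleic", 0)
tree_a = [(root, ("linoleic", 1)), (root, ("oleic", 1))]
tree_b = [(root, ("oleic", 1)), (("oleic", 1), ("linoleic", 2))]
tree_c = [(root, ("oleic", 1))]

opacity = edge_opacity([tree_a, tree_b, tree_c], alpha=0.2)
frequent = opacity[(root, ("oleic", 1))]     # used by all three trees
rare = opacity[(root, ("linoleic", 1))]      # used by one tree only
```

The opacity grows with each additional tree using the edge but never exceeds 1, so a lower alpha keeps frequently used paths distinguishable when many trees are superimposed.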