Other methods based on recursive partitioning of the plot space are treemaps and
spineplots of leaves. Both allow a concise view of all terminal nodes while retaining
hints of the splitting sequence. In conjunction with highlighting and brushing, the
main focus here is on the model behavior with respect to data points. As such, the plots
can be created using training and test data separately and compared. Treemaps are
more suitable for absolute comparisons and large, complex trees, whereas spineplots
of leaves can be used for relative comparison of groups within terminal nodes up to
moderately complex trees.
Tree models can be unstable; that is, small changes in the data can lead to
entirely different trees. To analyze the stability of splits it is possible to visualize the
optimality criterion for candidate variables using mountain plots. Competing splits
within a variable become clearly visible and the comparison of mountain plots of
multiple candidate variables allows a quick assessment of the magnitude and cause
of potential instability.
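The curve behind a mountain plot can be sketched as follows. This is a minimal illustration, not the chapter's implementation: it assumes the Gini impurity decrease as the optimality criterion (the text does not name one), and the function names gini and split_profile are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (assumed criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_profile(x, y):
    """Impurity decrease for every candidate cut-point of one variable.

    Returns (cut_point, gain) pairs; plotting gain against cut_point
    gives the kind of curve a mountain plot draws for this variable.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    parent = gini([label for _, label in pairs])
    profile = []
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cut-point exists between tied values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        profile.append(((lo + hi) / 2, parent - weighted))
    return profile

# Toy data: the classes separate cleanly between x = 3 and x = 4.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = ["a", "a", "a", "b", "b", "b"]
profile = split_profile(x, y)
best_cut, best_gain = max(profile, key=lambda p: p[1])
```

Two peaks of similar height in such a profile, or in the profiles of two different variables, indicate competing splits and therefore a potentially unstable tree.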
The instability of a tree model can be used to obtain additional insight into the data
and to improve prediction accuracy. Bootstrapping provides a useful method for the
analysis of model variation by creating a whole set of tree models. Visualization of the
use of covariates in the splits as weighted bar charts with aggregate impurity criterion
as weight allows quick assessment of variable importance. Variable masking can be
detected using weighted fluctuation diagrams of variables and trees. This view is also
useful for finding groups of related tree models.
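The data behind such a weighted bar chart can be sketched as below. This is a deliberately simplified illustration under stated assumptions: each bootstrap "tree" is reduced to its root split only, the weight is the Gini impurity decrease, and the names best_split and bootstrap_importance are hypothetical. A full analysis would grow complete trees and sum the weights over all of their splits.

```python
import random
from collections import Counter, defaultdict

def gini(labels):
    """Gini impurity of a list of class labels (assumed criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (variable index, impurity decrease) of the best root split."""
    n = len(rows)
    parent = gini(labels)
    best = (None, 0.0)
    for j in range(len(rows[0])):
        # Every distinct value except the smallest is a candidate cut.
        for cut in sorted({r[j] for r in rows})[1:]:
            left = [l for r, l in zip(rows, labels) if r[j] < cut]
            right = [l for r, l in zip(rows, labels) if r[j] >= cut]
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if gain > best[1]:
                best = (j, gain)
    return best

def bootstrap_importance(rows, labels, n_boot=200, seed=0):
    """Sum the impurity decrease per variable over bootstrap samples."""
    rng = random.Random(seed)
    n = len(rows)
    weight = defaultdict(float)
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        var, gain = best_split([rows[i] for i in idx], [labels[i] for i in idx])
        if var is not None:  # None when the sample contains a single class
            weight[var] += gain
    return dict(weight)

# Toy data: variable 0 separates the classes, variable 1 is noise.
rows = [(1, 5), (2, 1), (3, 4), (4, 2), (5, 6), (6, 3)]
labels = ["a", "a", "a", "b", "b", "b"]
importance = bootstrap_importance(rows, labels)
```

On this toy data variable 0 collects all the weight because it always separates the classes perfectly. On real data the weights spread across competing variables, and the bar heights give the quick importance assessment described above.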
Sectioned scatterplots also allow the visualization of partition boundaries for multiple
trees. The resulting plot can no longer be used for global drill-down due to the
lack of shared subgroups, but it provides a way of analyzing the “fuzziness” of a cut-
point in conjunction with the data.
Finally, trace plots allow us to visualize split rules and the hierarchical structure
of arbitrarily many trees in a single view. They are based on a grid of variables and
tree levels (nodes of the same depth) where each cell corresponds to a candidate split
variable, corresponding to a potential tree node. Actually used cells are connected in
the same way as in the hierarchical view, thus reflecting the full structure of the tree.
Multiple trees can be superimposed on this grid, each leaving its own “trace.” The
resulting plot shows frequently used paths, common subgroups, and alternate splits.
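The bookkeeping behind a trace plot can be sketched as follows. This is a minimal illustration that assumes a hypothetical tree encoding (None for a leaf, a tuple of split variable, left subtree, and right subtree for an inner node). The variable names are made up, and the function trace is not part of KLIMT or R.

```python
from collections import Counter

def trace(tree, level=0, cells=None, edges=None):
    """Accumulate trace-plot cells and edges for one tree.

    cells counts how often a (level, variable) grid cell is used;
    edges counts how often two cells on adjacent levels are connected.
    """
    if cells is None:
        cells, edges = Counter(), Counter()
    if tree is None:  # leaves do not occupy a grid cell
        return cells, edges
    var, left, right = tree
    cells[(level, var)] += 1
    for child in (left, right):
        if child is not None:
            edges[((level, var), (level + 1, child[0]))] += 1
            trace(child, level + 1, cells, edges)
    return cells, edges

# Three small hypothetical trees superimposed on the same grid.
trees = [
    ("age", ("income", None, None), None),
    ("age", ("income", None, None), ("height", None, None)),
    ("income", ("age", None, None), None),
]
cells, edges = Counter(), Counter()
for t in trees:
    trace(t, 0, cells, edges)
```

Drawing each edge with a line width proportional to its count makes frequently used paths stand out, while cells used by only one tree reveal alternate splits.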
All plots in this chapter have been produced using R software for statistical computing
and KLIMT interactive software for visualization and analysis of trees and
forests. Visualization methods presented in this chapter are suitable for both presentation
of particular findings and exploratory work. The individual techniques complement
each other well by providing different viewpoints on the models and
data. Therefore they can be successfully used in an interactive framework. Trace plots,
for example, represent a very useful overview that can be linked to individual hierarchical
views. Subgroups defined by cells in the trace plot can be linked to data-based
plots, and its edges to sectioned scatterplots.
The methods presented here were mostly illustrated on classification examples,
but they can equally be used for regression trees and mostly for survival trees as well.
Also, none of the methods described here is limited to binary trees, even though those
represent the most commonly used models. The variety of tree models and further