10.1 Introduction
Tree-based models provide an appealing alternative to conventional models for many reasons. They are more readily interpretable, can handle both continuous and categorical covariates, can accommodate data with missing values, provide an implicit variable selection, and model interactions well. The most frequently used tree-based models are classification, regression, and survival trees.
Visualization is important in conjunction with tree models because in their graphical form they are easily interpretable even without special knowledge. Interpretation of decision trees displayed as a hierarchy of decision rules is highly intuitive.
Moreover, tree models reflect properties of the underlying data and have other supplemental information associated with them, such as the quality of cut points, split stability, and prediction trustworthiness. All this information, along with the complex structure of the trees themselves, needs to be explored and conveyed. Visualization provides a powerful tool for presenting the key aspects of these models in a concise manner that allows quick comparisons.
In this chapter we will first briefly introduce tree models and present techniques for visualizing individual trees. These range from classical hierarchical views to less widely known methods such as treemaps and sectioned scatterplots.
In the next section we will use visualization tools to discuss the stability of splits and of entire tree models, motivating the use of tree ensembles and forests. Finally, we will present methods for displaying entire forests at a glance and other ways of analyzing multiple tree models.
10.2 Individual Trees
The basic principle of all tree-based methods is a recursive partitioning of the covariate space to separate subgroups that constitute a basis for prediction. This means that, starting with the full dataset, at each step a rule is consulted that specifies how the data are split into disjoint partitions. This process is repeated recursively until there is no rule defined for further partitioning.
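As a sketch of this idea, the following minimal Python function recursively partitions a dataset. The rule-selection helper find_rule is hypothetical and stands in for whatever splitting and stopping criteria a concrete method defines; it returns None when no further rule is defined.

    def partition(data, find_rule):
        """Recursively partition `data` into a flat list of final partitions.

        `find_rule` is a hypothetical helper: given a dataset it either
        returns a function mapping each case to a partition label, or None
        when no further partitioning rule is defined.
        """
        rule = find_rule(data)
        if rule is None:                  # no rule defined: final partition
            return [data]
        groups = {}
        for case in data:                 # the rule assigns each case to
            groups.setdefault(rule(case), []).append(case)  # a disjoint group
        result = []
        for subset in groups.values():    # recurse into every partition
            result.extend(partition(subset, find_rule))
        return result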
Commonly used classification and regression trees use univariate decision rules in each partitioning step; that is, the rule specifying which cases fall into which partition evaluates only one data variable at a time. For continuous variables the rule usually creates two partitions satisfying $x_i < s$ and $x_i \ge s$, respectively, where $s$ is a constant. Partitions induced by rules using categorical variables are based on the categories assigned to each partition. We often refer to a partitioning step as a split and speak of the value $s$ as the cut point.
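For illustration, univariate rules of both kinds might be expressed as follows. This is a sketch with function names of our own choosing, not the interface of any particular tree package; cases are assumed to be indexable records and i is the index of the variable the rule evaluates.

    def continuous_split(cases, i, s):
        """Split on continuous variable i at cut point s:
        x_i < s versus x_i >= s."""
        return ([x for x in cases if x[i] < s],
                [x for x in cases if x[i] >= s])

    def categorical_split(cases, i, first_categories):
        """Split on categorical variable i: the rule lists the categories
        assigned to the first partition; all others form the second."""
        return ([x for x in cases if x[i] in first_categories],
                [x for x in cases if x[i] not in first_categories])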
The recursive partitioning process can be described by a tree. The root node corresponds to the first split and its children to subsequent splits in the resulting partitions. The tree is built recursively in the same way as the partitioning, and terminal nodes (also called leaves) represent final partitions. Therefore each inner node corresponds to a partitioning rule and each terminal node to a final partition.
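A correspondingly minimal representation of such a binary tree, again a sketch rather than the data structure of any specific implementation, could look like this:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class TreeNode:
        """Inner nodes carry a univariate rule (variable index and cut
        point); leaves carry only a prediction for their final partition."""
        var: Optional[int] = None           # index i of the split variable
        s: Optional[float] = None           # cut point of the rule x_i < s
        left: Optional["TreeNode"] = None   # child for cases with x_i < s
        right: Optional["TreeNode"] = None  # child for cases with x_i >= s
        prediction: Optional[float] = None  # set only on terminal nodes

    def predict(node: TreeNode, x):
        """Route a case down the hierarchy of rules from root to leaf."""
        while node.prediction is None:      # inner node: consult its rule
            node = node.left if x[node.var] < node.s else node.right
        return node.prediction

Here a node is terminal exactly when its prediction is set, so following the rules from the root always ends in a final partition.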