Figure . . Two mountain plots of variables Rooms and LowStat and corresponding scatterplots vs.
response variable. Red lines: optimal splits; gray lines in scatterplots: means for each partition
gions of competing variables are in the vicinity of the optimum, thus allowing domain
knowledge to be taken into account.
The name “mountain plot” derives from the fact that these plots usually resemble
the profile of a mountain range. They are mainly useful for assessing the quality of
a split along with potential competing splits. This information can be used to
interactively influence the tree construction process or to construct multiple tree
models and compare their behavior.
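The idea of judging a split against its competitors can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes a regression tree whose split quality is the reduction in the sum of squared errors, and computes that gain for every candidate cut point of one variable. Plotting gain against threshold would give a mountain-like profile. The function names `sse` and `split_profile` and the toy data are hypothetical.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def split_profile(x, y):
    """Return [(threshold, gain), ...] for every candidate split on x.

    gain is the reduction in SSE obtained by splitting at the threshold
    (an assumed split criterion; other criteria could be substituted).
    """
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    total = sse(ys)
    profile = []
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no threshold can separate equal x values
        threshold = (xs[i - 1] + xs[i]) / 2  # midpoint between neighbors
        gain = total - sse(ys[:i]) - sse(ys[i:])
        profile.append((threshold, gain))
    return profile


# Toy data, made up for illustration: y jumps once x exceeds 3.
x = [1, 2, 3, 4, 5, 6]
y = [10.0, 11.0, 10.5, 20.0, 21.0, 19.5]

profile = split_profile(x, y)
best_threshold, best_gain = max(profile, key=lambda t: t[1])
print(best_threshold)
```

Thresholds whose gain comes close to `best_gain` are the competing splits that such a profile makes visible.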
10.3 Visualizing Forests
So far we have been discussing visualization of individual tree models. We have shown,
however, that there is an inherent volatility in the choice of splits that may affect the
stability of a given model. Therefore it is useful to grow multiple trees. In what follows
we will briefly introduce tree ensemble methods and present visualization methods
for forests consisting of multiple tree models.
There are two main approaches to generating different tree models, by making
changes to:
Training data: changes in the training data will produce different models if the
original tree was unstable. Bootstrapping is a useful technique to assess the variability
of the model-fitting process.
Splits: allow locally suboptimal splits that create different partitions in order to
prevent the greedy algorithm from getting stuck in a local optimum, which may not
necessarily be a global optimum.
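The first approach, bootstrapping the training data, can be sketched as follows. This is a minimal illustration under stated assumptions rather than a definitive implementation: only the root split is fitted (a one-level tree), split quality is taken to be the reduction in the sum of squared errors, and the variable names `x1`, `x2`, `y`, the helper functions, and the simulated data are all hypothetical. Counting how often each variable wins the root split across bootstrap samples gives a rough picture of how unstable the choice is.

```python
import random
from collections import Counter
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_root_split(rows, target):
    """Greedy root split: (variable, threshold, gain) with the largest gain."""
    total = sse([r[target] for r in rows])
    best = (None, None, 0.0)
    for var in rows[0]:
        if var == target:
            continue
        values = sorted({r[var] for r in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r[target] for r in rows if r[var] <= t]
            right = [r[target] for r in rows if r[var] > t]
            gain = total - sse(left) - sse(right)
            if gain > best[2]:
                best = (var, t, gain)
    return best


def bootstrap_root_variables(rows, target, n_trees=200, seed=0):
    """Refit the root split on bootstrap samples; count the winning variable."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_trees):
        # Bootstrap sample: draw len(rows) rows with replacement.
        sample = rng.choices(rows, k=len(rows))
        var, _, _ = best_root_split(sample, target)
        counts[var] += 1
    return counts


# Simulated data (made up): x2 is a noisy copy of x1, so both compete.
data_rng = random.Random(1)
rows = []
for _ in range(40):
    x1 = data_rng.uniform(0, 10)
    x2 = x1 + data_rng.gauss(0, 1.5)
    y = (5.0 if x1 > 5 else 0.0) + data_rng.gauss(0, 2.0)
    rows.append({"x1": x1, "x2": x2, "y": y})

counts = bootstrap_root_variables(rows, "y", n_trees=200)
print(counts)
```

If one variable wins nearly every time, the root split is stable. A more even count suggests that the competing variable deserves a look, which is the situation the second approach (allowing locally suboptimal splits) is meant to explore.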