mainly in the fringe for small groups. Therefore it is advisable to weight the
contribution of each split by a cumulative statistic such as the decrease of impurity.
The cumulative value of impurity decrease for each variable of the bootstrapped
trees is displayed in the right plot of Fig. . . The variables in each plot are ordered
by the bar height, representing their importance. We see that UCS is by far the most
influential variable, followed by UCH and BNi.
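The difference between counting splits and weighting them by impurity decrease can be sketched in a few lines of Python. This is only an illustration: the split records, their layout as (tree, variable, decrease) tuples, and all numeric values are invented for the example and are not taken from the data behind the figures; only the variable names are borrowed from the text.

```python
from collections import Counter, defaultdict

# Hypothetical split records: (tree_id, variable, impurity_decrease).
# The numbers are made up for illustration only.
splits = [
    (0, "UCS", 0.30), (0, "UCH", 0.05),
    (1, "UCS", 0.28), (1, "BNi", 0.04),
    (2, "UCH", 0.25), (2, "UCS", 0.06),
]

def split_counts(records):
    """Unweighted importance: how often each variable is used in a split."""
    return dict(Counter(var for _tree, var, _gain in records))

def cumulative_decrease(records):
    """Weighted importance: summed impurity decrease per variable."""
    totals = defaultdict(float)
    for _tree, var, gain in records:
        totals[var] += gain
    return dict(totals)

def ranked(totals):
    """Variables ordered by decreasing importance, as in a sorted bar chart."""
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

counts = split_counts(splits)
totals = cumulative_decrease(splits)
for var, total in ranked(totals):
    print(f"{var}: {counts[var]} splits, cumulative decrease {total:.2f}")
```

Sorting by the summed decrease rather than by the raw count keeps small splits in the fringe of a tree from inflating a variable's apparent importance.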
When making inference on the displayed information, we need to be cautious
and keep the tree properties in mind. Variable masking can heavily influence the
results of such analyses. Given two highly correlated variables, it is very likely that
they will produce very similar split results. Therefore the CART algorithm guided
by the bootstrap will pick one of them at random. Once that decision is made, the
other variable is unlikely to be used again. If one of the variables is “weaker,” it
will hardly appear in any model, even though in the absence of the stronger variable
it may still perform the best out of all the other variables.
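A small simulation can make the masking effect concrete. The sketch below is not the CART implementation used for the figures: it assumes a binary response, Gini impurity, and only looks at the root split; the variables `a` and `b`, the noise levels, and the sample sizes are all invented for illustration. It counts how often each of two nearly identical variables wins the root split across bootstrap samples.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)

def best_gain(xs, ys):
    """Largest Gini impurity decrease over all thresholds on one variable."""
    n = len(ys)
    parent = gini(ys)
    pairs = sorted(zip(xs, ys))
    best = 0.0
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate tied values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = parent - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)
        best = max(best, gain)
    return best

def root_choices(n_obs=60, n_trees=50, seed=1):
    """Count which of two correlated variables wins the root split per bootstrap."""
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(n_obs)]
    b = [v + rng.gauss(0, 0.1) for v in a]               # near copy of a (assumption)
    y = [1 if v + rng.gauss(0, 0.5) > 0 else 0 for v in a]
    counts = {"a": 0, "b": 0}
    for _ in range(n_trees):
        idx = [rng.randrange(n_obs) for _ in range(n_obs)]  # bootstrap sample
        ga = best_gain([a[i] for i in idx], [y[i] for i in idx])
        gb = best_gain([b[i] for i in idx], [y[i] for i in idx])
        counts["a" if ga >= gb else "b"] += 1
    return counts

print(root_choices())
```

Because the two variables yield almost the same candidate splits, small changes in the bootstrap sample can be enough to tip the choice from one to the other, which is the instability the paragraph above describes.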
To analyze that behavior, but also to see how different the tree models are, it is
necessary to take both the variable and the individual tree into account.
Two-dimensional weighted fluctuation diagrams showing trees and split variables are shown in
Fig. . . Variables are plotted on the y-axis, the models on the x-axis. The area of
each rectangle is proportional to the cumulative impurity decrease of all splits using
a specific variable in the tree model. In general, fluctuation diagrams are useful for
detecting patterns and comparisons in both the x and y directions.
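The geometry of such a diagram can be sketched as follows. The example assumes square rectangles in a unit grid cell and scales them so that the largest value fills its cell; both choices, the record layout, and the numbers are assumptions made for illustration. Since the area must be proportional to the cumulative impurity decrease, the side length is proportional to its square root.

```python
import math

def gain_matrix(records, variables, n_trees):
    """Cumulative impurity decrease per (variable, tree) cell."""
    matrix = {v: [0.0] * n_trees for v in variables}
    for tree, var, gain in records:
        matrix[var][tree] += gain
    return matrix

def rectangle_sides(matrix, cell=1.0):
    """Side length of each square so that its area is proportional to the gain.

    The largest gain fills a whole grid cell of width `cell`.
    """
    peak = max(max(row) for row in matrix.values())
    if peak <= 0:
        return {v: [0.0] * len(row) for v, row in matrix.items()}
    return {
        v: [cell * math.sqrt(g / peak) for g in row]
        for v, row in matrix.items()
    }

# Hypothetical records: (tree_id, variable, impurity_decrease); values invented.
records = [(0, "UCS", 0.4), (0, "UCH", 0.1), (1, "UCS", 0.1), (1, "UCS", 0.1)]
matrix = gain_matrix(records, ["UCS", "UCH"], n_trees=2)
sides = rectangle_sides(matrix)
for var in matrix:  # variables on the y-axis, trees on the x-axis
    print(var, [round(s, 3) for s in sides[var]])
```

Scaling the side length by the raw value instead of its square root would exaggerate large gains, because the eye compares areas rather than edges.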
Focusing on the largest gains, we can distinguish four different model groups. In
models, UCS is the most influential variable, followed by UCH with models and
BNi and BCn with one model each. Looking at the large group of models we can
also spot several patterns. In cases, UCH is also used, although not contributing as
heavily as in its dominant position, but then we see another cases where UCH is
not used at all. Visually we get the impression that BNi replaces UCH in those cases,
which hints at variable masking. We see a similar behavior with UCS and UCH, too.
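The visual grouping described above can also be checked numerically. The following sketch groups trees by the variable with the largest cumulative gain and then counts, within one group, how many trees use a second variable at all. The per-tree gains are invented for illustration and do not reproduce the counts in the text.

```python
from collections import Counter

# Hypothetical cumulative impurity decrease per tree and variable (invented values).
gains = {
    0: {"UCS": 0.30, "UCH": 0.05},
    1: {"UCS": 0.28, "BNi": 0.04},
    2: {"UCH": 0.25, "UCS": 0.06},
    3: {"UCS": 0.31, "UCH": 0.03},
    4: {"BNi": 0.22, "UCS": 0.05},
}

def dominant_variable(tree_gains):
    """Variable with the largest cumulative gain in one tree."""
    return max(tree_gains, key=tree_gains.get)

def group_sizes(all_gains):
    """Number of trees per dominant variable."""
    return Counter(dominant_variable(g) for g in all_gains.values())

def usage_within_group(all_gains, leader, other):
    """(trees using `other`, trees not using it) among trees dominated by `leader`."""
    group = [g for g in all_gains.values() if dominant_variable(g) == leader]
    used = sum(1 for g in group if g.get(other, 0.0) > 0.0)
    return used, len(group) - used

print(group_sizes(gains))
print(usage_within_group(gains, "UCS", "UCH"))
```

Trees in the second count, where the other variable is absent, are the ones worth inspecting for a substitute variable, since a consistent replacement is a hint of masking.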
Figure . . Fluctuation diagram of trees and variables displaying cumulated deviance gain of splits
featuring that combination of tree and split variable