Each final partition has been assigned a prediction value or model. For classification trees the value is the predicted class; for regression trees it is the predicted constant. More complex tree models also exist, such as those featuring linear models in terminal nodes. In what follows we will mostly use classification trees with binary splits for illustration purposes, but all methods can be generalized to more complex tree models unless specifically stated otherwise. We call any tree with splitting rules in its inner nodes a decision tree, regardless of the type of prediction in its leaves.
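A minimal sketch of such a tree in Python, assuming binary splits of the form `variable < threshold`. The names `Leaf`, `Split` and `predict` are hypothetical and not taken from any particular library. Inner nodes hold the splitting rule and leaves hold the prediction. The same structure serves as a classification tree (class labels in the leaves) and as a regression tree (constants in the leaves).

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    """Terminal node: stores the prediction (class label or constant)."""
    prediction: object


@dataclass
class Split:
    """Inner node with a (hypothetical) binary rule `variable < threshold`."""
    variable: str
    threshold: float
    left: "Node"   # receives cases for which the rule is true
    right: "Node"  # receives all remaining cases


Node = Union[Leaf, Split]


def predict(node: Node, case: dict) -> object:
    """Apply the splitting rules from the root down to a leaf."""
    while isinstance(node, Split):
        node = node.left if case[node.variable] < node.threshold else node.right
    return node.prediction


# Toy trees with made-up variables, thresholds and predictions.
clf = Split("x", 2.5, left=Leaf("A"),
            right=Split("y", 1.0, left=Leaf("B"), right=Leaf("C")))
reg = Split("x", 2.5, left=Leaf(10.0), right=Leaf(42.0))

print(predict(clf, {"x": 3.0, "y": 0.5}))  # B
print(predict(reg, {"x": 1.0}))            # 10.0
```

Only the leaf contents differ between the classification tree `clf` and the regression tree `reg`. This difference is why the term decision tree can cover both.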
10.2.1 Hierarchical Views
Probably the most natural way to visualize a tree model is to display its hierarchical structure. Let us describe more precisely what it is we want to visualize. To describe the topology of a tree, we borrow some terminology from graph theory. A graph is a set of nodes (sometimes called vertices) and edges. Here a tree is defined as a connected, acyclic graph. Topologically, decision trees are a special subset of those. Namely, they are connected directed acyclic graphs (DAGs) with exactly one node of indegree 0 (the root, which has no parent) and outdegrees other than 1 (i.e., every node has at least two children or none at all).
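These topological conditions can be checked mechanically. The sketch below assumes a tree is given as a hypothetical `children` mapping from each node to the list of its children, with leaves mapping to an empty list. Besides the conditions stated above, it also requires every non-root node to have exactly one parent. This follows from the underlying graph being a tree.

```python
from collections import deque


def is_decision_tree_topology(children: dict) -> bool:
    """Return True if `children` (node -> list of child nodes) describes a
    decision-tree topology: one root of indegree 0, every other node with
    exactly one parent, no node with outdegree 1, all nodes reachable."""
    nodes = set(children)
    indegree = {n: 0 for n in nodes}
    for kids in children.values():
        for k in kids:
            if k not in indegree:
                return False  # edge points to an unknown node
            indegree[k] += 1

    roots = [n for n, d in indegree.items() if d == 0]
    if len(roots) != 1 or any(d > 1 for d in indegree.values()):
        return False
    if any(len(kids) == 1 for kids in children.values()):
        return False  # outdegree 1 is not allowed

    # Breadth-first search from the root. Since every non-root node has one
    # parent, reaching all nodes also rules out directed cycles.
    seen = {roots[0]}
    queue = deque(roots)
    while queue:
        for k in children[queue.popleft()]:
            if k not in seen:
                seen.add(k)
                queue.append(k)
    return seen == nodes


print(is_decision_tree_topology(
    {"root": ["a", "b"], "a": [], "b": ["c", "d"], "c": [], "d": []}))  # True
print(is_decision_tree_topology({"root": ["a"], "a": []}))             # False
```

The second call fails because the root has exactly one child. A "diamond", in which two inner nodes share a child, is rejected by the indegree check.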
To fully describe a decision tree, additional information is associated with each node. For inner nodes this information represents the splitting rule; for terminal nodes it consists of the prediction. Plots of tree models attempt to make such information visible in addition to displaying the graph aspect of the model. Three different ways to visualize the same classification tree model are shown in Fig. . .
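The following hypothetical sketch illustrates how the node information can be attached to the hierarchy. It produces a plain-text view that prints the splitting rule on each branch and the prediction at each leaf. It uses nested dictionaries with assumed keys `var`, `threshold`, `left`, `right` and `prediction`. It is not one of the three plots discussed here.

```python
def render(node: dict, depth: int = 0) -> list:
    """Indented text view of a binary tree: each branch is labelled with its
    splitting rule and each leaf shows its prediction."""
    pad = "  " * depth
    if "prediction" in node:
        return [f"{pad}-> {node['prediction']}"]
    lines = [f"{pad}{node['var']} < {node['threshold']}"]
    lines += render(node["left"], depth + 1)
    lines.append(f"{pad}{node['var']} >= {node['threshold']}")
    lines += render(node["right"], depth + 1)
    return lines


# Toy tree with made-up variables, thresholds and class labels.
tree = {"var": "x", "threshold": 2.5,
        "left": {"prediction": "A"},
        "right": {"var": "y", "threshold": 1.0,
                  "left": {"prediction": "B"},
                  "right": {"prediction": "C"}}}
print("\n".join(render(tree)))
```

In this view, indentation encodes the hierarchy, and the text on each line carries the rule or the prediction. The graphical views make these two kinds of information visible in the same way.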
The tree model is based on the Italian olive oil dataset (Forina et al.), which records the composition of olive oils from different regions of Italy. Each covariate corresponds to the proportion (in 1/10,000ths) of a fatty acid (in order of
concentration): oleic, palmitic, linoleic, stearic, palmitoleic, arachidic, linolenic, and eicosenoic acid. The response variable is categorical and specifies the region of origin. The goal is to determine how the composition of olive oils varies across regions of Italy. For illustration purposes we perform a classification using five regions: Sicily, Calabria, Sardinia, Apulia, and North (the latter consolidating the regions north of Apulia).
Although the underlying model is the same for all plots in Fig. . , the visual representation is different in each plot. Visualization of a tree model based on its hierarchical structure has to address the following tasks:
Placement of nodes
Visual representation of nodes
Visual representation of edges
Annotation
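For the first task, here is one simple placement strategy, offered as an assumption for illustration and not necessarily the method used for the figures. The leaves go in consecutive horizontal slots from left to right, every inner node is centred above its outermost children, and depth serves as the vertical coordinate. The sketch identifies nodes by hypothetical path tuples (`()` is the root, `(1, 0)` is the first child of the second child). Dedicated tree-drawing algorithms such as Reingold-Tilford produce more compact layouts.

```python
def layout(root: dict) -> dict:
    """Map each node (identified by its path of child indices) to (x, y):
    leaves get consecutive x slots, inner nodes are centred over their
    first and last child, and y is the depth (0 at the root)."""
    pos = {}
    next_x = 0

    def place(node: dict, path: tuple, depth: int) -> float:
        nonlocal next_x
        kids = node.get("children", [])
        if not kids:
            pos[path] = (float(next_x), depth)
            next_x += 1
        else:
            xs = [place(k, path + (i,), depth + 1) for i, k in enumerate(kids)]
            pos[path] = ((xs[0] + xs[-1]) / 2, depth)
        return pos[path][0]

    place(root, (), 0)
    return pos


# A root with one leaf and one inner node that has two leaves.
tree = {"children": [{}, {"children": [{}, {}]}]}
for path, (x, y) in sorted(layout(tree).items()):
    print(path, x, y)
```

Once positions are fixed, the other tasks (node shapes, edges, annotation) only decide what is drawn at and between these coordinates.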
Each task can be used to represent additional information associated with the model or data. Visual representation of a node is probably the most obvious way to add such information. In the first (top left) plot, a node consists solely of a tick mark with an annotation describing the split rule for the left child. In the second (top right) plot, a node is represented by a rectangle whose size corresponds to the number of cases