rection is proportional to the impurity decrease and thus in a sense to the “quality”
of the split. The third plot uses a special placement: it is rotated counterclockwise
relative to the usual representation, and all terminal nodes are aligned to
facilitate easy comparison of the class proportions.
The visual representation of edges is usually restricted to drawing direct or
orthogonal lines. More elaborate representations of edges are also possible, such as
polygons whose width is proportional to the number of cases following that particular
path, which creates a visual representation of the “flow” of data through the
tree.
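The width-proportional edges described above can be sketched in a few lines. This is a minimal illustration, not a method from the text: the tree layout, the node names, and the `edge_widths` helper with its `max_width` parameter are all hypothetical.

```python
# Hypothetical tree: each node id maps to (parent id, number of cases in the node).
tree = {
    "root": (None, 100),
    "left": ("root", 60),
    "right": ("root", 40),
    "left.a": ("left", 45),
    "left.b": ("left", 15),
}


def edge_widths(tree, max_width=20.0):
    """Return {(parent, child): width}, with width proportional to the
    number of cases that follow the edge into the child node."""
    # The root contains every case, so it has the largest count.
    total = max(n for _, n in tree.values())
    return {
        (parent, child): max_width * n / total
        for child, (parent, n) in tree.items()
        if parent is not None  # the root has no incoming edge
    }


widths = edge_widths(tree)
```

Because every case entering a node leaves it through exactly one child edge, the widths of the outgoing edges add up to the width of the incoming edge, which is what gives the drawing its "flow" appearance.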
Annotations are textual or symbolic representations displayed along the nodes or
edges. In Fig. . annotations describe predictions and splitting rules. Although
annotations can be useful, they should be used with caution because they can easily
clutter the plot and thus distract from the key points to be conveyed.
Overloading plots with information can offset the benefits of the plot, in particular
its ability to provide information at a glance. When the representation of a node is too
large, for example because it includes a list of statistics or additional plots, it
consumes so much space that only very few levels of the tree can be displayed on a screen.
The same applies to a printed version, because the size of a sheet of paper is
limited. Additional tools are therefore necessary to keep track of the overall structure
and not get lost. Most of these tools, such as zoom, pan, overview window, or
toggling of different labels, are available in an interactive context only. For
an analysis in particular, a visualization of additional information is required. There
are basically two possibilities for providing such information:
Integrate the information in the tree visualization.
Use external linked graphics.
Direct integration is limited by the spatial constraints posed by the fixed dimensions
of a computer screen or other output medium. Its advantage is the immediate impact
on the viewer and therefore easier usage. This kind of visualization is recommended
for properties that are directly tied to the tree. It makes less sense to display
a histogram of the underlying dataset directly in a node, because it shows derived
information that can be displayed more comfortably outside the tree, virtually linked
to a specific node. It is more sensible to add information directly related to the tree
structure, such as the criterion used for the growth of the tree.
External linked graphics are more flexible because they are not displayed directly
in the tree structure for each node but are only logically linked to a specific node.
Spatial constraints are less of a problem because one graphic is displayed instead of
one for each node. The disadvantage of linked graphics is that they must be interpreted
more carefully. The viewer has to bear in mind the logical link used to construct the
graphic, as it is not visually attached to its source (a node in our case).
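The logical link between a node and an external graphic can be sketched as a lookup from the selected node to the cases it contains. This is a minimal sketch under assumed data: the `cases` records, the `children` map, and both helper functions are hypothetical and stand in for whatever an interactive tool would maintain.

```python
from collections import Counter

# Hypothetical data: each case records its terminal node and its class.
cases = [
    {"node": "left.a", "cls": "yes"},
    {"node": "left.a", "cls": "yes"},
    {"node": "left.b", "cls": "no"},
    {"node": "right", "cls": "no"},
    {"node": "right", "cls": "yes"},
]

# Hypothetical tree structure: inner node -> child nodes.
children = {"root": ["left", "right"], "left": ["left.a", "left.b"]}


def leaves_under(node):
    """Collect the terminal nodes below (or equal to) the given node."""
    kids = children.get(node)
    if not kids:
        return {node}
    result = set()
    for kid in kids:
        result |= leaves_under(kid)
    return result


def linked_class_counts(selected_node):
    """Class counts for the cases in the selected node. A single external
    bar chart would be redrawn from this whenever the selection changes."""
    leaves = leaves_under(selected_node)
    return Counter(c["cls"] for c in cases if c["node"] in leaves)


counts_left = linked_class_counts("left")
counts_root = linked_class_counts("root")
```

Only one graphic exists at any time, and the selection determines its content. This is why the viewer has to remember which node the graphic currently refers to.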
There is no fixed rule as to what kind of information should be displayed inside or
outside the tree structure. As a rule of thumb, more complex graphics should
use the external linked approach, whereas less complex information directly
connected with the tree structure should be displayed in the tree visualization.