Grand Tours, Projection Pursuit Guided Tours, and Manual Controls - Data Visualization - page 313

Figure . . Left plot: scatterplot matrix of three of the important variables for separating the three classes. A single classification tree usually splits the three classes based on two variables, linoleic and eicosenoic. Right: a projection of linoleic and arachidic, along with eicosenoic, produces a better gap between the classes

For the dataset shown in Fig. . there are eight variables and three known classes. A classification tree chooses just two of the variables, eicosenoic and linoleic, to separate the three classes. For the training sample eicosenoic separates one class (plotted as circles) from the other two, and linoleic separates the remaining two classes (plusses and triangles). The separation of these last two groups, although difficult to see in the plot of eicosenoic against linoleic, is real (scatterplot matrix at left). There is no gap between the groups of points, but it is possible to draw a line with points from one class on one side of it and the points from the other class on the other side. By using a tour we would have noticed that there is a big gap between the three classes using all eight variables, and also that choosing just three provides a very neat separation. It would be difficult to guess from pairwise plots that arachidic has an important role, but from the tour we can see that when arachidic is combined with linoleic the two classes are much better separated (right plot). The tour projection shows the combination of linoleic and arachidic plotted horizontally that reveals the gap. The tree solution was simple but inadequate, and a small change to the solution provides a much better result.
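The contrast between a split on one variable and a split on a combination of two can be illustrated with a minimal sketch. It does not use the dataset from the figure: the two classes below are synthetic stand-ins, constructed so that a cut on `x` alone separates them with almost no margin while a linear combination of `x` and `y` opens a wide gap. The helper names `project` and `gap`, and the brute-force scan over projection angles, are assumptions made for illustration; the scan is only a crude substitute for watching a tour, not the tour algorithm itself.

```python
import math
import random


def project(points, angle_deg):
    """Project 2-D points onto the unit vector at the given angle (degrees)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [c * x + s * y for x, y in points]


def gap(proj_a, proj_b):
    """Width of the empty interval between two 1-D samples.

    Positive when the samples are separated, negative when they overlap.
    """
    return max(min(proj_b) - max(proj_a), min(proj_a) - max(proj_b))


random.seed(1)


def jitter():
    """Small bounded noise so the synthetic points are not exactly collinear."""
    return random.uniform(-0.01, 0.01)


# Synthetic stand-in data (NOT the data in the figure): two thin, parallel,
# slanted bands. Class B is class A shifted by 1.05 along x.
class_a, class_b = [], []
for i in range(50):
    t = i / 49
    class_a.append((t + jitter(), 3 * t + jitter()))
    class_b.append((t + 1.05 + jitter(), 3 * t + jitter()))

# Single-variable views, which are all an axis-parallel tree split can use.
gap_x = gap(project(class_a, 0), project(class_b, 0))
gap_y = gap(project(class_a, 90), project(class_b, 90))

# Scan 1-D projections in one-degree steps and keep the one with the widest gap.
best_angle, best_gap = max(
    ((angle, gap(project(class_a, angle), project(class_b, angle)))
     for angle in range(180)),
    key=lambda pair: pair[1],
)

print(f"gap using x alone: {gap_x:.3f}")
print(f"gap using y alone: {gap_y:.3f}")
print(f"widest gap: {best_gap:.3f} at {best_angle} degrees")
```

In this toy setting the cut on `x` does separate the classes, which is all a one-variable-at-a-time search needs to see before it stops, yet the margin is tiny compared with the gap found along the combined direction.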

The tree algorithm was hampered by both variable-wise operation and greediness. It did not see the combination of linoleic and arachidic because it could only use one variable at each step. It also stopped immediately when a separation between the classes was found, having no sense of a bigger gap elsewhere. All numerical methods have assumptions or algorithm constraints or complexity that set limits on the results. A classical method such as linear discriminant analysis assumes that the classes in the data arise from a mixture of normal distributions having equal variance-covariance. Linear discriminant analysis finds a best separating projection similar to the tree solution; one group is well-separated and the other two groups slightly over-
