Graphics Reference
In-Depth Information
Figure . . Let plot: scatterplot matrix of three of the important variables for separating the three
classes. A single classification tree usually produces the result to split the three classes based on two
variables, linoleic and eicosenoic. Right: a projection of linoleic and arachidic, along with eicosenoic,
produces a better gap between the classes
Forthedataset showninFig. . thereareeightvariables andthreeknownclasses.
A classification tree chooses just two of the variables, eicosenoic and linoleic, to sep-
arate the three classes. For the training sample eicosenoic separates one class (plot-
ted as circles) from the other two, and linoleic separates the remaining two classes
(plusses and triangles). he separation of these last two groups, although di cult to
see in the plot of eicosenoic against linoleic, is real (scatterplot matrix at let). here
is no gap between the groups of points, but it is possible to draw a line with points
from one class on one side of it and the points from the other class on the other
side. By using a tour we would have noticed that there is a big gap between the three
classes using all eight variables, and also that choosing just three provides a very neat
separation. It would be di cult to guess from pairwise plots that arachidic has an
important role, but from the tour we can see that when arachidic is combined with
linoleic the two classes are much better separated (right plot). he tour projection
shows the combination of linoleic and arachidic plotted horizontally that reveals the
gap. he tree solution was simple but inadequate, and a small change to the solution
provides a much better result.
hetree algorithm was hampered byboth variable wise operation and greediness.
It did not see the combination of linoleic and arachidic because it could only use
one variable at each step.It also stoppedimmediately whena separation between the
classes was found,having no sense of a bigger gap elsewhere.All numerical methods
have assumptions or algorithm constraints or complexity that sets limits on the re-
sults. Aclassical methodsuchaslinear discriminant analysis assumesthat the classes
in the data arise from a mixture of normal distributions having equal variance-co-
variance. Linear discriminant analysis finds a best separating projection similar to
the tree solution; one group is well-separated and the other two groups slightly over-
Search WWH ::




Custom Search