terminal elements of the tree are the k groups of the preliminary partition. The next
step is to cut the tree at the most appropriate level to obtain an interpretable partition.
This level may be chosen visually or automatically determined (some “Cattell criteria”)
(Cattell, ). The final partition will be a refinement of this crude partition.
A schema of the mixed strategy follows (Fig. . ).
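As an illustration of an automatic choice of the cutting level, the sketch below picks the cut at the largest jump between successive aggregation indices. This is only one simple scree-style heuristic, not necessarily the criterion intended by the text; the function name `choose_cut` and the input `heights` (the aggregation index of each merge, in merge order) are assumptions made for the example.

```python
def choose_cut(heights):
    """Suggest a number of groups from the largest jump in aggregation index.

    heights: aggregation index of each merge, in merge order, for a tree
    built over len(heights) + 1 terminal elements (hypothetical input format).
    """
    if len(heights) < 2:
        raise ValueError("need at least two merges")
    n_leaves = len(heights) + 1
    # jumps[i] = increase of the index between merge i and merge i + 1
    jumps = [heights[i + 1] - heights[i] for i in range(len(heights) - 1)]
    i_best = max(range(len(jumps)), key=jumps.__getitem__)
    # Cutting just after merge i_best leaves n_leaves - (i_best + 1) groups.
    return n_leaves - (i_best + 1)
```

In practice the suggested level would still be checked visually on the tree, since a single largest jump can be misleading when several jumps are of similar size.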
Step : preliminary partition
The first step is to obtain rapidly a large number of small groups that are very
homogeneous. We use the partition defined by the stable groups obtained from
cross-tabulating two or three base partitions. Each base partition is calculated
using the algorithm of moving centers (k-means) after reading the data directly
so as to minimize the use of central memory. The calculations are generally
performed on the coordinates of the individuals of the first few principal axes of
a principal coordinate analysis. Note that the distance computations are
accelerated on these orthogonal coordinates, as noise in the data (distributed within
the last coordinates) is eliminated and as principal coordinates may be efficiently
computed using any stochastic approximation algorithm.
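The construction of stable groups can be sketched as follows. This is a minimal in-memory illustration, assuming the points are already expressed in principal coordinates; it does not reproduce the direct reading of the data described above. The helper names `kmeans_labels` and `stable_groups`, the random initialization by seed, and the fixed iteration count are all assumptions of the example.

```python
import random
from collections import defaultdict

def kmeans_labels(points, k, seed, n_iter=20):
    """Plain moving-centers (k-means) pass; returns one label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centers[j])))
            for p in points
        ]
        # Update step: move each center to the mean of its points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def stable_groups(partitions):
    """Cross-tabulate base partitions: individuals that share the same label
    in every base partition form one stable group."""
    cells = defaultdict(list)
    for i, key in enumerate(zip(*partitions)):
        cells[key].append(i)
    return list(cells.values())
```

A possible use, with three base partitions started from different seeds:

```python
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (9.0, 0.0)]
base = [kmeans_labels(points, k=2, seed=s) for s in (0, 1, 2)]
groups = stable_groups(base)  # lists of point indices, one list per stable group
```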
Step : hierarchical aggregation of the stable groups
Some of the stable groups can be very close to one another, corresponding to
a group that is artificially cut by the preceding step. On the other hand, the
procedure generally creates several small groups, sometimes containing only one
element. The goal of the hierarchical aggregation phase is to reconstitute the groups
that have been fragmented and to aggregate the apparently dispersed elements
around their original centers.
The tree is built according to Ward's aggregation criterion, which has the
advantage of accounting for the size of the elements to classify as weight in the
calculations of the loss of variance through aggregation. It is a technique of minimum
variance clustering that seeks to optimize, at every step, the partition obtained
Figure . . Mixed strategy for classifying huge datasets
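To make the role of the weights in Ward's criterion concrete, the sketch below uses the standard expression for the loss of between-group variance when two groups with weights w_a, w_b and centers g_a, g_b are merged: w_a w_b / (w_a + w_b) times the squared distance between g_a and g_b. It then aggregates weighted groups greedily. The function names `ward_loss` and `ward_tree`, and the representation of each stable group by a weight and a center, are assumptions of this example; the exhaustive pair search is chosen for readability, not efficiency.

```python
def ward_loss(wa, ca, wb, cb):
    """Loss of between-group variance caused by merging two weighted groups."""
    d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return wa * wb / (wa + wb) * d2

def ward_tree(weights, centers):
    """Greedy agglomeration of weighted groups with Ward's criterion.

    Returns a list of merges (id_a, id_b, loss). Initial groups have ids
    0..n-1; merged nodes receive ids n, n + 1, ... in creation order.
    """
    active = {i: (w, tuple(c)) for i, (w, c) in enumerate(zip(weights, centers))}
    next_id = len(active)
    merges = []
    while len(active) > 1:
        ids = sorted(active)
        # Pick the pair whose merge loses the least variance.
        a, b = min(
            ((i, j) for n, i in enumerate(ids) for j in ids[n + 1:]),
            key=lambda ij: ward_loss(*active[ij[0]], *active[ij[1]]),
        )
        (wa, ca), (wb, cb) = active.pop(a), active.pop(b)
        loss = ward_loss(wa, ca, wb, cb)
        # The merged group keeps the total weight and the weighted mean center.
        w = wa + wb
        c = tuple((wa * x + wb * y) / w for x, y in zip(ca, cb))
        active[next_id] = (w, c)
        merges.append((a, b, loss))
        next_id += 1
    return merges
```

Because the loss is scaled by w_a w_b / (w_a + w_b), merging a singleton into a nearby large group costs little, which matches the stated goal of regrouping dispersed elements around their original centers.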