by aggregating two elements, using criteria linked to variance. The computations
are not time consuming when the clustering is performed after a factorial analysis
(PCA or MCA) and the objects to be classified are located by their coordinates
on the first axes of the analysis.
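This two-stage workflow can be sketched in Python, using scikit-learn's PCA and SciPy's Ward linkage as stand-ins for the factorial analysis and the variance-based aggregation criterion; the synthetic data and the choice of four components are illustrative only.

```python
import numpy as np
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage

# Hypothetical data: 200 observations described by 10 variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

# Stage 1: factorial analysis (here PCA); keep only the first
# principal coordinates of each object.
coords = PCA(n_components=4).fit_transform(X)

# Stage 2: hierarchical clustering with Ward's variance criterion,
# computed on the reduced coordinates rather than the raw variables.
Z = linkage(coords, method="ward")
print(Z.shape)  # (n - 1, 4): one row per aggregation step
```

Working on a handful of principal coordinates rather than the full data matrix is what keeps the distance computations cheap.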
Step : final partition
The partition of the population is defined by cutting the dendrogram. Choosing
the level of the cut, and thus the number of classes in the partition, can be done
by looking at the tree: the cut has to be made above the low aggregations, which
bring together the elements that are very close to one another, and under the high
aggregations, which lump together all the various groups in the population.
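Cutting the tree at a chosen number of classes can be done with SciPy's `fcluster`; this is a minimal sketch on artificial, well-separated data, so the appropriate cut level (two classes) is obvious rather than determined by visual inspection as the text recommends.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# Two well-separated artificial groups, so the gap between low and
# high aggregations in the dendrogram is easy to see.
X = np.vstack([rng.normal(0, 0.5, (50, 2)),
               rng.normal(5, 0.5, (50, 2))])

Z = linkage(X, method="ward")

# Cut the dendrogram so that at most t classes remain: the cut falls
# above the low aggregations and under the high ones.
labels = fcluster(Z, t=2, criterion="maxclust")
print(np.unique(labels))  # [1 2]
```

In practice one would inspect the dendrogram (or the sequence of aggregation levels in `Z`) before fixing `t`.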
Some Considerations on the MIXED strategy
Classifying a large dataset is a complex task, and it is difficult to find an algorithm
that alone will lead to an optimal result. The proposed strategy, which is not entirely
automatic and which requires several control parameters, allows us to retain control
over the classification process. The procedure below illustrates an exploratory strategy
allowing the definition of satisfactory partition(s) of data. It is weakly affected by
the number of units and can offer good results in a fairly reasonable time. In MDA
applications on real datasets, especially in cases of huge databases, much experience is
required to effectively tune the procedure parameters (Confais and Nakache, ).
A good compromise between accuracy of results and computational time can be
achieved by using the following parameters:
1. The number of basic partitionings, which through cross-tabulation define the
stable groups (usually two or three basic partitionings);
2. The number of groups in each basic partitioning (approximately equal to the
unknown number of “real” groups, usually between and );
3. The number of iterations to accomplish each basic partitioning (less than five is
usually sufficient);
4. The number of principal coordinates used to compute any distance and aggregation
criterion (depending on the decrease of the eigenvalues of principal axis
analysis: usually between and for a large number of variables);
5. Finally, the cut level of the hierarchical tree in order to determine the number of
final groups (in general, by visual inspection).
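The full procedure can be outlined in a few lines of Python. This is a simplified sketch, not the authors' implementation: scikit-learn's KMeans plays the role of the basic partitioning algorithm, the stable groups are the non-empty cells of the cross-tabulation of two partitionings, and the hierarchical tree is built on the (unweighted) stable-group centroids; a faithful version would weight each centroid by its group size, and all parameter values here are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 4))  # stand-in for the first principal coordinates

# Parameters 1-3: two basic partitionings, 8 groups each,
# a small number of iterations per partitioning.
p1 = KMeans(n_clusters=8, n_init=1, max_iter=5, random_state=0).fit_predict(X)
p2 = KMeans(n_clusters=8, n_init=1, max_iter=5, random_state=1).fit_predict(X)

# Cross-tabulation: each stable group is a non-empty cell, i.e. a set of
# units classified together by both basic partitionings.
cells = {}
for i, key in enumerate(zip(p1, p2)):
    cells.setdefault(key, []).append(i)

# Hierarchical tree (Ward criterion) built on the stable-group centroids.
centroids = np.array([X[idx].mean(axis=0) for idx in cells.values()])
Z = linkage(centroids, method="ward")

# Parameter 5: cut the tree to obtain the final number of groups.
final = fcluster(Z, t=5, criterion="maxclust")
print(len(cells), "stable groups reduced to", final.max(), "final groups")
```

Because the tree is built on at most 8 × 8 = 64 centroids instead of 500 units, the hierarchical step stays cheap regardless of the population size.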
Nearest-neighbor-accelerated algorithms for hierarchical classification permit one to
directly build a tree on the entire population. However, these algorithms cannot read
the data matrix sequentially. The data, which usually are the first principal coordinates
of a preliminary analysis, must be stored in central memory. This is not a problem
when the tree is built on the stable groups of a preliminary k-means partition
(also computed on the first principal axes). Besides working with direct reading, the
partitioning algorithm has another advantage. The criterion of homogeneity of the
groups is better satisfied in finding an optimal partition rather than in the more
constrained case of finding an optimal family of nested partitions (hierarchical tree). In
addition, building stable groups constitutes a sort of self-validation of the classification
procedure.