minimizing the quantity in ( . ) is equivalent to maximizing the quantity trace(B)/trace(T). Notice that trace(·) indicates the trace operator and that trace(T) = trace(W) + trace(B) (Mardia et al.).
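As a quick numerical check of this decomposition, the following Python sketch (with made-up data and an arbitrary partition; all variable names are purely illustrative) computes the total, within-group, and between-group scatter matrices and verifies that trace(T) = trace(W) + trace(B), so that minimizing trace(W) and maximizing trace(B)/trace(T) are indeed the same problem.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))             # 30 units described by 4 variables (illustrative)
labels = rng.integers(0, 3, size=30)     # an arbitrary partition into 3 groups

grand_mean = X.mean(axis=0)
T = (X - grand_mean).T @ (X - grand_mean)    # total scatter matrix

p = X.shape[1]
W = np.zeros((p, p))                         # within-group scatter
B = np.zeros((p, p))                         # between-group scatter
for g in np.unique(labels):
    Xg = X[labels == g]
    mg = Xg.mean(axis=0)
    W += (Xg - mg).T @ (Xg - mg)
    d = (mg - grand_mean).reshape(-1, 1)
    B += len(Xg) * (d @ d.T)

# trace(T) = trace(W) + trace(B): minimizing trace(W) is therefore
# equivalent to maximizing trace(B) / trace(T).
print(np.trace(T), np.trace(W) + np.trace(B))
```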
The algorithm family known as k-means is the most widely known nonhierarchical clustering approach; among the different k-means algorithms, that proposed by Forgy is the most widely implemented in specialized software packages (Hartigan).
Let us assume the partitioning of a set of n units characterized by p variables; the main steps of a k-means algorithm are the following:
Step 1: k provisional group centers are randomly determined: c_1, c_2, ..., c_k.
Step 2: a partition P^0 = (C_1, C_2, ..., C_k) of the n objects into k clusters is obtained using the assignment rule: a unit belongs to C_k if it is nearer to c_k than to all other centers; an object x_i is then assigned to the cluster C_k if d(x_i, c_k) = min_j d(x_i, c_j).
Step 3: k new cluster centers are determined, c_1, c_2, ..., c_k, as the centers of gravity of the clusters of the partition P^0. The new centers are then used to define a new partition P^1, constructed according to the same rules used for P^0.
The previous steps are iterated until convergence to a final partition, i.e., when two succeeding iterations lead to the same partition or when a chosen criterion describing the quality of the obtained partition is satisfied. A further stopping criterion can be based on the number of iterations.
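A minimal sketch of these steps in Python follows, assuming plain NumPy; the function name forgy_kmeans, its arguments, and the returned history of the within-group criterion are illustrative choices, not a routine from any particular package. The k provisional centers are drawn at random among the units, each unit is assigned to the nearest center, the centers are recomputed as centers of gravity, and iteration stops when two successive partitions coincide or a maximum number of iterations is reached.

```python
import numpy as np

def forgy_kmeans(X, k, max_iter=100, seed=None):
    """Illustrative k-means with Forgy-style initialization (not a library routine)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Step 1: k provisional group centers drawn at random among the n units.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels, history = None, []
    for _ in range(max_iter):
        # Step 2: assign each unit to the cluster whose center is nearest.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Within-group criterion: sum of squared distances to the nearest centers.
        history.append(float((dists[np.arange(len(X)), new_labels] ** 2).sum()))
        # Stopping rule: two succeeding iterations lead to the same partition.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: recompute each center as the center of gravity of its cluster.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers, history
```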
Stable Groups Identification
Although it is based on a relatively slight theoretical basis, the k-means classification method has an efficacy largely attested to by empirical results (Milligan). As a consequence, it is the partitioning method best adapted to large datasets. The k-means method is used as an adjunct to other methods, such as clustering prior to principal axis analysis, or directly as a descriptive tool. The method is particularly well adapted to large numerical datasets because the data are read directly: the data matrix is saved on auxiliary memory and is read several times in sequential fashion, without requiring large amounts of computer memory.
As the within-group variance can only decrease between two successive iterations, the algorithm does converge; however, it is not the convergence itself but its very high rate of convergence that justifies using this method in practice.
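With the illustrative forgy_kmeans sketch above, which records the within-group criterion (the sum of squared distances of the units to their nearest centers) after each assignment step, this monotone decrease can be checked directly on any numeric data matrix X, for instance the one generated in the first sketch:

```python
labels, centers, history = forgy_kmeans(X, k=3, seed=0)
# Successive values of the criterion never increase, which is what
# guarantees convergence of the iterations.
assert all(a >= b for a, b in zip(history, history[1:]))
print(history)
```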
Generally, the obtained partitions depend on the centers chosen at the first iteration, and the algorithm could converge toward local optima. The procedure of finding stable groups (Lebart et al.) is a kind of remedy for this situation. Its main advantage is that it elaborates on the results obtained in the rigid framework of a single partition by highlighting high-density regions of the object points. The technique consists in performing several partitions starting from different sets of centers and keeping as stable groups the sets of objects that are always assigned to the same cluster.
Let us consider, as a small illustrative example, partitioning individuals into several homogeneous groups. We perform a first basic partitioning into groups around moving centers (only a few iterations are necessary to ensure stability of the groups). This procedure is repeated three times.
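A sketch of this crossing of partitions, reusing the illustrative forgy_kmeans function above (the name stable_groups and its parameters are assumptions for the example, not part of any published implementation): several partitions are computed from different random sets of initial centers, and units are kept together only if they fall in the same cluster in every run.

```python
def stable_groups(X, k, n_runs=3, seed=0):
    """Cross several k-means partitions started from different centers;
    units that are always clustered together form a stable group."""
    # Assumes forgy_kmeans from the sketch above is available.
    runs = [forgy_kmeans(X, k, seed=seed + r)[0] for r in range(n_runs)]
    # Each unit is identified by the tuple of cluster labels it received
    # across the successive partitions (the crossed, or "product", partition).
    groups = {}
    for i, key in enumerate(zip(*runs)):
        groups.setdefault(key, []).append(i)
    return list(groups.values())    # lists of unit indices, one per stable group
```

In practice only the larger cells of this crossed partition are read as stable, high-density groups; the remaining small cells gather the units whose assignment depends on the choice of the initial centers.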