With respect to the distances to compute, the points to classify are reduced by one unit at each step. There are n singletons to classify at the first step and n − i + 1 groups and singletons at the ith step. Only one group of size n remains at the end of the algorithm.
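A minimal Python sketch of this bookkeeping (toy code, not a full clustering algorithm): each merge replaces two clusters with one, so there are n singletons at the first step, n − i + 1 clusters at step i, and a single group of size n at the end.

```python
n = 5
clusters = [[i] for i in range(n)]  # n singletons at step 1
step = 1
while len(clusters) > 1:
    # n - i + 1 groups and singletons remain at step i
    assert len(clusters) == n - step + 1
    # merge the first two clusters; a real algorithm would instead
    # pick the minimum-distance pair under the chosen linkage criterion
    a = clusters.pop(0)
    b = clusters.pop(0)
    clusters.append(a + b)
    step += 1

# one group of size n remains at the end
assert len(clusters) == 1 and len(clusters[0]) == n
```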
The final output is mainly affected by two algorithm parameters: the distance measure and the linkage (or amalgamation) criterion used to determine the distance between a point and a group or between two groups. There exist several distance measures according to the nature of the analyzed data.
Various linkage or amalgamation criteria permit one to determine the two most
similar clusters (groups) to be merged into one new group:
Single linkage (nearest neighbor): the distance between two clusters is determined by the distance between the two closest objects in the different clusters.
Complete linkage (furthest neighbor): the distance between two clusters is determined by the distance between the two furthest objects in the different clusters.
Average linkage: the distance between two clusters is defined as the average distance between all pairs of objects in the two different clusters.
Centroid linkage: the distance between two clusters is determined as the distance between their centroids. The centroid of a cluster is the average point in the multidimensional space defined by the dimensions.
Ward's method: this method is based on the minimum variance criterion, evaluating the overall increase in heterogeneity when collapsing two clusters. In short, the method merges the pair of clusters that yields the minimum resulting sum of squares.
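The four distance-based criteria above can be sketched in a few lines of plain Python (the function name `linkage_distance` is invented for illustration; Ward's variance-based rule is omitted for brevity):

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)

def linkage_distance(A, B, criterion="single"):
    """Distance between clusters A and B, each a list of coordinate tuples."""
    pairs = [dist(a, b) for a in A for b in B]  # all cross-cluster distances
    if criterion == "single":      # nearest neighbor: closest pair
        return min(pairs)
    if criterion == "complete":    # furthest neighbor: furthest pair
        return max(pairs)
    if criterion == "average":     # mean over all cross-cluster pairs
        return sum(pairs) / len(pairs)
    if criterion == "centroid":    # distance between the cluster centroids
        ca = tuple(sum(x) / len(A) for x in zip(*A))
        cb = tuple(sum(x) / len(B) for x in zip(*B))
        return dist(ca, cb)
    raise ValueError(f"unknown criterion: {criterion}")

A = [(0.0, 0.0), (1.0, 0.0)]
B = [(3.0, 0.0), (5.0, 0.0)]
print(linkage_distance(A, B, "single"))    # 2.0 (closest pair)
print(linkage_distance(A, B, "complete"))  # 5.0 (furthest pair)
print(linkage_distance(A, B, "average"))   # 3.5
print(linkage_distance(A, B, "centroid"))  # 3.5
```

Note how the same pair of clusters gets a different distance under each criterion, which is why the choice of linkage can change the resulting dendrogram.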
The end result of a hierarchical clustering algorithm is a sequence of nested and indexed partitions. The sequence can also be visualized through a tree, also called a dendrogram, which shows how the clusters are related to each other. The index refers to the aggregation criterion and indicates the distance between two subsequent groups (or objects). A dendrogram cut at a given level defines a partition of the data cases into k different groups, where k increases by one at a time as the aggregation index decreases. Figure . shows an example of a simple dendrogram and the resulting clusters at the shown cutting level.
Choosing the level of the cut, and thus the number of resulting classes in the partition, can then be done by looking at the dendrogram: the cut has to be made above the low aggregations, which bring together elements that are very close to one another, and under the high aggregations, which lump together all of the various groups in the population.
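As a concrete sketch of such a cut (assuming NumPy and SciPy are available), `scipy.cluster.hierarchy.fcluster` turns a dendrogram cut at a given height into a flat partition:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data: six 1-D points forming three well-separated pairs.
X = np.array([[0.0], [0.1], [5.0], [5.1], [10.0], [10.1]])

# Build the dendrogram with single linkage.
Z = linkage(X, method="single")

# Cut at height 1.0: above the low aggregations (within-pair distance 0.1)
# and under the high ones (between-pair distance about 4.9).
labels = fcluster(Z, t=1.0, criterion="distance")

print(labels)           # one label per point
print(len(set(labels))) # 3 clusters at this cutting level
```

Raising the threshold `t` above 4.9 would merge neighboring pairs and reduce the number of clusters, mirroring how k decreases as the cut moves up the dendrogram.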
When it has been decided where to cut the dendrogram, the next step is to try to find out which variables have contributed most strongly to merging the cases in each cluster.
The dendrogram can therefore be used to provide visual grouping information, i.e., to read the process of merging the single statistical units into homogeneous clusters, thus playing a complementary role to the numerical algorithms in cluster analysis.