9.5 Cluster Analysis
Cluster analysis creates groups of objects that are very similar to each other,
compared to other individual objects or groups of objects. It first computes
the similarity or (alternatively) the dissimilarity (or distance) between all
pairs of objects and then ranks the groups according to their similarity or
distance, finally creating a hierarchical tree visualized as a dendrogram. The
grouping of objects can be useful in the earth sciences, for example when
making correlations within volcanic ash layers (Hermanns et al. 2000) or
comparing different microfossil assemblages (Birks and Gordon 1985).
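The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular software package: it assumes the Euclidean distance as the dissimilarity measure and single linkage (the distance between two groups is the distance between their closest members) as the merging rule, neither of which is prescribed by the text. The function names and the sample data are hypothetical.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two equal-length measurement vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def single_linkage(objects):
    """Naive agglomerative clustering; returns the list of merges.

    Each merge is (members_a, members_b, distance), in the order in which
    the merges happen. This is the information a dendrogram displays.
    """
    # Start with every object in its own cluster (tuples of object indices).
    clusters = [(i,) for i in range(len(objects))]
    merges = []
    while len(clusters) > 1:
        # Find the two clusters whose closest members are nearest to each other.
        best = None
        for ca, cb in combinations(clusters, 2):
            d = min(euclidean(objects[i], objects[j]) for i in ca for j in cb)
            if best is None or d < best[2]:
                best = (ca, cb, d)
        ca, cb, d = best
        merges.append((ca, cb, d))
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in (ca, cb)] + [ca + cb]
    return merges

# Hypothetical data: four objects with two measurements each.
samples = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 7.0)]
merges = single_linkage(samples)
for members_a, members_b, distance in merges:
    print(members_a, members_b, round(distance, 3))
```

Each entry in merges records which two groups were joined and at what distance. Drawing the joins at heights given by these distances yields the dendrogram.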
There are numerous methods for calculating the similarity or (alternatively)
the dissimilarity (or distance) between two data vectors. Let us define two
data sets consisting of multiple measurements on the same object. These data
can be described by vectors.
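One way to write the two vectors, assuming each holds p measurements, is given here. The symbols x_1, x_2 and p are notation chosen for illustration and may differ from the notation used elsewhere in the source:

```latex
\mathbf{x}_1 = (x_{11}, x_{12}, \ldots, x_{1p}), \qquad
\mathbf{x}_2 = (x_{21}, x_{22}, \ldots, x_{2p})
```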
The most popular measures of dissimilarity (or distance) between the two
sample vectors are:
• the Euclidean distance - This is simply the shortest distance between the
two points describing two measurements in the multivariate space, that is,
the square root of the sum of the squared differences between corresponding
measurements.
The Euclidean distance is certainly the most intuitive measure of
dissimilarity. However, in heterogeneous data sets consisting of a number of
different types of variables, a better alternative would be
• the Manhattan (or city block) distance - In Manhattan, one must
follow perpendicular avenues rather than crossing blocks diagonally. The
Manhattan distance is therefore the sum of all absolute differences between
corresponding measurements.
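The two distance measures above can be sketched as follows. This is a minimal example using only the Python standard library; the function names and the sample vectors are hypothetical and chosen for illustration.

```python
import math

def euclidean_distance(x1, x2):
    """Straight-line distance: square root of the summed squared differences."""
    if len(x1) != len(x2):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def manhattan_distance(x1, x2):
    """City-block distance: sum of the absolute differences."""
    if len(x1) != len(x2):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(x1, x2))

# Hypothetical measurement vectors for two samples.
sample_a = [1.0, 2.0, 3.0]
sample_b = [4.0, 6.0, 3.0]

print(euclidean_distance(sample_a, sample_b))  # 5.0
print(manhattan_distance(sample_a, sample_b))  # 7.0
```

The differences between the two samples are 3, 4 and 0, so the straight-line distance is 5 while the city-block distance is 7. The Manhattan distance is never smaller than the Euclidean distance for the same pair of vectors.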
Measures of similarity include
• the correlation similarity coefficient - This uses Pearson's linear
product-moment correlation coefficient to compute the similarity of two
objects, that is, the covariance of the two vectors divided by the product
of their standard deviations.
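The correlation similarity can be sketched in the same way. This is a minimal standard-library example; the function name and the sample vectors are hypothetical, and the vectors are treated as plain lists of measurements.

```python
import math

def correlation_similarity(x1, x2):
    """Pearson product-moment correlation between two measurement vectors."""
    n = len(x1)
    if n != len(x2) or n < 2:
        raise ValueError("need two vectors of equal length with at least 2 values")
    mean1 = sum(x1) / n
    mean2 = sum(x2) / n
    # Deviations of each measurement from the mean of its own vector.
    d1 = [a - mean1 for a in x1]
    d2 = [b - mean2 for b in x2]
    cross_sum = sum(a * b for a, b in zip(d1, d2))
    norm = math.sqrt(sum(a * a for a in d1) * sum(b * b for b in d2))
    if norm == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cross_sum / norm

# Hypothetical measurement vectors.
sample_a = [1.0, 2.0, 3.0, 4.0]
sample_b = [2.0, 4.0, 6.0, 8.0]  # same pattern as sample_a, twice the magnitude
sample_c = [4.0, 3.0, 2.0, 1.0]  # reversed pattern

r_ab = correlation_similarity(sample_a, sample_b)
r_ac = correlation_similarity(sample_a, sample_c)
print(r_ab, r_ac)
```

The coefficient ranges from -1 to 1 and depends on the pattern of the measurements rather than on their magnitude. Here sample_a and sample_b are perfectly correlated even though their Euclidean distance is not zero, and sample_a and sample_c are perfectly anticorrelated.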