title('Euclidean distance between pairs of samples')
xlabel('First Sample No.')
ylabel('Second Sample No.')
colorbar
The function squareform converts Y into a symmetric, square format, so that
the elements (i,j) of the matrix denote the distance between the i and j
objects in the original data. We next rank and link the samples with respect
to the inverse of their separation distances using the function linkage.
Z = linkage(Y)
Z =
2.0000 9.0000 0.0564
8.0000 10.0000 0.0730
1.0000 12.0000 0.0923
6.0000 7.0000 0.1022
11.0000 13.0000 0.1129
3.0000 4.0000 0.1604
15.0000 16.0000 0.1737
5.0000 17.0000 0.1764
14.0000 18.0000 0.2146
In this 3-column array Z, each row identifies a link. The first two columns
identify the objects (or samples) that have been linked, while the third
column contains the separation distance between these two objects. The first
row (link) joins the two objects with the smallest distance, corresponding
to the greatest similarity. In our example, samples 2 and 9
have the smallest separation distance of 0.0564 and are therefore grouped
together and given the label 11, i.e., the next available index higher than the
highest sample index 10. Next, samples 8 and 10 are grouped to 12 since
they have the second smallest separation distance of 0.0730. The next row
shows that the new group 12 is then grouped with sample 1, with
a separation distance of 0.0923, and so forth. Finally, we visualize the
hierarchical clusters as a dendrogram, which is shown in Figure 9.7.
dendrogram(Z);
xlabel('Sample No.')
ylabel('Distance')
box on
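The labeling convention described above can be traced step by step. The following is a minimal sketch, not part of the original recipe: it re-enters the array Z as printed above and expands each new label (11, 12, ...) into the original samples it contains. The variable names n and members are illustrative assumptions.

```matlab
% Linkage output copied from the listing above (10 original samples).
Z = [ 2  9 0.0564
      8 10 0.0730
      1 12 0.0923
      6  7 0.1022
     11 13 0.1129
      3  4 0.1604
     15 16 0.1737
      5 17 0.1764
     14 18 0.2146];

n = size(Z,1) + 1;            % number of original samples (here 10)
members = cell(2*n-1,1);      % members{k} lists the samples in object k
for k = 1:n
    members{k} = k;           % each original sample is its own group
end
for r = 1:size(Z,1)
    % Row r merges the objects in columns 1 and 2 into the new label n+r.
    members{n+r} = sort([members{Z(r,1)}, members{Z(r,2)}]);
    fprintf('%2d: [%s] linked at distance %.4f\n', n+r, ...
        num2str(members{n+r}), Z(r,3));
end
```

Read this way, label 11 contains samples 2 and 9, label 14 contains samples 6 and 7, and label 15 collects samples 1, 2, 8, 9 and 10.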
Clustering finds the same groups as the principal component analysis. We
observe clear groups consisting of samples 1, 2, 8, 9 and 10 (the magmatic
source rocks), samples 3, 4 and 5 (the hydrothermal vein), and samples 6 and
7 (the sandstone). One way to test the validity of our clustering result is to
use the cophenetic correlation coefficient, computed with the function cophenet:
cophenet(Z,Y)
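To make the meaning of this coefficient more concrete, the following sketch may help. It uses a small hypothetical data set (not the rock samples of this example) and assumes the Statistics and Machine Learning Toolbox is available, as for pdist and linkage. The coefficient is the linear correlation between the original distances Y and the cophenetic distances, i.e., the heights at which two samples are first joined in the tree, so values close to one suggest that the tree represents the original distances well.

```matlab
% Hypothetical data: 6 samples, 2 variables, forming two tight groups.
data = [0.0 0.0; 0.1 0.0; 0.0 0.1; 5.0 5.0; 5.1 5.0; 5.0 5.1];

Y = pdist(data);            % pairwise Euclidean distances (vector form)
Z = linkage(Y);             % default single linkage, as in the text
[c,D] = cophenet(Z,Y);      % c: coefficient, D: cophenetic distances

% The same value follows from the correlation between Y and D.
R = corrcoef(Y,D);
fprintf('cophenet: %.4f, corrcoef: %.4f\n', c, R(1,2));
```

Storing the result in a variable, rather than only displaying it, also makes it easy to compare different linkage methods on the same distance vector Y.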