Multivariate Statistics - MATLAB Recipes for Earth Sciences


data = load('sediments.txt');

for i=1:10
    sample(i,:) = ['sample',sprintf('%02.0f',i)];
end
clear i

minerals = ['amp';'pyr';'pla';'ksp';'qtz';'cla';'flu';'sph';'gal'];
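Both sample and minerals are MATLAB character arrays, in which every row must have the same number of characters. The zero-padded format '%02.0f' guarantees this for the sample labels. A small check, using the hypothetical names lab1, lab10, labels and unpaddedLengths:

```matlab
% Zero-padded numbering, as in the loop that builds the sample labels
lab1  = ['sample', sprintf('%02.0f', 1)];    % 'sample01'
lab10 = ['sample', sprintf('%02.0f', 10)];   % 'sample10'

% Both labels have 8 characters, so they can be stacked as rows
labels = [lab1; lab10];

% Without padding, 'sample1' (7 chars) and 'sample10' (8 chars) differ in
% length, and stacking them as rows of one char array would fail
unpaddedLengths = [numel(sprintf('sample%d', 1)), numel(sprintf('sample%d', 10))];
disp(labels)
```

A cell array of character vectors, for example {'sample1','sample10'}, does not require equal lengths and is an alternative when padding is unwanted.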

Subsequently, the distances between pairs of samples can be computed. The function pdist provides many ways of computing this distance, such as the Euclidean or Manhattan distance. We use the default setting, which is the Euclidean distance.

Y = pdist(data);
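To make the two metrics concrete, the following sketch computes both by hand for a tiny hypothetical data matrix Xtoy (three samples, two variables), visiting the pairs in the same order as pdist: (1,2), (1,3), (2,3). With the Statistics and Machine Learning Toolbox installed, pdist(Xtoy) and pdist(Xtoy,'cityblock') should return the same two vectors; the variable names here are illustrative only.

```matlab
% Hypothetical toy data: 3 samples (rows) x 2 variables (columns)
Xtoy = [0 0; 3 4; 6 8];
m = size(Xtoy, 1);

% One entry per unordered pair of samples: m*(m-1)/2 of them
dEuc = zeros(1, m*(m-1)/2);
dMan = zeros(1, m*(m-1)/2);

k = 0;
for i = 1:m-1
    for j = i+1:m
        k = k + 1;
        diffs = Xtoy(i,:) - Xtoy(j,:);
        dEuc(k) = sqrt(sum(diffs.^2));   % Euclidean (pdist default)
        dMan(k) = sum(abs(diffs));       % Manhattan ('cityblock' in pdist)
    end
end

disp(dEuc)   % pairs (1,2), (1,3), (2,3)
disp(dMan)
```

For these points the Euclidean distances are 5, 10 and 5, while the Manhattan distances are 7, 14 and 7: the Manhattan metric sums the absolute differences along each variable instead of taking the straight-line distance.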

The function pdist returns a row vector Y containing the distances between each pair of observations (rows) in the original data matrix; for m samples, Y has m(m-1)/2 elements. We can visualize these distances in another pseudocolor plot.

imagesc(squareform(Y)), colormap(hot)
title('Euclidean distance between pairs of samples')
xlabel('First Sample No.')
ylabel('Second Sample No.')
colorbar
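squareform turns the pdist vector into a symmetric matrix with zeros on the diagonal. The sketch below rebuilds that matrix by hand for a hypothetical three-sample distance vector yToy, using the position k of pair (i,j), i < j, in the pdist vector: k = (i-1)m - i(i-1)/2 + (j-i). The names yToy, Dtoy and pairIndex are illustrative, not part of MATLAB.

```matlab
% Hypothetical pdist output for m = 3 samples, pairs (1,2), (1,3), (2,3)
yToy = [5 10 5];
m = 3;

% Position of pair (i,j), i < j, in the pdist vector
pairIndex = @(i, j, m) (i-1)*m - i*(i-1)/2 + (j-i);

% Fill the symmetric matrix that squareform(yToy) would return
Dtoy = zeros(m);
for i = 1:m-1
    for j = i+1:m
        k = pairIndex(i, j, m);
        Dtoy(i,j) = yToy(k);
        Dtoy(j,i) = yToy(k);   % distances are symmetric
    end
end
disp(Dtoy)
```

The diagonal stays zero because each sample has distance zero to itself; this is why the pseudocolor plot of squareform(Y) is symmetric about its diagonal.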

The function squareform converts Y into a symmetric square matrix, so that element (i,j) of the matrix denotes the distance between the i-th and j-th objects in the original data. Next we rank and link the samples in order of increasing distance, that is, decreasing similarity, using the function linkage.

Z = linkage(Y);
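By default, linkage uses single linkage: it repeatedly merges the two closest clusters, where the distance between two clusters is the smallest distance between any of their members. The sketch below reproduces that procedure for four hypothetical one-dimensional samples Xtoy, writing each merge as a row [index1 index2 distance] and giving the cluster formed in row k the index m+k. It is a teaching sketch with illustrative variable names, not MATLAB's own implementation; ties are broken by whichever pair min finds first, and this version lists the smaller index first.

```matlab
% Four hypothetical 1-D samples
Xtoy = [0; 1; 5; 7];
m = numel(Xtoy);

% Full distance matrix; for 1-D data the Euclidean distance is |x_i - x_j|
D = abs(Xtoy - Xtoy.');     % implicit expansion (MATLAB R2016b or later)
D(1:m+1:end) = Inf;         % exclude self-distances from the search

ids = 1:m;                  % current cluster index of each active row
Ztoy = zeros(m-1, 3);
for k = 1:m-1
    % Closest pair of active clusters
    [dmin, idx] = min(D(:));
    [r, c] = ind2sub(size(D), idx);
    Ztoy(k,:) = [sort([ids(r) ids(c)]), dmin];

    % Single linkage: distance to the merged cluster is the smaller of the two
    newRow = min(D(r,:), D(c,:));
    D(r,:) = newRow;
    D(:,r) = newRow.';
    D(r,r) = Inf;
    ids(r) = m + k;         % row r now represents the new cluster

    % Remove the cluster that was absorbed into row r
    D(c,:) = [];
    D(:,c) = [];
    ids(c) = [];
end
disp(Ztoy)
```

Read row by row: samples 1 and 2 merge first (distance 1), then samples 3 and 4 (distance 2), and finally the two resulting clusters 5 and 6 (distance 4, the smallest gap between {0, 1} and {5, 7}).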

In this three-column array Z, each row identifies a link. The first two columns identify the objects (or samples) that have been linked; an index greater than the number of samples m refers to a previously formed cluster, index m+k denoting the cluster created in row k. The third column contains the distance between the two linked objects or clusters. The first row (link), between objects (or samples) 1 and 2, has the smallest distance, corresponding to the highest similarity. Finally, we visualize the hierarchical clusters as a dendrogram, which is shown in Figure 9.4.

dendrogram(Z);
xlabel('Sample No.')
ylabel('Distance')
box on
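Groups are read off a dendrogram by cutting the tree at a chosen height on the Distance axis: every link lying below the cut joins its samples into one group. The sketch applies this to a hypothetical four-sample linkage matrix Zc with an arbitrary cutoff of 3. With the Statistics and Machine Learning Toolbox, the function cluster, for example cluster(Z,'cutoff',3,'criterion','distance'), performs this kind of cut, although its group numbering may differ from the one produced here.

```matlab
% Hypothetical linkage matrix for m = 4 samples, rows [index1 index2 distance];
% index m+k refers to the cluster formed in row k
Zc = [1 2 1; 3 4 2; 5 6 4];
m = size(Zc, 1) + 1;
cutoff = 3;                  % hypothetical cutting height

% members{n} lists the samples under node n (leaves 1..m, merges m+1..2m-1)
members = cell(1, 2*m-1);
for n = 1:m
    members{n} = n;
end

groupOf = 1:m;               % every sample starts in its own group
for k = 1:m-1
    members{m+k} = [members{Zc(k,1)}, members{Zc(k,2)}];
    if Zc(k,3) < cutoff
        % Link lies below the cut: its samples share one group label
        groupOf(members{m+k}) = min(groupOf(members{m+k}));
    end
end

% Renumber groups consecutively (1, 2, ...)
[~, ~, g] = unique(groupOf);
groupOf = g(:).';
disp(groupOf)
```

Here samples 1 and 2 form one group and samples 3 and 4 another, because the final link at distance 4 lies above the cut.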

Clustering finds the same groups as the principal component analysis. We observe clear groups consisting of samples 1, 2, 8 to 10 (the magmatic
