Now consider any other cluster C, for which we can calculate d(i, C), the average dissimilarity
of i to all objects in the cluster C. After calculating d(i, C) for all clusters C ≠ A, we
can define another term, b(i), as
$$b(i) = \min_{C \neq A} d(i, C) \qquad (3.49)$$
Using a(i) and b(i), the silhouette value S(i) can be defined as follows:
$$S(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} \qquad (3.50)$$
The silhouette value lies between −1 and 1. If the value is close to 1, it means that the
object in question has been assigned to an appropriate cluster.
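The calculation of S(i) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the Euclidean distance is used as the dissimilarity measure, a(i) is taken to be the average dissimilarity of i to the other objects in its own cluster A, singleton clusters are given S(i) = 0 by convention, and at least two clusters are present. The function names are hypothetical and do not come from the text.

```python
from math import dist  # Euclidean distance, assumed here as the dissimilarity measure


def mean_dissimilarity(point, others):
    """Average dissimilarity d(i, C) between one point and the points in `others`."""
    return sum(dist(point, other) for other in others) / len(others)


def silhouette_values(points, labels):
    """Return S(i) for every point, following Eqs. (3.49) and (3.50).

    Assumes at least two distinct cluster labels.
    """
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    values = []
    for i, (point, own) in enumerate(zip(points, labels)):
        own_members = [points[k] for k in clusters[own] if k != i]
        if not own_members:
            values.append(0.0)  # assumed convention for a singleton cluster
            continue
        # a(i): average dissimilarity to the other members of the own cluster A.
        a_i = mean_dissimilarity(point, own_members)
        # b(i): smallest average dissimilarity to any other cluster C != A (Eq. 3.49).
        b_i = min(
            mean_dissimilarity(point, [points[k] for k in members])
            for label, members in clusters.items()
            if label != own
        )
        denom = max(a_i, b_i)
        # Eq. (3.50); guard against all points coinciding.
        values.append((b_i - a_i) / denom if denom > 0 else 0.0)
    return values


# Two well-separated groups, first labelled sensibly, then deliberately mislabelled.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
s_good = silhouette_values(points, [0, 0, 1, 1])
s_bad = silhouette_values(points, [0, 1, 1, 0])
print(s_good)
print(s_bad)
```

With the sensible labelling every value is close to 1, while the mislabelled objects receive negative values, which is the behaviour expected from Eq. (3.50).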
3.5 Implementation of Principal Component Analysis
Principal component analysis (PCA) is one of the major innovations in applied
linear algebra and is widely used in engineering and applied science. PCA is a
multivariate procedure which transforms the original data in such a way that the
directions of maximum variance are aligned with the new axes. The distinct advantage of PCA is
its ability to reduce the dimensionality of an original data set while retaining as
much information as possible. The derivation of principal components (PCs) is
based on the eigenvectors and eigenvalues of either the covariance matrix or the
correlation matrix. There are numerous claims in the literature for establishing the
first use of the concept of PCA [ 18 , 60 ]. However, probably the most famous work
on PCA in the early days was the paper by Pearson [ 59 ]. A similar and basic
description of this method in physics was given by Cauchy [ 15 ]. Adcock [ 1 ] gave
the earliest non-specific reference to PCA in the chemical literature, in which the
author tackled a simple problem of linear calibration with this concept.
The basic background of PCA can be explained as follows. Assume an event for
which p variables (attributes) X_i are being measured sequentially through time for
n time instances. The corresponding data set X = [x_1, x_2, ..., x_p] consists of
p vectors x_i, where x_i = [x_{1i}, x_{2i}, ..., x_{ni}] is a column vector which contains the
n measurements of the variable X_i (where x_i includes a univariate time series). Each
row of X corresponds to the measurements of all variables at a specific time.
Therefore, each row of X can be considered as a point in p-dimensional space. PCA
derives a new set of orthogonal and uncorrelated composite variates Y(j), which are
called principal components:
$$Y(j) = a_{1j} X_1 + a_{2j} X_2 + \cdots + a_{pj} X_p, \quad \text{where } j = 1, 2, \ldots, p \qquad (3.51)$$
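As a rough illustration of Eq. (3.51), the coefficients a_{1j}, ..., a_{pj} can be taken from the eigenvectors of the covariance matrix, as noted at the start of this section. The sketch below assumes NumPy is available and that p ≥ 2. It centers each variable before projecting, which is an assumption on our part, since Eq. (3.51) is written in terms of the raw variables. The function name pca_components and the synthetic data are hypothetical.

```python
import numpy as np


def pca_components(X):
    """Return (eigenvalues, coefficient matrix A, scores Y) for an n x p data matrix X.

    Column j of A holds a_1j, ..., a_pj and column j of Y holds Y(j) of Eq. (3.51),
    evaluated on the mean-centered variables (an assumption of this sketch).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each variable (column)
    cov = np.cov(Xc, rowvar=False)           # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: for symmetric matrices, ascending order
    order = np.argsort(eigvals)[::-1]        # reorder so the largest variance comes first
    eigvals = eigvals[order]
    A = eigvecs[:, order]
    Y = Xc @ A                               # each row of X projected onto the new axes
    return eigvals, A, Y


# Synthetic example: n = 200 time instances of p = 3 correlated variables.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([
    t + 0.1 * rng.normal(size=200),
    2.0 * t + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
])

eigvals, A, Y = pca_components(X)
explained = eigvals / eigvals.sum()          # share of total variance per component
print(explained)
```

Because the columns of A are orthonormal eigenvectors, the resulting components are uncorrelated and their variances equal the eigenvalues. Keeping only the first few columns of Y gives the dimensionality reduction described above.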
 