The first principal component (PC1), which is denoted by Y_1, contains the
highest variance, PC2 contains the second highest variance, and so forth.
All the PCs together contain the full variance of the data set. This variance
is, however, largely concentrated in the first few PCs, which include most of
the information content of the data set. The last PCs are therefore generally
ignored to reduce the dimensions of the data. The factors a_ij in the above
equations are the principal component loads; their values represent the
relative contributions of the original variables to the new PCs. If the load
a_ij of a variable X_j in PC1 is close to zero, then the influence of this
variable on PC1 is low, whereas a high positive or negative a_ij suggests a
strong contribution. The new values Y_j of the variables computed from the
linear combinations of the original variables X_j, weighted by the loads, are
called the principal component scores.
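To make the loads, scores and variance proportions concrete, here is a minimal sketch in Python rather than the book's MATLAB. It assumes the usual covariance-eigenvector formulation of PCA on mean-centred data and returns roughly the quantities that MATLAB's pca reports as coeff, score, latent and explained. The helper functions (including the Jacobi eigenvalue routine, used only so the example runs without external libraries) and the synthetic data are hypothetical.

```python
import math

def mean_center(data):
    """Subtract each column mean from an n-by-m data set given as a list of rows."""
    n, m = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in data]

def covariance(xc):
    """Sample covariance matrix (denominator n - 1) of mean-centred data."""
    n, m = len(xc), len(xc[0])
    return [[sum(r[i] * r[j] for r in xc) / (n - 1) for j in range(m)]
            for i in range(m)]

def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def jacobi_eig(sym, tol=1e-12, max_sweeps=50):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (eigenvalues, v); column k of v is the eigenvector of eigenvalue k.
    """
    m = len(sym)
    a = [row[:] for row in sym]
    v = [[float(i == j) for j in range(m)] for i in range(m)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(m) for j in range(m) if i != j)
        if off < tol:
            break
        for p in range(m - 1):
            for q in range(p + 1, m):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                sign = 1.0 if theta >= 0 else -1.0
                t = sign / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                rot = [[float(i == j) for j in range(m)] for i in range(m)]
                rot[p][p] = rot[q][q] = c
                rot[p][q], rot[q][p] = s, -s
                a = matmul(transpose(rot), matmul(a, rot))  # zeroes a[p][q]
                v = matmul(v, rot)  # accumulate the eigenvectors
    return [a[i][i] for i in range(m)], v

def pca(data):
    """Return loads, scores, PC variances and percent of variance explained."""
    xc = mean_center(data)
    evals, evecs = jacobi_eig(covariance(xc))
    order = sorted(range(len(evals)), key=lambda k: evals[k], reverse=True)
    variances = [evals[k] for k in order]
    # Column k of loads holds the weights of the original variables in PC k+1.
    loads = [[evecs[i][k] for k in order] for i in range(len(evecs))]
    scores = matmul(xc, loads)  # Y = centred X weighted by the loads
    total = sum(variances)
    explained = [100.0 * lam / total for lam in variances]
    return loads, scores, variances, explained

# Hypothetical synthetic data: 30 samples of three correlated variables.
data = [[i + math.sin(i), 2.0 * i + math.cos(2 * i), 0.5 * i + math.sin(3 * i)]
        for i in range(30)]
loads, scores, variances, explained = pca(data)
print("variance explained (%):", [round(e, 2) for e in explained])
```

Because all three variables share a strong common trend, nearly all of the variance falls into PC1; keeping only the first columns of loads and scores is the dimension reduction described above.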
PCA is commonly used as a method for unmixing (or separating) variables
X, which are a linear combination of independent source variables S,

$$X = A \cdot S$$

where A is the mixing matrix. PCA tries to determine (although not
quantitatively) both the source variables S (represented by the principal
component scores) and the mixing matrix A (represented by the principal
component loads). Unmixing such variables works best if the probability
distribution of the original variables X is a Gaussian distribution, and only in
such cases are the principal components completely decorrelated, that is,
statistically independent rather than merely uncorrelated. However,
data in earth sciences are often not Gaussian distributed and alternative
methods, such as independent component analysis (ICA), should therefore
be used instead (Section 9.3). For example, radiance and reflectance values
from hyperspectral data are often not Gaussian distributed and ICA is
therefore widely used in remote sensing applications to decorrelate the
spectral bands, rather than PCA. Examples in which PCA is used include
the assessment of sediment provenance (as described in the example below),
the unmixing of peridotite mantle sources of basalts, and multispectral
classification of satellite images.
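The unmixing idea can be illustrated with a small Python sketch of a mixing model of the form x = A s. It assumes two hypothetical, independent, uniformly distributed (hence non-Gaussian) source signals src1 and src2 and a hypothetical 2-by-2 mixing matrix A; all names and numbers are illustrative only. Mixing makes the observed variables x1 and x2 correlated; the PC scores obtained from the eigenvectors of their covariance matrix are uncorrelated, yet PC1 remains a blend of both sources rather than a copy of either.

```python
import math
import random

random.seed(1)
n = 500
# Hypothetical independent, non-Gaussian (uniform) source signals.
src1 = [random.uniform(-1.0, 1.0) for _ in range(n)]
src2 = [random.uniform(-1.0, 1.0) for _ in range(n)]

# Hypothetical mixing matrix A; each sample obeys x = A s.
A = [[1.0, 0.6],
     [0.4, 1.0]]
x1 = [A[0][0] * u + A[0][1] * w for u, w in zip(src1, src2)]
x2 = [A[1][0] * u + A[1][1] * w for u, w in zip(src1, src2)]

def cov(u, w):
    """Sample covariance (denominator n - 1) of two equal-length sequences."""
    mu, mw = sum(u) / len(u), sum(w) / len(w)
    return sum((p - mu) * (q - mw) for p, q in zip(u, w)) / (len(u) - 1)

def corr(u, w):
    return cov(u, w) / math.sqrt(cov(u, u) * cov(w, w))

# Closed-form eigen-decomposition of the symmetric covariance matrix [[a, b], [b, c]].
a, b, c = cov(x1, x1), cov(x1, x2), cov(x2, x2)
half_gap = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
lam1, lam2 = (a + c) / 2.0 + half_gap, (a + c) / 2.0 - half_gap  # lam1 >= lam2
norm = math.hypot(b, lam1 - a)
load1 = (b / norm, (lam1 - a) / norm)  # unit eigenvector of lam1 (requires b != 0)
load2 = (-load1[1], load1[0])          # orthogonal unit eigenvector of lam2

# Principal component scores: centred data weighted by the loads.
m1, m2 = sum(x1) / n, sum(x2) / n
pc1 = [load1[0] * (u - m1) + load1[1] * (w - m2) for u, w in zip(x1, x2)]
pc2 = [load2[0] * (u - m1) + load2[1] * (w - m2) for u, w in zip(x1, x2)]

print("corr(x1, x2)    =", round(corr(x1, x2), 3))    # mixing correlates the variables
print("corr(PC1, PC2)  =", round(corr(pc1, pc2), 6))  # the scores are uncorrelated
print("corr(PC1, src1) =", round(corr(pc1, src1), 3),  # yet PC1 still mixes both sources
      " corr(PC1, src2) =", round(corr(pc1, src2), 3))
```

The loads are orthogonal by construction, whereas the columns of this A are not, so the scores can at best approximate the sources.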
In the following example, a synthetic data set is used to illustrate the use
of the function pca included in the Statistics Toolbox. Thirty samples were
taken from thirty different levels in a sedimentary sequence containing
varying proportions of the three different minerals stored in the columns
of the array x. The sediments were derived from three distinct rock types
(with unknown mineral compositions) whose relative contributions to each
of the thirty sediment samples are represented by s1, s2 and s3. Variations in
these relative contributions (as represented by the thirty values in s1, s2 and