overinterpret the results. MCR suffers from so-called rotational ambiguity, which essentially means that there can be many equally good (and equally good-looking) solutions to a problem. This is in contrast to PCA, which has a single, unique solution. Hence, caution is warranted, especially for fluorescence data with characteristics (e.g., broad, overlapping spectra) that make it more difficult to obtain uniquely identified solutions with MCR (Jaumot and Tauler, 2010).
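Rotational ambiguity can be made concrete with a small numerical check. The pure-Python sketch below is an illustration, not an MCR algorithm: all matrices are hypothetical values. It factors a matrix as X = C S^T, then inserts a nonsingular matrix T and its inverse. The pair (C T, T^-1 S^T) reproduces X exactly while giving different "concentration" and "spectral" profiles. T is chosen small enough here that both factor pairs stay non-negative, so a non-negativity constraint alone would not tell them apart.

```python
def matmul(A, B):
    """Plain-Python matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


# Hypothetical MCR-style factors (illustrative values only):
# C is 3 samples x 2 components, St = S^T is 2 components x 4 wavelengths
C = [[1.0, 0.5], [0.2, 1.0], [0.7, 0.3]]
St = [[0.1, 0.6, 0.9, 0.3], [0.8, 0.4, 0.1, 0.05]]

# Any nonsingular T yields an alternative pair (C T, T^-1 S^T) with the same product
T = [[1.0, 0.05], [0.05, 1.0]]
det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
T_inv = [[T[1][1] / det, -T[0][1] / det],
         [-T[1][0] / det, T[0][0] / det]]

C_alt = matmul(C, T)
St_alt = matmul(T_inv, St)

X1 = matmul(C, St)          # fit from the original factors
X2 = matmul(C_alt, St_alt)  # fit from the rotated factors
max_diff = max(abs(p - q) for r1, r2 in zip(X1, X2) for p, q in zip(r1, r2))
print("largest difference between the two fits:", max_diff)
```

The printed difference is at the level of floating-point rounding, even though `C_alt` and `St_alt` differ from `C` and `St`. Constraints and prior knowledge narrow this family of solutions, but they do not always collapse it to one.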
10.5 Principal Component Analysis
One of the simplest and most often used exploratory methods is principal component analysis (PCA), which identifies the most important uncorrelated variations in a data set, termed principal components. The principal components define a new, orthogonal, and truncated coordinate system onto which the original data are projected. PCA is typically used to explore data: to obtain preliminary assessments of the importance of different variables, to cluster and classify objects (samples), and to detect outliers. The direction accounting for most of the variability in the data set (the “hyperplane of maximum variance”) is the first principal component. Each subsequent principal component accounts for the maximum variability remaining in the data set once all preceding principal components have been subtracted from it.
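The sequential definition above (find the direction of maximum variance, subtract it, repeat) can be sketched with power iteration and deflation. This is one way to compute principal components, chosen here because it mirrors the description step by step. Practical software usually computes all components at once with a singular value decomposition (e.g., `numpy.linalg.svd`). Column mean-centering is assumed as preprocessing, and the two-variable data set is a toy example, not fluorescence data.

```python
import math
import random


def mean_center(X):
    """Subtract each column mean (assumed preprocessing step)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def cross_product_times(X, v):
    """Return X^T (X v) without forming the p x p matrix X^T X."""
    Xv = [sum(x * y for x, y in zip(row, v)) for row in X]
    return [sum(X[i][j] * Xv[i] for i in range(len(X))) for j in range(len(v))]


def leading_direction(X, iters=500, seed=0):
    """Power iteration: unit vector along the direction of maximum variance of X."""
    rng = random.Random(seed)
    v = [rng.random() - 0.5 for _ in range(len(X[0]))]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = cross_product_times(X, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left to explain
            break
        v = [x / norm for x in w]
    return v


def pca_by_deflation(X, n_components):
    """Extract components one at a time, removing each before finding the next."""
    R = mean_center(X)
    scores, loadings = [], []
    for _ in range(n_components):
        b = leading_direction(R)                               # loading vector
        a = [sum(x * y for x, y in zip(row, b)) for row in R]  # score vector
        # Deflation: subtract the rank-one part a b^T explained by this component
        R = [[R[i][j] - a[i] * b[j] for j in range(len(b))] for i in range(len(R))]
        scores.append(a)
        loadings.append(b)
    return scores, loadings, R  # R is the residual matrix


# Toy two-variable data set (illustrative values only)
X = [[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
     [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]]
scores, loadings, residual = pca_by_deflation(X, n_components=2)
print("loadings:", loadings)
```

For this toy set the first loading comes out close to ±(0.678, 0.735). The overall sign depends on the random start, which is the sign indeterminacy of PCA. Because each deflated residual is orthogonal to the loadings already found, the second loading is orthogonal to the first.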
PCA is normally considered a nonparametric method: it does not rely on any assumptions about the probability distributions of the data, and it provides a unique solution (apart from sign indeterminacy). It provides the least-squares solution for compressing the original set of higher-dimensional vectors into a set of lower-dimensional vectors from which the original set can be reconstructed. Typically, PCA is performed on data represented in table (or matrix) form. PCA can also be performed on three-way EEM data sets that have been unfolded along one dimension, so that the rows of the matrix represent samples and the columns represent unique combinations of one excitation and one emission wavelength, each treated as an individual variable (see Figure 10.2C). This is sometimes referred to as a Tucker1 or, somewhat misleadingly, a multiway PCA model.
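The unfolding step can be sketched as follows. The layout `eems[i][k][l]` (sample i, excitation k, emission l), the function names, and the wavelength grid are hypothetical choices for illustration. Each sample's EEM is flattened excitation by excitation into one row, and a parallel list of (excitation, emission) labels records which wavelength pair each column represents.

```python
def unfold_eem(eems):
    """Unfold a samples x excitation x emission array into a samples x (ex*em) matrix."""
    n_ex, n_em = len(eems[0]), len(eems[0][0])
    matrix = []
    for i, sample in enumerate(eems):
        if len(sample) != n_ex or any(len(r) != n_em for r in sample):
            raise ValueError(f"sample {i} does not match the {n_ex} x {n_em} EEM grid")
        # Excitation-major order: all emissions for the first excitation, then the next, ...
        matrix.append([value for ex_row in sample for value in ex_row])
    return matrix


def variable_labels(ex_wavelengths, em_wavelengths):
    """(excitation, emission) pair for each column, in the same order as unfold_eem."""
    return [(ex, em) for ex in ex_wavelengths for em in em_wavelengths]


# Hypothetical grid: 3 excitation and 4 emission wavelengths (nm), 2 samples
ex_nm = [250, 275, 300]
em_nm = [350, 400, 450, 500]
eems = [[[i * 100 + k * 10 + l for l in range(4)] for k in range(3)] for i in range(2)]

X = unfold_eem(eems)
labels = variable_labels(ex_nm, em_nm)
print(len(X), "samples x", len(X[0]), "variables; column 5 is", labels[5])
```

Whether the columns are ordered excitation-major or emission-major does not affect the PCA result. Permuting columns only permutes the loadings, provided the labels are kept in the same order.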
Mathematically, the PCA model decomposes the data matrix into a set of so-called
bilinear terms and a residual matrix:
$$x_{ij} = \sum_{f=1}^{F} a_{if}\, b_{jf} + e_{ij}, \qquad i = 1, \ldots, I; \quad j = 1, \ldots, J$$
where $x_{ij}$ is the intensity of the $i$th sample at the $j$th variable; $a_{if}$ is called a score value and locates each sample along each principal component; $b_{jf}$ is a loading matrix element describing the contribution of each variable to each principal component; and $F$ is the number of principal components in the model. Finally, $e_{ij}$ is the residual error, representing the variability not accounted for by the model.
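The bilinear model x_ij = sum_f a_if b_jf + e_ij can be evaluated directly. The sketch below uses hypothetical score and loading values for a one-component model of a small, column-centered matrix (not output from a real PCA fit). It computes the reconstruction, the residual matrix, and the fraction of the centered sum of squares captured by the model. It also shows sign indeterminacy: flipping the sign of both a score column and its loading column leaves the reconstruction unchanged.

```python
def reconstruct(A, B):
    """x_hat_ij = sum_f A[i][f] * B[j][f] (scores A: I x F, loadings B: J x F)."""
    F = len(A[0])
    return [[sum(A[i][f] * B[j][f] for f in range(F)) for j in range(len(B))]
            for i in range(len(A))]


def residuals(X, A, B):
    """e_ij = x_ij - x_hat_ij: the part of X not described by the model."""
    X_hat = reconstruct(A, B)
    return [[X[i][j] - X_hat[i][j] for j in range(len(X[0]))] for i in range(len(X))]


def explained_fraction(X, A, B):
    """1 - SS(residual) / SS(X), assuming X is already column-centered."""
    E = residuals(X, A, B)
    ss_total = sum(x * x for row in X for x in row)
    ss_resid = sum(e * e for row in E for e in row)
    return 1.0 - ss_resid / ss_total


# Hypothetical centered data (4 samples x 2 variables) and a one-component model
X = [[1.25, 1.55], [-0.6, -0.8], [0.3, 0.4], [-0.95, -1.15]]
A = [[2.0], [-1.0], [0.5], [-1.5]]  # scores, I x F
B = [[0.6], [0.8]]                  # loadings, J x F (unit length)

X_hat = reconstruct(A, B)
E = residuals(X, A, B)
r2 = explained_fraction(X, A, B)

# Sign indeterminacy: negate both the score column and the loading column
A_flipped = [[-a for a in row] for row in A]
B_flipped = [[-b for b in row] for row in B]
print("explained fraction:", r2)
print("same fit after sign flip:", reconstruct(A_flipped, B_flipped) == X_hat)
```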
Typically in PCA, the first few principal components describe most of the variability in the data set, allowing the transformed data set to be easily visualized as a series of score and loading plots. Score plots depict clustering and separation of objects and