the relationships between clusters and the principal components, while loading plots illustrate
the projection of the original variables onto each principal axis. Hence, the loadings show
in which parts of the spectrum the respective principal components vary. In PCA of unfolded
EEMs, loading plots look much like EEMs themselves, with each plot showing the degree to which
different wavelength regions vary along the direction of the corresponding principal component
(Persson and Wedborg, 2001; Boehme et al., 2004). Provided that the number of original
variables in the data set is not large, it can also be convenient to view the samples (scores)
and variables (loadings) together in a so-called biplot (Gabriel, 1971).
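The unfolding of EEMs, the decomposition, and the refolding of loadings into EEM-like maps can be sketched in Python with NumPy. This is a minimal sketch, not the procedure used in the cited studies. It assumes the EEMs are stored as an array of shape (samples, emission, excitation) with no missing values (real EEMs usually need scatter regions removed or interpolated first), computes PCA by singular value decomposition of an already-preprocessed matrix, and reshapes each loading vector back into an emission × excitation map. The helper names (`unfold`, `pca_svd`, `refold_loadings`, `biplot_coords`) and the `alpha` parameter are hypothetical. The biplot helper follows Gabriel's idea of factorizing the data matrix into sample markers and variable markers, with `alpha` controlling how the singular values are shared between the two.

```python
import numpy as np


def unfold(eems: np.ndarray) -> np.ndarray:
    """Unfold EEMs of shape (samples, emission, excitation) into (samples, emission*excitation)."""
    return eems.reshape(eems.shape[0], -1)


def pca_svd(X: np.ndarray, n_components: int):
    """PCA of an already-preprocessed matrix X (samples x variables) via SVD.

    Returns scores (samples x k), loadings (k x variables) and the fraction
    of the total sum of squares of X captured by each component.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # sample coordinates
    loadings = Vt[:n_components]                      # one row per component
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return scores, loadings, explained


def refold_loadings(loadings: np.ndarray, n_em: int, n_ex: int) -> np.ndarray:
    """Reshape each loading vector into an emission x excitation 'loading EEM'."""
    return loadings.reshape(loadings.shape[0], n_em, n_ex)


def biplot_coords(X: np.ndarray, n_components: int = 2, alpha: float = 1.0):
    """Biplot markers such that X is approximated by G @ H.T, with
    G = U S**alpha (samples) and H = V S**(1 - alpha) (variables)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = n_components
    G = U[:, :k] * s[:k] ** alpha
    H = Vt[:k].T * s[:k] ** (1 - alpha)
    return G, H


# Usage with synthetic data: 10 EEMs, 4 emission x 5 excitation wavelengths.
rng = np.random.default_rng(0)
eems = rng.random((10, 4, 5))
X = unfold(eems)
Xc = X - X.mean(axis=0)  # mean centering (one of several possible preprocessing choices)
scores, loadings, explained = pca_svd(Xc, n_components=3)
loading_eems = refold_loadings(loadings, n_em=4, n_ex=5)
print(loading_eems.shape, explained.round(3))
```

Contouring `loading_eems[0]` over the emission and excitation axes gives the EEM-like loading plot described above. With `alpha=1` the sample markers from `biplot_coords` are exactly the PCA scores, so the same plot can carry both samples and variables when the number of variables is small.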
As an illustration of the importance of preprocessing, three alternative preprocessing
methods are applied to the Horsens catchment data set prior to PCA decomposition of the
spectral data. Preprocessing by mean centering, autoscaling, and row normalization followed
by mean centering results in the identification of one, two, and three principal components,
respectively. With mean centering alone it is not possible to obtain a valid multicomponent
PCA model: a single component is found that describes 98% of the variation in the data set.
With autoscaling, PCA identifies two distinctly different phenomena (Figure 10.4A). The first
component describes 94% of the variation in the data set and shows a continuum between
stream and estuary sites, possibly reflecting differences in concentration as much as
chemical differences between samples. The second component describes only 4% of the total
variation and is due almost entirely to the wastewater samples; a close-up view ignoring
site 16 shows little discrimination among the other sites along this axis. In contrast, PCA
following row normalization and mean centering identifies three principal components, and
the sites fall into three distinct clusters representing wastewater, estuary, and stream
samples (Figure 10.4B). Less distinct clustering is also apparent among the various stream
sites. The first two components describe 64.9% and 13.6% of the variation between samples,
while the third (not shown) describes approximately 2.5%. The percentages of variation
explained by the three models are not directly comparable because they are derived from
differently preprocessed data sets; however, it is apparent that unsuitable preprocessing
can reduce the effectiveness of PCA at partitioning variation along the second and
subsequent axes, and in so doing diminish the visualization of multivariate discrimination
between samples.
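The three preprocessing options compared above can be sketched as follows, again with NumPy. This is an illustrative sketch under stated assumptions, not the processing applied to the Horsens data: the input is an unfolded matrix with samples as rows, autoscaling divides each variable by its sample standard deviation, and row normalization is assumed here to scale each sample to unit Euclidean norm (the text does not specify the norm; normalizing to total fluorescence intensity is another common choice). The function names are hypothetical, and the data are synthetic spectra whose overall intensity varies from sample to sample, mimicking a concentration effect.

```python
import numpy as np


def mean_center(X: np.ndarray) -> np.ndarray:
    """Subtract each variable's (column's) mean."""
    return X - X.mean(axis=0)


def autoscale(X: np.ndarray) -> np.ndarray:
    """Mean-centre, then scale each variable to unit (sample) standard deviation."""
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # leave constant variables unscaled instead of dividing by zero
    return (X - X.mean(axis=0)) / sd


def row_normalize_then_center(X: np.ndarray) -> np.ndarray:
    """ASSUMPTION: scale each sample (row) to unit Euclidean norm, then mean-centre columns."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing an all-zero sample by zero
    return mean_center(X / norms)


def explained_variance(Xp: np.ndarray, n_components: int = 3) -> np.ndarray:
    """Percent of the total sum of squares of the preprocessed matrix captured by each PC."""
    s = np.linalg.svd(Xp, compute_uv=False)
    return 100 * s[:n_components] ** 2 / np.sum(s ** 2)


# Synthetic example: 20 samples x 50 unfolded variables with a per-sample intensity factor.
rng = np.random.default_rng(1)
intensity = rng.uniform(1, 10, size=(20, 1))
X = rng.random((20, 50)) * intensity
for name, prep in [("mean centering", mean_center),
                   ("autoscaling", autoscale),
                   ("row normalization + mean centering", row_normalize_then_center)]:
    print(f"{name}: {explained_variance(prep(X)).round(1)} % explained by PC1-PC3")
```

As the text cautions, each printed set of percentages refers to a differently preprocessed matrix, so the numbers should not be compared across methods; the useful comparison is how each method partitions variation between the first and subsequent components. Dividing out each sample's overall intensity is what allows row normalization to suppress a concentration-driven first component.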
PCA can be performed on fluorescence measurements alone or in combination with other water
measurements. It may also be performed on scores obtained from other chemometric models,
although this approach can introduce redundancy. It is also important to realize that
variables appearing close together on, for example, a PCA loading plot or biplot are not
necessarily strongly correlated unless they also have high loadings on the corresponding
components. Consequently, correlations between variables suggested by a PCA should always
be confirmed directly by plotting the variables against one another (Gabriel, 1971). In
previous fluorescence studies, PCA of unfolded EEMs has been used to study the variability
of DOM fluorescence in the oceans. Persson and Wedborg (2001) used PCA to examine the mixing
of deep and surface water masses in the Baltic Sea. Boehme et al. (2004) explored seasonal
and regional variation in fluorescent DOM in the Gulf of Mexico, determining that 87% of
DOM fluorescence variability was related to a single PCA component representing terrestrial