discriminant analysis (LDA) and discriminant analysis based upon partial least squares
regression (PLS-DA).
In Figure 10.14, PLS-DA on the unfolded EEMs (normalized and mean-centered) is
used to predict membership of three classes representing estuarine, stream or WTP sites for
each of the samples from the Horsens catchment data set. Cross-validation of the PLS-DA
model using random subsets of the data indicated that a three-component model was
appropriate. Unlike in PLS regression, where models are chosen to minimize RMSE_cv, in
PLS-DA, models are selected to give the lowest possible cross-validated classification
error rates (Kjeldahl and Bro, 2010). In the current example, cross-validated
classification error rates are 2.2%, 0.2%, and 1.6% for estuary, stream, and WTP sites respectively.
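The selection of the number of components by cross-validated classification error can be sketched as follows. This is a minimal illustration only, not the implementation used for the Horsens data: it assumes a two-class problem coded as 0/1, a fixed threshold of 0.5 in place of a calculated one, a plain NIPALS PLS1 algorithm, and synthetic rows standing in for unfolded, normalized EEMs. All function names are hypothetical.

```python
import random

def column_means(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def pls1_fit(X, y, n_comp):
    """NIPALS PLS1 on mean-centered X (list of rows) and y; returns [(w, p, q), ...]."""
    X = [row[:] for row in X]
    y = list(y)
    comps = []
    for _ in range(n_comp):
        w = [sum(r[j] * yi for r, yi in zip(X, y)) for j in range(len(X[0]))]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:
            break  # no covariance with y left to model
        w = [v / norm for v in w]                                  # weights
        t = [sum(a * b for a, b in zip(r, w)) for r in X]          # scores
        tt = sum(v * v for v in t)
        p = [sum(r[j] * ti for r, ti in zip(X, t)) / tt for j in range(len(w))]
        q = sum(yi * ti for yi, ti in zip(y, t)) / tt
        X = [[a - ti * pj for a, pj in zip(r, p)] for r, ti in zip(X, t)]  # deflate X
        y = [yi - q * ti for yi, ti in zip(y, t)]                          # deflate y
        comps.append((w, p, q))
    return comps

def pls1_predict(comps, x):
    """Predict centered y for one centered sample by repeating the deflation steps."""
    x = list(x)
    yhat = 0.0
    for w, p, q in comps:
        t = sum(a * b for a, b in zip(x, w))
        yhat += q * t
        x = [a - t * pj for a, pj in zip(x, p)]
    return yhat

def cv_error(X, y, n_comp, n_folds, rng):
    """Classification error rate from cross-validation on random subsets."""
    idx = list(range(len(X)))
    rng.shuffle(idx)
    wrong = 0
    for f in range(n_folds):
        test = idx[f::n_folds]
        held = set(test)
        train = [i for i in idx if i not in held]
        mx = column_means([X[i] for i in train])       # center on training data only
        my = sum(y[i] for i in train) / len(train)
        Xc = [[a - m for a, m in zip(X[i], mx)] for i in train]
        yc = [y[i] - my for i in train]
        comps = pls1_fit(Xc, yc, n_comp)
        for i in test:
            xc = [a - m for a, m in zip(X[i], mx)]
            pred = my + pls1_predict(comps, xc)
            wrong += (pred > 0.5) != (y[i] == 1)       # assumed threshold of 0.5
    return wrong / len(X)

# Synthetic stand-in for unfolded EEMs: 40 samples, 6 variables, two classes.
rng = random.Random(0)
X, y = [], []
for k in range(40):
    label = k % 2
    X.append([rng.gauss(3.0 * label if j < 3 else 0.0, 1.0) for j in range(6)])
    y.append(label)

# Same random subsets for every model size, so the error rates are comparable.
errors = {a: cv_error(X, y, a, 5, random.Random(1)) for a in range(1, 5)}
best = min(errors, key=errors.get)  # ties resolve to the fewest components
print(errors, best)
```

With more than two classes, one such 0/1 model per class (or a single multi-response PLS2 model) would be needed, and the error rate would be reported per class as in the example above.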
In Figure 10.14, the vertical axis shows the predicted class membership (cross-validated)
for each sample, plotted in relation to a calculated threshold (dotted line) that distinguishes
samples that are assessed as belonging to a particular class (above the line) from samples
that are not (below the line). Ideally, a good classification model also shows tight clustering
of sites around 1 (class members) or zero (non-class members); in the current example, the
absence of tight clustering is presumably a reflection of the continuity between sites in the
data set, and the somewhat arbitrary designation of the three classes. In Figure 10.14A, all
but four samples from stations E1-E5 are correctly assigned to the estuary class, while a
small number of river (n = 3) and WTP (n = 3) samples cannot be distinguished from the
estuary class. In Figure 10.14B, showing predicted membership of the stream class,
similar success rates for classification are observed. In Figure 10.14C, all samples from site
16 are correctly assigned to the WTP class, along with one misclassified sample from the
estuary.
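The way such a plot is read, with samples above the threshold assigned to the class and samples below it rejected, can be expressed in a few lines. The predicted values, true memberships, and threshold below are invented purely for illustration and are not taken from the Horsens data set; the function name is hypothetical.

```python
def class_error_rate(predicted, is_member, threshold):
    """Fraction of samples whose thresholded prediction disagrees with true membership."""
    wrong = sum((p > threshold) != m for p, m in zip(predicted, is_member))
    return wrong / len(predicted)

# Hypothetical cross-validated predictions for one class (e.g. "estuary").
predicted = [0.92, 0.71, 0.38, 0.10, -0.05, 0.55, 0.20, 0.81]
is_member = [True, True, True, False, False, False, False, True]
threshold = 0.45  # assumed value standing in for the calculated threshold

# Class members that fall below the line, and non-members that fall above it.
missed = [i for i, (p, m) in enumerate(zip(predicted, is_member)) if m and p <= threshold]
false_hits = [i for i, (p, m) in enumerate(zip(predicted, is_member)) if not m and p > threshold]

rate = class_error_rate(predicted, is_member, threshold)
print(missed, false_hits, rate)
```

Here one class member falls below the line and one non-member falls above it, which mirrors the two kinds of misassignment described for each panel.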
In previous studies of DOM fluorescence, a limited range of classification techniques
have been applied. Bilal et al. (2010) used classification and regression trees (CART) of
DOM fluorescence characteristics to investigate the persistence of farm waste
contamination during a biodegradation experiment. Hall and Kenny (2007) used SIMCA coupled to a
PARAFAC model to classify port samples according to their harbor of origin along the US
east coast, while Hall et al. (2005) used multilinear N-PLS-DA to classify samples by port
and river of origin. Overall, however, discrimination techniques have been underutilized
in the interpretation of DOM fluorescence data sets, and could play a much larger role in
understanding and predicting the behavior of natural organic matter fluorescence in the
future.
10.11 Summary
In this chapter, a range of chemometric models for exploring and visualizing CDOM
fluorescence data sets and for predicting the relationship between fluorescence and other
variables have been introduced. It is apparent that exploratory methods, particularly PARAFAC
and PCA, have already been widely implemented. Conversely, calibration models and
discriminant analyses have been attempted relatively rarely, yet have considerable potential