discriminant analysis (LDA) and discriminant analysis based upon partial least squares
regression (PLS-DA).
In Figure 10.14, PLS-DA on the unfolded EEMs (normalized and mean-centered) is
used to predict membership of three classes representing estuarine, stream, or WTP sites for
each of the samples from the Horsens catchment data set. Cross-validation of the PLS-DA
model using random subsets of the data indicated that a three-component model was
appropriate. Whereas PLS regression models are chosen to minimize RMSECV, PLS-DA
models are chosen to minimize the cross-validated classification error rate
(Kjeldahl and Bro, 2010). In the current example, cross-validated classification
error rates are 2.2%, 0.2%, and 1.6% for estuary, stream, and WTP sites, respectively.
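The workflow just described (unfold, normalize, mean-center, cross-validate on random subsets, then choose the number of components by classification error) can be sketched as follows. This is a minimal illustration and not the analysis behind Figure 10.14: the data are synthetic, and the function names (`unfold_eems`, `pls1_fit`, `cv_class_error`), the unit-norm scaling, the five random folds, and the fixed 0.5 threshold are all assumptions made for the example.

```python
import numpy as np

def unfold_eems(eems):
    """Unfold (samples, emission, excitation) into (samples, variables)
    and scale each sample to unit norm (an assumed normalization)."""
    X = eems.reshape(eems.shape[0], -1)
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def pls1_fit(X, y, n_comp):
    """PLS1 by NIPALS on mean-centered data; returns (x_mean, y_mean, coefs)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = E.T @ f
        w /= np.linalg.norm(w)          # weight vector
        t = E @ w                       # scores
        tt = t @ t
        p = E.T @ t / tt                # X loadings
        c = (f @ t) / tt                # y loading
        E = E - np.outer(t, p)          # deflate X
        f = f - c * t                   # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return x_mean, y_mean, W @ np.linalg.solve(P.T @ W, q)

def cv_class_error(X, labels, n_comp, n_folds=5, seed=0):
    """Cross-validated error rate per class, using random subsets.
    Each class gets its own 0/1 dummy response (one-vs-rest)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(labels)), n_folds)
    classes = np.unique(labels)
    y_cv = np.zeros((len(labels), len(classes)))
    for test in folds:
        train = np.setdiff1d(np.arange(len(labels)), test)
        for k, cls in enumerate(classes):
            y = (labels[train] == cls).astype(float)
            x_mean, y_mean, b = pls1_fit(X[train], y, n_comp)
            y_cv[test, k] = (X[test] - x_mean) @ b + y_mean
    assigned = y_cv > 0.5               # assumed fixed threshold
    truth = labels[:, None] == classes[None, :]
    return dict(zip(classes.tolist(), (assigned != truth).mean(axis=0).tolist()))

# Synthetic EEMs: three hypothetical fluorescence peaks mixed per class.
rng = np.random.default_rng(1)
em, ex = np.linspace(0, 1, 8), np.linspace(0, 1, 6)
peaks = [np.outer(np.exp(-(em - m) ** 2 / 0.05), np.exp(-(ex - x) ** 2 / 0.05))
         for m, x in [(0.2, 0.3), (0.5, 0.7), (0.8, 0.4)]]
mix = {"estuary": [1.0, 0.4, 0.2], "stream": [0.3, 1.0, 0.4], "WTP": [0.2, 0.3, 1.0]}
labels = np.repeat(list(mix), 20)
eems = np.array([sum(a * rng.uniform(0.7, 1.3) * pk for a, pk in zip(mix[c], peaks))
                 + rng.normal(0, 0.02, (8, 6)) for c in labels])

X = unfold_eems(eems)
errors = {a: cv_class_error(X, labels, a) for a in range(1, 6)}
mean_err = {a: float(np.mean(list(e.values()))) for a, e in errors.items()}
best = min(mean_err, key=mean_err.get)  # ties go to the fewest components
```

Here `errors[a]` holds one cross-validated error rate per class for an `a`-component model, and `best` is the component count with the lowest mean error. Selecting on RMSECV instead would use the squared differences between `y_cv` and the 0/1 targets, which need not favor the same model.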
In Figure 10.14, the vertical axis shows the predicted class membership (cross-validated)
for each sample, plotted in relation to a calculated threshold (dotted line) that distinguishes
samples that are assessed as belonging to a particular class (above the line) from samples
that are not (below the line). Ideally, a good classification model also shows tight clustering
of sites around 1 (class members) or 0 (non-class members); in the current example, the
absence of tight clustering presumably reflects the continuity between sites in the
data set and the somewhat arbitrary designation of the three classes. In Figure 10.14A, all
but four samples from stations E1-E5 are correctly assigned to the estuary class, while a
small number of river (n = 3) and WTP (n = 3) samples cannot be distinguished from the
estuary class. In Figure 10.14B, which shows predicted membership of the stream class,
similar classification success rates are observed. In Figure 10.14C, all samples from site
16 are correctly assigned to the WTP class, but one estuary sample is also misassigned
to it.
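The way such a panel is read can be expressed in a few lines. The sketch below takes cross-validated predictions for one class and applies a threshold. It counts class members that fall below the line and non-members that fall above it, and it measures how tightly the predictions cluster around their ideal values of 1 or 0. The function name `summarize_class`, the fixed threshold of 0.5, and the six predictions are invented for illustration; the threshold in Figure 10.14 is calculated from the data rather than fixed.

```python
from math import sqrt

def summarize_class(pred, is_member, threshold):
    """Compare thresholded predictions with known membership for one class."""
    assigned = [p > threshold for p in pred]
    # Class members predicted below the threshold
    missed = sum(m and not a for m, a in zip(is_member, assigned))
    # Non-members predicted above the threshold
    false_members = sum(a and not m for m, a in zip(is_member, assigned))
    # RMS distance of the predictions from their ideal targets (1 or 0)
    spread = sqrt(sum((p - (1.0 if m else 0.0)) ** 2
                      for p, m in zip(pred, is_member)) / len(pred))
    return {"missed": missed,
            "false_members": false_members,
            "error_rate": (missed + false_members) / len(pred),
            "spread": spread}

# Made-up predictions for six samples; the first three belong to the class.
pred = [0.9, 0.7, 0.3, 0.1, 0.6, -0.1]
is_member = [True, True, True, False, False, False]
summary = summarize_class(pred, is_member, threshold=0.5)
```

A model can have a low `error_rate` and still a large `spread`, which is the situation described above: most samples fall on the correct side of the line without clustering tightly at 1 or 0.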
In previous studies of DOM fluorescence, only a limited range of classification techniques
has been applied. Bilal et al. (2010) used classification and regression trees (CART) of
DOM fluorescence characteristics to investigate the persistence of farm waste contamination
during a biodegradation experiment. Hall and Kenny (2007) used SIMCA coupled to a
PARAFAC model to classify port samples according to their harbor of origin along the US
east coast, while Hall et al. (2005) used multilinear N-PLS-DA to classify samples by port
and river of origin. Overall, however, discrimination techniques have been underutilized
in the interpretation of DOM fluorescence data sets, and could play a much larger role in
understanding and predicting the behavior of natural organic matter fluorescence in the
future.
10.11 Summary
In this chapter, a range of chemometric models for exploring and visualizing CDOM
fluorescence data sets and for predicting the relationship between fluorescence and other
variables has been introduced. It is apparent that exploratory methods, particularly PARAFAC
and PCA, have already been widely implemented. In contrast, calibration models and
discriminant analyses have been attempted relatively rarely, yet have considerable potential