Chemometric Analysis of Organic Matter Fluorescence - Aquatic Organic Matter Fluorescence

discriminant analysis (LDA) and discriminant analysis based upon partial least squares regression (PLS-DA).

In Figure 10.14, PLS-DA on the unfolded EEMs (normalized and mean-centered) is used to predict membership of three classes representing estuarine, stream, or WTP sites for each of the samples from the Horsens catchment data set. Cross-validation of the PLS-DA model using random subsets of the data indicated that a three-component model was appropriate. Unlike in PLS regression, where the preferred model is the one with the minimum cross-validated RMSE (RMSECV), PLS-DA models are selected to have the lowest possible cross-validated classification error rates (Kjeldahl and Bro, 2010). In the current example, cross-validated classification error rates are 2.2%, 0.2%, and 1.6% for estuary, stream, and WTP sites, respectively. In Figure 10.14, the vertical axis shows the predicted class membership (cross-validated) for each sample, plotted in relation to a calculated threshold (dotted line) that distinguishes samples that are assessed as belonging to a particular class (above the line) from samples that are not (below the line). Ideally, a good classification model also shows tight clustering of sites around 1 (class members) or 0 (non-class members); in the current example, the absence of tight clustering presumably reflects the continuity between sites in the data set and the somewhat arbitrary designation of the three classes. In Figure 10.14A, all but four samples from stations E1-E5 are correctly assigned to the estuary class, while a small number of river (n = 3) and WTP (n = 3) samples cannot be distinguished from the estuary class. In Figure 10.14B, which shows predicted membership of the stream class, similar classification success rates are observed. In Figure 10.14C, all samples from site 16 are correctly assigned to the WTP class, along with one misclassified sample from the estuary.
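The workflow described above (unfold, normalize, mean-center, fit PLS-DA, then pick the number of components by cross-validated classification error) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the procedure used to produce Figure 10.14: the EEMs are synthetic, the function names (`make_synthetic_eems`, `pls_coefficients`, `cv_class_errors`) are hypothetical, each sample is normalized to unit norm, and a fixed threshold of 0.5 stands in for the calculated threshold shown in the figure.

```python
import numpy as np

def make_synthetic_eems(n_per_class=30, n_em=20, n_ex=15, seed=1):
    """Hypothetical data: three classes of EEMs built from three Gaussian peaks."""
    rng = np.random.default_rng(seed)
    em = np.linspace(0, 1, n_em)[:, None]
    ex = np.linspace(0, 1, n_ex)[None, :]
    centers = [(0.3, 0.3), (0.6, 0.4), (0.8, 0.7)]
    peaks = [np.exp(-((em - a) ** 2 + (ex - b) ** 2) / 0.02) for a, b in centers]
    # Rows: class-specific mean intensities of the three peaks (assumed values).
    mix = np.array([[1.0, 0.3, 0.1], [0.3, 1.0, 0.2], [0.2, 0.3, 1.0]])
    eems, labels = [], []
    for k in range(3):
        for _ in range(n_per_class):
            conc = mix[k] * rng.uniform(0.7, 1.3, 3)
            eem = sum(c * p for c, p in zip(conc, peaks))
            eems.append(eem + rng.normal(0, 0.02, (n_em, n_ex)))
            labels.append(k)
    return np.array(eems), np.array(labels)

def pls_coefficients(X, Y, n_comp):
    """PLS2 regression coefficients for centered X and Y (NIPALS-style deflation)."""
    X, Y = X.copy(), Y.copy()
    W, P, Q = [], [], []
    for _ in range(n_comp):
        # Weight vector: dominant left singular vector of the cross-product X'Y.
        w = np.linalg.svd(X.T @ Y, full_matrices=False)[0][:, 0]
        t = X @ w                      # scores
        p = X.T @ t / (t @ t)          # X loadings
        q = Y.T @ t / (t @ t)          # Y loadings
        X -= np.outer(t, p)            # deflate both blocks
        Y -= np.outer(t, q)
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q).T
    return W @ np.linalg.solve(P.T @ W, Q.T)

def cv_class_errors(X, labels, n_comp, n_splits=5, seed=0):
    """Per-class cross-validated error rates using random subsets of the samples."""
    rng = np.random.default_rng(seed)
    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)  # dummy class matrix
    folds = np.array_split(rng.permutation(len(labels)), n_splits)
    Y_hat = np.empty_like(Y)
    for test in folds:
        train = np.setdiff1d(np.arange(len(labels)), test)
        # Mean-center with statistics from the training subset only.
        x_mean, y_mean = X[train].mean(axis=0), Y[train].mean(axis=0)
        B = pls_coefficients(X[train] - x_mean, Y[train] - y_mean, n_comp)
        Y_hat[test] = (X[test] - x_mean) @ B + y_mean
    # Assumption: fixed 0.5 threshold separates members from non-members.
    errors = ((Y_hat > 0.5) != (Y > 0.5)).mean(axis=0)
    return errors, Y_hat

eems, labels = make_synthetic_eems()
X = eems.reshape(len(eems), -1)                    # unfold each EEM into a row
X = X / np.linalg.norm(X, axis=1, keepdims=True)   # normalize each sample

results = {a: cv_class_errors(X, labels, a)[0] for a in range(1, 6)}
best = min(results, key=lambda a: results[a].mean())
for a, err in results.items():
    print(f"{a} components: error rates (%) per class = {np.round(100 * err, 1)}")
print("selected number of components:", best)
```

The selection criterion here is the mean of the per-class cross-validated error rates, in contrast to the minimum-RMSECV criterion used for PLS regression. Plotting a column of `Y_hat` against sample index, with a horizontal line at the threshold, gives a plot in the style described for Figure 10.14.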

In previous studies of DOM fluorescence, a limited range of classification techniques have been applied. Bilal et al. (2010) used classification and regression trees (CART) of DOM fluorescence characteristics to investigate the persistence of farm waste contamination during a biodegradation experiment. Hall and Kenny (2007) used SIMCA coupled to a PARAFAC model to classify port samples according to their harbor of origin along the US east coast, while Hall et al. (2005) used multilinear N-PLS-DA to classify samples by port and river of origin. Overall, however, discrimination techniques have been underutilized in the interpretation of DOM fluorescence data sets, and could play a much larger role in understanding and predicting the behavior of natural organic matter fluorescence in the future.

10.11 Summary

In this chapter, a range of chemometric models for exploring and visualizing CDOM fluorescence data sets and for predicting the relationship between fluorescence and other variables have been introduced. It is apparent that exploratory methods, particularly PARAFAC and PCA, have already been widely implemented. Conversely, calibration models and discriminant analyses have been attempted relatively rarely, yet have considerable potential
