Geoscience Reference
In-Depth Information
[Figure 10.11, panels A and B. Horizontal axis: Emission wavelength (nm).]
Figure 10.11. Regression coefficients for PLS prediction of DOC from fluorescence in the Horsens
catchment. (A) River and WTP model with two latent variables. (B) Estuary model with three latent
variables.
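A coefficient map like the one in Figure 10.11 can be obtained by refolding the PLS regression vector onto the excitation and emission grid. The following is a minimal sketch, not the authors' procedure. It assumes each EEM was unfolded row by row (all emission values for the first excitation wavelength, then the next) before modelling. The function names, wavelengths, and coefficient values are hypothetical and serve only as illustration.

```python
def refold_coefficients(coefficients, n_excitation, n_emission):
    """Reshape a flat regression vector into an excitation x emission grid.

    Assumes the EEMs were unfolded row-major (excitation varies slowest).
    """
    if len(coefficients) != n_excitation * n_emission:
        raise ValueError("coefficient vector does not match the EEM grid size")
    return [
        list(coefficients[i * n_emission:(i + 1) * n_emission])
        for i in range(n_excitation)
    ]


def strongest_coefficient(coef_map, excitation_nm, emission_nm):
    """Return (excitation, emission, coefficient) with the largest magnitude."""
    best = None
    for i, row in enumerate(coef_map):
        for j, value in enumerate(row):
            if best is None or abs(value) > abs(best[2]):
                best = (excitation_nm[i], emission_nm[j], value)
    return best


# Hypothetical 2 x 3 grid; values are made up for illustration only.
example_coefficients = [0.1, -0.2, 0.3, 0.05, -0.8, 0.4]
example_map = refold_coefficients(example_coefficients, 2, 3)
example_peak = strongest_coefficient(example_map, [250, 300], [300, 350, 400])
```

The sign of each refolded coefficient indicates whether fluorescence at that wavelength pair pushes the predicted DOC up or down. Overlapping spectra can still make individual regions hard to interpret.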
cross-validation used with a large data set that includes replicate samples) (Westerhuis
et al., 2008; Kjeldahl and Bro, 2010). This often leads to the selection of overfitted models
that are likely to perform worse for future predictions than models that have fewer latent
variables.
For the river and WTP sites, a five-component model has the lowest RMSECV; however,
the two-component model appears to be a better choice, given that relatively small
gains in RMSECV (and the correlation coefficient, R²cv) are obtained by including the last
three latent variables. For the estuary sites, a three-component model appears to be
sufficient. Again, a seven-component model has lower RMSECV, yet only very small
improvements in RMSECV are attained with the addition of many latent variables. A conservative
approach is thus to select two latent variables for the river and WTP model, and three for
the estuary model.
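The conservative selection described above can be written as a simple rule: take the smallest number of latent variables whose RMSECV lies within a chosen tolerance of the minimum. The sketch below is one possible formalisation, not the procedure used in the text. The function name, the 5% tolerance, and the RMSECV values are all hypothetical and are not the Horsens results.

```python
def choose_latent_variables(rmsecv, tolerance=0.05):
    """Pick a parsimonious number of latent variables.

    rmsecv[i] is the cross-validated RMSE of the model with i + 1 latent
    variables. Returns the smallest number of latent variables whose RMSECV
    is no more than `tolerance` (as a fraction) above the minimum RMSECV.
    """
    if not rmsecv:
        raise ValueError("rmsecv must contain at least one value")
    threshold = min(rmsecv) * (1.0 + tolerance)
    for n_components, value in enumerate(rmsecv, start=1):
        if value <= threshold:
            return n_components


# Made-up RMSECV curve: the minimum is at five latent variables, but two
# latent variables already come within 5% of it.
hypothetical_rmsecv = [0.90, 0.52, 0.515, 0.51, 0.50, 0.505]
selected = choose_latent_variables(hypothetical_rmsecv)
```

The tolerance is a judgment call. A stricter tolerance moves the choice toward the RMSECV minimum, while a looser one favours fewer latent variables.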
Plots of regression coefficients might be expected to highlight the EEM regions with
the greatest influence upon the prediction of DOC concentration (Figure 10.11); however,
in the case of spectral data (and non-designed data in general) one must take care not
to overinterpret the components (Kjeldahl and Bro, 2010). Figure 10.11 suggests that
for the river and WTP model, the T and M peaks are most important, with fluorescence
in these regions associated with higher DOC concentrations (positive regression
coefficients, Figure 10.11A). In the estuary model, the C peak region has high negative
regression coefficients, indicating an inverse relationship with DOC, whereas high positive
coefficients are associated with the A peak region (Figure 10.11B). Whereas the
relationship suggested in Figure 10.11A seems plausible, the strong inverse correlation between
peak C fluorescence and DOC concentration implied in Figure 10.11B is counterintuitive.
In fact, the visual interpretation of this plot may be distorted by overlapping spectral
 