Geoscience Reference
In-Depth Information
and that it is confirmed that essentially the same solution is obtained from different random
starting conditions. Obtaining the same solution means that the sum of the squared
residuals from different models should be essentially identical. The loadings from prior
PARAFAC models can also be used as “first guesses” to speed up modeling or to assist
PARAFAC in arriving at a “likely” solution (Bro, 1997); however, this can increase the risk
that the model will incorrectly settle on a local rather than the global minimum of the residuals.
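
The check on random starting conditions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (parafac_als, khatri_rao, gaussian_peaks), the synthetic two-component data, and the fixed iteration count are all hypothetical choices made here, and the plain unconstrained ALS loop stands in for whatever PARAFAC implementation is actually used.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product; row i * len(V) + j equals U[i] * V[j]."""
    return (U[:, None, :] * V[None, :, :]).reshape(-1, U.shape[1])

def parafac_als(X, rank, seed=None, init=None, n_iter=500):
    """Fit a PARAFAC model to a 3-way array (samples x excitation x emission)
    by unconstrained ALS. Returns the loadings and the sum of squared residuals.
    A fixed iteration count is used instead of a convergence test (assumption)."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    if init is None:                      # random starting conditions
        init = [rng.random((n, rank)) for n in (I, J, K)]
    A, B, C = (M.copy() for M in init)    # or "first guesses" from a prior model
    X0 = X.reshape(I, J * K)                      # unfolded along samples
    X1 = X.transpose(1, 0, 2).reshape(J, I * K)   # unfolded along excitation
    X2 = X.transpose(2, 0, 1).reshape(K, I * J)   # unfolded along emission
    for _ in range(n_iter):
        # Each step is a least-squares update of one mode with the others fixed.
        A = X0 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = X1 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X2 @ np.linalg.pinv(khatri_rao(A, B).T)
    sse = float(np.sum((X0 - A @ khatri_rao(B, C).T) ** 2))
    return (A, B, C), sse

def gaussian_peaks(n, centers, width):
    """Hypothetical single-peak spectra, one column per component."""
    x = np.arange(n)[:, None]
    return np.exp(-0.5 * ((x - np.asarray(centers)[None, :]) / width) ** 2)

# Synthetic data set (an assumption, not real EEMs): 20 samples, 2 components.
rng = np.random.default_rng(0)
A_true = rng.random((20, 2)) + 0.1
B_true = gaussian_peaks(15, [4, 10], 2.0)
C_true = gaussian_peaks(18, [5, 12], 2.5)
X = np.einsum("ir,jr,kr->ijk", A_true, B_true, C_true)
X += 0.01 * rng.standard_normal(X.shape)

# Refit from five different random starts and compare the residual sums.
sse_by_start = [parafac_als(X, rank=2, seed=s)[1] for s in range(5)]
total_ss = float(np.sum(X ** 2))
for s, sse in enumerate(sse_by_start):
    print(f"start {s}: SSE = {sse:.6f}")
spread = (max(sse_by_start) - min(sse_by_start)) / min(sse_by_start)
print("relative spread of SSE across starts:", spread)
```

A relative spread close to zero suggests that the starts reached the same solution; a start with a clearly larger SSE has probably settled in a local minimum. Passing the loadings of an earlier model through the init argument corresponds to the “first guess” strategy, with the caveat about local minima noted above.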
Unstable models can often be recognized by low core consistencies, or by the fact that
the model changes when small numbers of samples are removed from the data set. It is
important to identify and remove outliers before modeling so they do not exert undue influ-
ence on the model. One technique that is useful in this regard is jack-knifing, a resampling
method used to assess the influence and leverage of individual samples within a data set
(Riu and Bro, 2003). Residual error plots should always be examined for evidence of
nonrandom structure. Consistent peaks in the residuals suggest that additional components
may be needed, whereas peaks and troughs appearing next to each other can indicate an
overfitted or poorly fitted model (Stedmon et al., 2003).
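
A leave-one-sample-out loop gives a rough feel for how jack-knifing exposes influential samples. The sketch below is not the procedure of Riu and Bro (2003); it is a simplified stand-in that refits the model with each sample removed and measures how far the spectral loadings move. The function names, the congruence-based influence score, the synthetic data, and the planted outlier at sample 7 are all assumptions made for illustration.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product; row i * len(V) + j equals U[i] * V[j]."""
    return (U[:, None, :] * V[None, :, :]).reshape(-1, U.shape[1])

def parafac_als(X, init, n_iter=200):
    """Unconstrained ALS for a 3-way PARAFAC model, started from `init`.
    The sample loadings are updated first, so only the spectral loadings
    in `init` affect the result."""
    A, B, C = (M.copy() for M in init)
    I, J, K = X.shape
    X0 = X.reshape(I, J * K)
    X1 = X.transpose(1, 0, 2).reshape(J, I * K)
    X2 = X.transpose(2, 0, 1).reshape(K, I * J)
    for _ in range(n_iter):
        A = X0 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = X1 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X2 @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

def congruence(U, V):
    """Absolute cosine between matching columns of U and V."""
    num = np.abs(np.sum(U * V, axis=0))
    return num / (np.linalg.norm(U, axis=0) * np.linalg.norm(V, axis=0))

def jackknife_influence(X, rank, seed=0):
    """For each sample, refit without it and score how much the spectra change
    (0 = unchanged). Refits start from the full-model loadings so that the
    component order is preserved."""
    rng = np.random.default_rng(seed)
    A, B, C = parafac_als(X, [rng.random((n, rank)) for n in X.shape])
    influence = np.empty(X.shape[0])
    for i in range(X.shape[0]):
        keep = np.arange(X.shape[0]) != i
        _, Bi, Ci = parafac_als(X[keep], (A[keep], B, C))
        influence[i] = 1.0 - min(congruence(B, Bi).min(), congruence(C, Ci).min())
    return influence

def peaks(n, centers, width):
    """Hypothetical single-peak spectra, one column per component."""
    x = np.arange(n)[:, None]
    return np.exp(-0.5 * ((x - np.asarray(centers)[None, :]) / width) ** 2)

# Synthetic two-component data with one contaminated sample (assumption).
rng = np.random.default_rng(1)
X = np.einsum("ir,jr,kr->ijk", rng.random((15, 2)) + 0.1,
              peaks(15, [4, 10], 2.0), peaks(18, [5, 12], 2.5))
X += 0.01 * rng.standard_normal(X.shape)
outlier = 7
X[outlier] += 3.0 * np.outer(peaks(15, [7], 2.0)[:, 0], peaks(18, [9], 2.5)[:, 0])

influence = jackknife_influence(X, rank=2)
print("most influential sample:", int(np.argmax(influence)))
```

Samples whose removal changes the spectra far more than the rest are candidates for closer inspection before they are treated as outliers.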
Unstable models can sometimes be improved by applying appropriate constraints during
modeling (Andersen and Bro, 2003). For example, it is common in fluorescence applications
to constrain concentrations and spectra to be non-negative. It can also work
well to constrain spectra to have no more than a single peak (unimodality). Constraints
can assist PARAFAC in arriving at stable, chemically sensible solutions, especially for
real-world, noisy data sets. However, care has to be taken to ensure that the process does not
cover up problems that would be better solved with other approaches, and that important
chemical phenomena are not obscured or misrepresented as a result.
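
One way to see what a non-negativity constraint does is the following minimal sketch. It uses multiplicative updates, a technique borrowed from non-negative matrix factorization, rather than the constrained least-squares steps usually found in PARAFAC software, so it should be read as an illustration of the constraint and not as the method used in the studies cited here. The function names and the synthetic data are assumptions.

```python
import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product; row i * len(V) + j equals U[i] * V[j]."""
    return (U[:, None, :] * V[None, :, :]).reshape(-1, U.shape[1])

def nonneg_parafac_mu(X, rank, n_iter=1000, seed=0, eps=1e-12):
    """Non-negative PARAFAC by multiplicative updates (illustrative only).
    Requires non-negative data; loadings start positive and every update
    multiplies them by a non-negative ratio, so they never change sign."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = (rng.random((n, rank)) + 0.1 for n in (I, J, K))
    X0 = X.reshape(I, J * K)
    X1 = X.transpose(1, 0, 2).reshape(J, I * K)
    X2 = X.transpose(2, 0, 1).reshape(K, I * J)
    sse_trace = []
    for _ in range(n_iter):
        Z = khatri_rao(B, C)
        A *= (X0 @ Z) / (A @ (Z.T @ Z) + eps)
        Z = khatri_rao(A, C)
        B *= (X1 @ Z) / (B @ (Z.T @ Z) + eps)
        Z = khatri_rao(A, B)
        C *= (X2 @ Z) / (C @ (Z.T @ Z) + eps)
        sse_trace.append(float(np.sum((X0 - A @ khatri_rao(B, C).T) ** 2)))
    return (A, B, C), sse_trace

def peaks(n, centers, width):
    """Hypothetical single-peak spectra, one column per component."""
    x = np.arange(n)[:, None]
    return np.exp(-0.5 * ((x - np.asarray(centers)[None, :]) / width) ** 2)

# Synthetic noisy data (assumption). Negative values produced by the noise are
# set to zero because the multiplicative updates need non-negative input.
rng = np.random.default_rng(2)
X = np.einsum("ir,jr,kr->ijk", rng.random((20, 2)) + 0.1,
              peaks(15, [4, 10], 2.0), peaks(18, [5, 12], 2.5))
X = np.maximum(X + 0.01 * rng.standard_normal(X.shape), 0.0)

(A, B, C), sse_trace = nonneg_parafac_mu(X, rank=2)
print("smallest loading:", min(A.min(), B.min(), C.min()))
print("SSE after first and last iteration:", sse_trace[0], sse_trace[-1])
```

A unimodality constraint would need a different update step and is not shown. As the text cautions, a constrained fit that looks tidy can still hide a problem in the data, so the residuals deserve the same scrutiny as for an unconstrained model.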
10.8.1 PARAFAC Models of Organic Matter
Of all chemometric methods, PARAFAC is currently the one most frequently applied to
the analysis of organic matter fluorescence EEMs. Figure 10.5 shows the relationship
between sample size and numbers of components identified in 33 PARAFAC models of
DOM in natural waters and soils, including only independently derived models by a range
of research groups that were published between 2003 and 2010. Although larger data sets
have generally resolved larger numbers of components, as many
as five components have been identified with as few as 18 samples (Hall and Kenny, 2007).
Models with eight or more components were typically derived from data sets that included
soil samples (Yamashita et al., 2008; Fellman et al., 2009a, 2009b, 2009c; Chen et al.,
2010), with the largest models usually derived from studies that analyzed a diverse range
of samples from lakes, streams, soils, wetlands, estuaries, and the ocean.
Chemical interpretations of individual PARAFAC components in organic matter fluo-
rescence data sets have ranged from very general (protein-like, humic-like) to somewhat
more specific (tryptophan-like, tyrosine-like, quinone-like). Of models created from data
sets consisting of more than 100 samples, the vast majority have each resolved PARAFAC