may represent PARAFAC's attempt to model background noise. Although visualization
alone can identify some problems, in most cases additional tools are needed to confirm that
visually feasible models are also mathematically robust.
Because the PARAFAC model makes no assumptions about either spectral shapes or the
structure of parameters and error terms, two completely independent models derived from
different sets of samples that arrive at similar spectral shapes provide strong evidence
that the spectra represent underlying chemical phenomena. In split-half validation,
independent halves of a data set are modeled separately. The model is validated when the
same components are found in each half-data set, as this result could not reasonably arise
from chance alone (Harshman and Lundy, 1994). When spectrally identical components
are uncovered in completely unrelated data sets, as has been reported with increasing
frequency (e.g., Stedmon et al., 2007; Murphy et al., 2011; this study), this can be taken as
even stronger validation that such PARAFAC components are chemically meaningful.
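Once a PARAFAC model has been fitted to each half, the comparison step of split-half validation can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in this study: it assumes the excitation and emission loadings from each half are available as columns of NumPy arrays with the same number of components. It measures similarity with Tucker's congruence coefficient, a common choice swapped in here as an assumption. Components are matched by exhaustive search over permutations, which is practical only for the small component counts typical of EEM models. A product of excitation and emission congruences of at least 0.95 is treated as "the same component". The function names and the 0.95 threshold are hypothetical.

```python
import itertools

import numpy as np


def congruence_matrix(L1, L2):
    """Tucker's congruence coefficients between every column of L1 and of L2."""
    # Scale each loading vector to unit length; inner products are then congruences.
    n1 = L1 / np.linalg.norm(L1, axis=0)
    n2 = L2 / np.linalg.norm(L2, axis=0)
    return n1.T @ n2  # shape (F, F)


def split_half_match(ex1, em1, ex2, em2, threshold=0.95):
    """Hypothetical helper: pair the components of two half-data-set models.

    ex*, em*: (n_wavelengths, F) excitation / emission loadings of each half.
    threshold: assumed similarity cut-off, not a value taken from the source.
    Returns (perm, matched, validated) where component i of half 1 pairs
    with column perm[i] of half 2.
    """
    # Combined similarity: both spectra must agree for a component to match.
    S = congruence_matrix(ex1, ex2) * congruence_matrix(em1, em2)
    F = S.shape[0]
    best_perm, best_score = None, -np.inf
    # Exhaustive search for the pairing with the largest total similarity.
    for perm in itertools.permutations(range(F)):
        score = sum(S[i, perm[i]] for i in range(F))
        if score > best_score:
            best_perm, best_score = perm, score
    matched = [float(S[i, best_perm[i]]) for i in range(F)]
    validated = all(m >= threshold for m in matched)
    return list(best_perm), matched, validated
```

Under these assumptions, the model would be considered split-half validated only if every paired component passes the threshold. Components may appear in a different order and with different scaling in each half, which is why the sketch normalizes loadings and searches over pairings.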
An additional tool for determining the number of components is the core consistency
diagnostic (Bro and Kiers, 2003), which checks how well the data adhere to the trilinear
PARAFAC model. Valid PARAFAC models have core consistencies close to 100%, unstable
models have intermediate core consistencies (around 50%), and invalid models (caused by
data that are not trilinear or by a model with too many components) have core consistencies
that are often near zero or negative. When a sequence of models is developed, each with one
more component than the previous, the first overspecified model is often identified by a
large decrease in core consistency relative to the model with one fewer component.
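The diagnostic can be sketched in a few lines of NumPy. This is a minimal illustration, assuming the formula as usually stated for Bro and Kiers (2003). The least-squares Tucker3 core G, computed with the PARAFAC loadings held fixed, is compared with the ideal superdiagonal core T, giving 100 × (1 − Σ(G − T)²/F) for F components. It further assumes that the loadings absorb all scaling (no separate component weights) and have full column rank. The names `core_consistency` and `first_sharp_drop`, and the 50-point drop threshold, are hypothetical choices for illustration, not values from this study.

```python
import numpy as np


def core_consistency(X, A, B, C):
    """Core consistency (%) of a PARAFAC model X ~ sum_f a_f (outer) b_f (outer) c_f.

    X: (I, J, K) data array; A: (I, F), B: (J, F), C: (K, F) loadings,
    assumed to carry all of the scaling of each component.
    """
    F = A.shape[1]
    # Least-squares Tucker3 core with the loadings fixed:
    # G = X x1 pinv(A) x2 pinv(B) x3 pinv(C)
    pA, pB, pC = (np.linalg.pinv(M) for M in (A, B, C))
    G = np.einsum("ai,bj,ck,ijk->abc", pA, pB, pC, X)
    # A perfectly trilinear model has a superdiagonal core of ones.
    T = np.zeros((F, F, F))
    idx = np.arange(F)
    T[idx, idx, idx] = 1.0
    return 100.0 * (1.0 - np.sum((G - T) ** 2) / F)


def first_sharp_drop(cc, drop=50.0):
    """Hypothetical rule: cc[f] is the core consistency of the (f+1)-component model.

    Returns the component count of the first model whose core consistency
    falls by more than `drop` points relative to the previous model, or None.
    """
    for f in range(1, len(cc)):
        if cc[f - 1] - cc[f] > drop:
            return f + 1
    return None
```

For a sequence of models with 1, 2, 3, ... components, `first_sharp_drop([100, 99, 97, 20, -5])` returns 4, flagging the four-component model as the first overspecified one under this arbitrary threshold. As the text cautions, such a rule should inform rather than replace judgment.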
Unfortunately, diagnostics such as these can at times give ambiguous or contradictory
results; for example, models with poor core consistencies can quite often be validated
using split-half analysis (Murphy et al., 2008; Stedmon and Bro, 2008), or may have better
predictive capability than models with fewer components and higher core consistencies
(Bosco et al., 2006). The improvement of diagnostics for model selection is an active area
of research (Smilde et al., 2004), but it must be stressed that scientifically valid results
generally require extensive insight into the analytical data, the context of the actual
problem, and the mathematics and statistics behind the modeling. That said, an automated
program for calculating PARAFAC models of EEM data is available (Bro and Vidal, 2010).
This program takes into account the interdependence of a range of modeling decisions and
diagnostics, and automatically determines the number of components, possible outliers, and
so forth. Although the automated program may be useful, it is imperative to be aware that it
is based on certain assumptions and is bound to fail for some data.
10.8 Practical Implementation of PARAFAC
Although the PARAFAC algorithm is designed to search for the loadings and scores that
produce the least-squares, “best fit” solution, in practice the algorithm can converge to a
local, rather than the global, minimum of the residuals. When this happens, an incorrect
solution is obtained; or rather, the least-squares solution is not obtained at all. To guard
against this, it is recommended that models are initialized with random starting conditions,