samples being analyzed. In all validation methods, the data set is divided
into a training set and a validation set. The training set is used for model
construction, whereas the validation set is used to test the model's
performance. Many methods are available for testing a model's predictive
performance, but cross-validation remains the preferred one; an overview
of cross-validation methods is given by Bro et al. (2008). A model for
samples that are not related in time can be validated by the leave-one-out
(LOO) approach, also referred to as internal validation, in which one sample
is left out and used to test the model developed from the remaining samples.
The procedure is repeated for each sample in turn, so that the whole data
set is eventually used for model testing. If samples are time-related,
entire batches are left out for model validation to avoid overfitting
(Lopes et al., 2004).
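The leave-one-out and batch-wise schemes described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure of the cited authors: an ordinary least-squares model (the hypothetical helpers `fit_linear` and `predict_linear`) stands in for a multivariate method such as PLS, and the data and batch labels are invented. LOO corresponds to giving every sample its own group; for time-related samples, the batch label is used as the group so that a whole batch is left out at once.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares linear model with intercept (hypothetical stand-in for PLS/PCR)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Apply the intercept and slopes returned by fit_linear."""
    return coef[0] + X @ coef[1:]

def cross_validate(X, y, groups):
    """Hold out each group once, fit on the rest, and return out-of-group predictions."""
    y_cv = np.empty(len(y))
    for g in np.unique(groups):
        test = groups == g                       # samples left out in this round
        coef = fit_linear(X[~test], y[~test])    # model built without them
        y_cv[test] = predict_linear(coef, X[test])
    return y_cv

# Hypothetical data: 12 samples, 3 variables, produced in 4 batches of 3 samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(12, 3))
y = X @ np.array([1.0, -0.5, 2.0]) + 0.1 * rng.normal(size=12)
batches = np.repeat(np.arange(4), 3)

y_loo = cross_validate(X, y, groups=np.arange(len(y)))  # LOO: each sample is its own group
y_batch = cross_validate(X, y, groups=batches)          # time-related data: leave whole batches out
```

Every prediction in `y_loo` and `y_batch` comes from a model that never saw the predicted sample (or its batch), which is what makes these predictions suitable for estimating predictive error.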
Model performance can be evaluated by calculating the root-mean-square
error of calibration (RMSEC) and the root-mean-square error of
cross-validation (RMSECV):

$$\mathrm{RMSEC},\ \mathrm{RMSECV} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2} \qquad [4.19]$$

where $N$ is the number of samples, and $y_i$ and $\hat{y}_i$ are the
experimentally obtained and predicted values for the calibration samples
(in the case of RMSEC) or the validation samples (in the case of RMSECV).
The samples used for cross-validation are not used in model construction
and therefore provide external testing (i.e. external validation) of the
developed model.
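As a hedged illustration of Eq. 4.19, the sketch below computes RMSEC and RMSECV for a hypothetical univariate linear calibration (`np.polyfit` of degree 1 stands in for the multivariate model, and the data are invented). RMSEC uses predictions from the model fitted to all calibration samples, while RMSECV uses leave-one-out predictions, each from a model built without the predicted sample. The divisor N follows the reconstructed Eq. 4.19.

```python
import numpy as np

def rmse(y, y_hat):
    """Root-mean-square error as in Eq. 4.19 (divisor N)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

# Hypothetical univariate calibration data: concentration x, measured response y.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

# RMSEC: the model is fitted to all calibration samples and predicts those same samples.
y_fit = np.polyval(np.polyfit(x, y, 1), x)
rmsec = rmse(y, y_fit)

# RMSECV: each sample is predicted by a model built without it (leave-one-out).
y_cv = np.empty_like(y)
for i in range(len(y)):
    keep = np.arange(len(y)) != i
    y_cv[i] = np.polyval(np.polyfit(x[keep], y[keep], 1), x[i])
rmsecv = rmse(y, y_cv)
```

For a least-squares fit like this one, RMSECV can never be smaller than RMSEC; a large gap between the two is a common warning sign of overfitting.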
Values predicted by the model are compared with the experimentally
obtained values using the correlation coefficient (Eq. 4.16).
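The comparison of predicted and measured values can be sketched as below. Eq. 4.16 is not reproduced in this section, so the Pearson correlation coefficient is assumed here; the function name `correlation_coefficient` and the data are hypothetical.

```python
import numpy as np

def correlation_coefficient(y, y_hat):
    """Pearson correlation between measured y and predicted y_hat (assumed form of Eq. 4.16)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    dy, dp = y - y.mean(), y_hat - y_hat.mean()   # deviations from the means
    return float(np.sum(dy * dp) / np.sqrt(np.sum(dy ** 2) * np.sum(dp ** 2)))

# Hypothetical measured values and cross-validated predictions.
y_meas = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])
y_pred = np.array([2.0, 4.1, 6.0, 8.1, 9.9, 12.0])

r = correlation_coefficient(y_meas, y_pred)
r_biased = correlation_coefficient(y_meas, y_pred + 1.0)  # same predictions with a constant offset
```

Adding a constant offset to the predictions leaves r unchanged (`r_biased` equals `r`), which is one reason r is usually reported together with an error measure such as RMSECV rather than on its own.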
Other frequently used measures of model suitability are the standard error
of prediction (SEP) and the standard error of cross-validation (SECV)
(Doherty and Lange, 2006).
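The source does not give formulas for SEP and SECV. One common definition, assumed here, is the standard deviation of the residuals after the mean residual (the bias) has been removed; computed on an independent test set it gives SEP, and computed on cross-validated predictions it gives SECV. The helper `bias_and_sep` and the data are hypothetical, and sign conventions for the bias vary between authors.

```python
import numpy as np

def bias_and_sep(y, y_hat):
    """Bias (mean residual) and bias-corrected standard error of prediction.

    Assumed definition: SEP = sqrt(sum((e_i - bias)^2) / (N - 1)), with e_i = y_i - y_hat_i.
    Applied to cross-validated predictions, the same formula gives SECV.
    """
    e = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    bias = float(e.mean())
    sep = float(np.sqrt(np.sum((e - bias) ** 2) / (len(e) - 1)))
    return bias, sep

# Hypothetical measured values and predictions for an independent test set.
y_test = np.array([5.0, 6.5, 8.0, 9.5, 11.0])
y_pred = np.array([5.3, 6.6, 8.4, 9.7, 11.5])

bias, sep = bias_and_sep(y_test, y_pred)          # negative bias: predictions too high here
rmsep = float(np.sqrt(np.mean((y_test - y_pred) ** 2)))  # test-set analogue of Eq. 4.19
```

With these definitions, RMSEP² = bias² + ((N − 1)/N)·SEP², so SEP reflects the random part of the prediction error and the bias its systematic part.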
The standard deviation of predicted values can be determined by
bootstrapping: new data sets are generated by random sampling (with
replacement) from the original data set, and the standard deviation of the
resulting ensemble of estimates is derived (Wehrens et al., 2000).
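A minimal sketch of the bootstrap idea follows, assuming a univariate linear calibration and invented data; it illustrates the generic technique rather than the specific procedure of Wehrens et al. (2000). Each of `n_boot` calibration sets is drawn with replacement, a model is rebuilt on it, and the spread of the resulting predictions for a new sample gives the standard deviation. The function name `bootstrap_prediction_sd` and its default parameters are hypothetical.

```python
import numpy as np

def bootstrap_prediction_sd(x, y, x_new, n_boot=500, seed=0):
    """Bootstrap mean and standard deviation of a linear calibration's prediction at x_new."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    preds = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(x), size=len(x))  # resample samples with replacement
        if np.unique(x[idx]).size < 2:               # skip degenerate resamples (no slope)
            continue
        coef = np.polyfit(x[idx], y[idx], 1)         # model rebuilt on the bootstrap set
        preds.append(np.polyval(coef, x_new))
    preds = np.asarray(preds)
    return float(preds.mean()), float(preds.std(ddof=1))

# Hypothetical calibration data and a new sample to predict.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.2, 3.8, 6.1, 8.3, 9.7, 12.1, 14.2, 15.8])
mean_pred, sd_pred = bootstrap_prediction_sd(x, y, x_new=4.5)
```

Fixing `seed` makes the estimate reproducible; resamples containing only one distinct x value are skipped because no slope can be fitted to them.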
4.2.4 Drawbacks of chemometrics
Note that the mere application of chemometrics, that is, of multivariate
analysis tools, does not necessarily improve knowledge of the problem
(process, formulation) being studied. Firstly, careful assessment needs to be made
 