Chemometric methods application in pharmaceutical products and processes analysis and control - Computer-Aided Applications in Pharmaceutical Technology


samples being analyzed. In all validation methods, data sets are divided

into training and validation sets. A training set is used for model

construction, whereas a validation set is used to test the model's

performance. Many different methods are available for testing of a

model's predictive performance, but cross-validation is still the preferred

one. An overview of cross-validation methods is provided in relevant

literature (Bro et al., 2008). A model for samples that are not related in

time can be validated by the leave-one-out (LOO) approach, also referred

to as internal validation, where one of the samples is left out to test the

developed model. The procedure is repeated for each sample in turn,

so that the whole data set is eventually used for model testing. If

samples are time-related, then entire batches are left out for model

validation to avoid overfitting (Lopes et al., 2004).
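The two splitting schemes described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `leave_one_out_splits` and `leave_batch_out_splits` and the batch labels are hypothetical and do not come from the cited literature.

```python
from collections import defaultdict


def leave_one_out_splits(n_samples):
    """Yield (train_indices, test_indices) with one sample held out per split."""
    for i in range(n_samples):
        train = [j for j in range(n_samples) if j != i]
        yield train, [i]


def leave_batch_out_splits(batch_labels):
    """Yield (train_indices, test_indices) with one whole batch held out per split."""
    batches = defaultdict(list)
    for idx, label in enumerate(batch_labels):
        batches[label].append(idx)
    for label, test in batches.items():
        train = [j for j, other in enumerate(batch_labels) if other != label]
        yield train, test


# Hypothetical example: six samples drawn from three batches.
batch_labels = ["A", "A", "B", "B", "C", "C"]
loo = list(leave_one_out_splits(len(batch_labels)))
lbo = list(leave_batch_out_splits(batch_labels))
```

With time-related samples, holding out a whole batch keeps neighbouring (and therefore correlated) measurements from appearing on both sides of the split, which is the motivation for the batch-wise variant.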

Model performance can be evaluated by calculation of root-mean-

square error of calibration (RMSEC) and the root-mean-square error of

cross-validation (RMSECV):

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^{2}}{N}} \tag{4.19}$$

where N is the number of samples, and $y_i$ and $\hat{y}_i$ are experimentally

obtained and predicted values for calibration samples (in the case of

RMSEC) or validation samples (in the case of RMSECV). The samples

used for cross-validation are not used in the model construction, therefore

providing external testing (i.e. external validation) of the developed model.
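As a rough illustration of the difference between the two error measures, the sketch below computes RMSEC and a leave-one-out RMSECV in plain Python. It assumes a univariate straight-line model in place of a real multivariate calibration, and the data and the helper names (`fit_line`, `rmse`, `rmsec`, `rmsecv_loo`) are made up for the example.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def rmse(y_true, y_pred):
    """Root-mean-square error: sqrt(sum((y_i - yhat_i)^2) / N)."""
    n = len(y_true)
    return math.sqrt(sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / n)


def rmsec(x, y):
    """Calibration error: the model is fitted and evaluated on the same samples."""
    a, b = fit_line(x, y)
    return rmse(y, [a + b * xi for xi in x])


def rmsecv_loo(x, y):
    """Leave-one-out error: each sample is predicted by a model built without it."""
    preds = []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        preds.append(a + b * x[i])
    return rmse(y, preds)


# Hypothetical calibration data (reference values vs. a single predictor).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
print("RMSEC :", rmsec(x, y))
print("RMSECV:", rmsecv_loo(x, y))
```

For a least-squares fit the leave-one-out error is never smaller than the calibration error, so a large gap between the two values is a common warning sign of overfitting.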

Values predicted by the model are compared to experimentally obtained

values using the correlation coefficient (Eq. 4.16).

Other commonly used measures of model suitability are the standard error of

prediction (SEP) and the standard error of cross-validation (SECV) (Doherty

and Lange, 2006).

The standard deviation of predicted values can be estimated by the

bootstrapping technique: new data sets are generated by random sampling

with replacement from the original data set, and the standard deviation of

the resulting ensemble of estimates is calculated (Wehrens et al., 2000).
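The bootstrapping idea can be sketched as follows. This minimal example assumes case resampling (whole samples drawn with replacement), which is only one of several bootstrap variants, and again uses a straight-line model; the data, the function name `bootstrap_prediction_sd` and its parameters are hypothetical.

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def bootstrap_prediction_sd(x, y, x_new, n_boot=500, seed=0):
    """Standard deviation of the prediction at x_new over bootstrap resamples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(x)
    preds = []
    for _ in range(n_boot):
        # Draw n sample indices with replacement from the original data set.
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # degenerate resample: the slope is undefined
        a, b = fit_line(xs, ys)
        preds.append(a + b * x_new)
    return statistics.stdev(preds)


# Hypothetical calibration data and a new sample to predict.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
sd = bootstrap_prediction_sd(x, y, x_new=3.5)
print("bootstrap SD of prediction:", sd)
```

Each resample yields one model and one prediction; the spread of that ensemble serves as the uncertainty estimate for the predicted value.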

4.2.4 Drawbacks of chemometrics

Note that pure application of chemometrics, that is multivariate analysis

tools, does not necessarily improve knowledge of the problem (process,

formulation) being studied. Firstly, careful assessment needs to be made
