of matrices represent samples (formulation, batch, etc.) and columns of
matrices represent variables. A third dimension, which is sometimes
included, represents a time point when a specific variable is measured (if
time variability is of interest).
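As a minimal sketch of this layout, a two-way data matrix can be held as
a list of rows, with an extra index added for time points. The numbers
and the helper names `shape` and `column` are invented for illustration
and do not come from the text.

```python
# Hypothetical data: 3 samples (rows) x 2 variables (columns).
X = [
    [5.1, 0.82],   # sample 1: variable 1, variable 2
    [4.9, 0.79],   # sample 2
    [5.4, 0.91],   # sample 3
]

# Optional third dimension: X3[sample][variable][time point].
# Here each variable is measured at 2 time points (made-up values).
X3 = [
    [[5.1, 5.0], [0.82, 0.80]],
    [[4.9, 4.7], [0.79, 0.75]],
    [[5.4, 5.3], [0.91, 0.90]],
]


def shape(a):
    """Return the dimensions of a regular (non-ragged) nested list."""
    dims = []
    while isinstance(a, list):
        dims.append(len(a))
        a = a[0]
    return tuple(dims)


def column(matrix, j):
    """Extract variable j across all samples of a two-way matrix."""
    return [row[j] for row in matrix]
```

Here `column(X, 1)` collects the second variable for every sample, which
is the unit most variable selection methods operate on.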
Selection of the variables to be included in a regression model is key to
making accurate model predictions. Use of state-of-the-art in-process
monitoring techniques in the pharmaceutical industry often results in
acquisition of huge data sets that are of no relevance if there is no
adequate technique for selection of significant variables. Different
approaches to variable selection are used, and the main difference is
whether one or multiple significant variables are investigated at the same
time. Univariate selection is used when variables are analyzed separately
from each other and is usually accompanied by t-statistics and ANOVA
tests to compare sample groups. The drawback of this approach is that
interactions between variables are not considered, which can limit the
usefulness of the models being developed.
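Univariate selection can be sketched as scoring each variable on its own
with a two-sample t-statistic and ranking the variables by the size of
that score. The example below uses Welch's form of the t-statistic, which
is one common choice and not necessarily the one the cited studies used.
The function names and the two batches of data are hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's two-sample t-statistic for two lists of measurements."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)


def univariate_ranking(group_a, group_b):
    """Score each column separately and sort by |t|, largest first.

    group_a, group_b: lists of rows (samples) with the same columns.
    Returns a list of (column_index, t_statistic) pairs.
    """
    n_vars = len(group_a[0])
    scores = []
    for j in range(n_vars):
        col_a = [row[j] for row in group_a]
        col_b = [row[j] for row in group_b]
        scores.append((j, welch_t(col_a, col_b)))
    return sorted(scores, key=lambda s: abs(s[1]), reverse=True)


# Hypothetical data: two sample groups, three variables per sample.
batch_a = [[1.0, 10.2, 3.1], [1.2, 10.0, 3.0], [0.9, 10.1, 3.3], [1.1, 9.9, 3.2]]
batch_b = [[2.0, 10.1, 3.2], [2.2, 10.3, 3.1], [1.9, 9.8, 3.0], [2.1, 10.0, 3.3]]
ranking = univariate_ranking(batch_a, batch_b)
```

Because every column is scored in isolation, a variable that is only
informative in combination with another one would receive a low score
here, which is the limitation described above.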
Multivariate variable selection has an advantage over univariate selection,
as it can capture potential correlation between variables. Some of the methods have
already been explained, such as determination of PLS weights based on
co-variances between the response and each variable (Hoskuldsson,
2001). Other methods include assessing the size of the regression
coefficients (Centner et al., 1996), variable importance on projection (VIP)
(Eriksson et al., 2001), interval PLS (Norgaard et al., 2000), genetic
algorithms (GA) (Lavine et al., 2004), etc.
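To illustrate the covariance-based PLS weights mentioned above, the
sketch below computes only the first weight vector of a single-response
PLS model: the covariances between each mean-centered variable and the
mean-centered response, scaled to unit length. It is not a full PLS
implementation, and the function names and data are assumptions made for
the example.

```python
import math


def center(values):
    """Subtract the mean from a list of numbers."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def first_pls_weights(X, y):
    """First PLS weight vector, proportional to X^T y on centered data.

    X: list of rows (samples), y: list of responses.
    Assumes at least one variable covaries with y (non-zero norm).
    """
    n_vars = len(X[0])
    yc = center(y)
    cols = [center([row[j] for row in X]) for j in range(n_vars)]
    # Each raw weight is the unnormalized covariance of a variable with y.
    raw = [sum(x * t for x, t in zip(col, yc)) for col in cols]
    norm = math.sqrt(sum(w * w for w in raw))
    return [w / norm for w in raw]


# Hypothetical data: variable 0 rises with y, variable 1 is nearly
# unrelated to y, and variable 2 falls as y rises.
X = [[1.0, 0.5, 4.0], [2.0, 0.4, 3.1], [3.0, 0.6, 2.0], [4.0, 0.5, 0.9]]
y = [1.1, 1.9, 3.2, 3.9]
w = first_pls_weights(X, y)
```

Variables with weights close to zero contribute little to the first
latent variable. The weights depend on the scale of each variable, so in
practice the columns are often scaled to unit variance first.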
Sometimes it is necessary to apply a pretreatment procedure to prepare
the data for modeling. The purpose of pretreatment is to remove outliers
and noise from the data and to make different data sets easier to
compare. Data pretreatment is usually dependent on the
technique used for data acquisition. Spectroscopy techniques often
require normalization, differentiation, and multiplicative scatter
correction (MSC) (Geladi et al., 1985), as well as orthogonal signal
correction (OSC), optimized scaling (OS), standard normal variate
(SNV), first and second derivatives, de-trend correction, offset correction,
etc. (Rajalahti and Kvalheim, 2011). There is no clear consensus or
guideline on the selection of a pretreatment method; therefore, it is often
based on experience and a trial-and-error approach. In the data
pretreatment (preprocessing) stage, sufficient knowledge of the sources of
variation in the data is required to ensure that only unwanted
outliers and noise are eliminated.
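Two of the spectral pretreatments named above, SNV and MSC, can be
sketched in a few lines. This is a simplified illustration in their usual
textbook form: SNV centers and scales each spectrum by its own mean and
standard deviation, and MSC fits each spectrum to a reference by least
squares and removes the fitted offset and slope. Using the mean spectrum
as the default reference is an assumption, as are the function names and
the synthetic spectra.

```python
import statistics


def snv(spectrum):
    """Standard normal variate: scale one spectrum by its own mean and std."""
    m = statistics.mean(spectrum)
    s = statistics.stdev(spectrum)
    return [(v - m) / s for v in spectrum]


def msc(spectra, reference=None):
    """Multiplicative scatter correction against a reference spectrum.

    Each spectrum x is fitted as x = a + b * reference by least squares
    and corrected to (x - a) / b. The mean spectrum is the default
    reference.
    """
    if reference is None:
        reference = [statistics.mean(col) for col in zip(*spectra)]
    r_mean = statistics.mean(reference)
    r_ss = sum((r - r_mean) ** 2 for r in reference)
    corrected = []
    for x in spectra:
        x_mean = statistics.mean(x)
        b = sum((r - r_mean) * (v - x_mean) for r, v in zip(reference, x)) / r_ss
        a = x_mean - b * r_mean
        corrected.append([(v - a) / b for v in x])
    return corrected


# Synthetic spectra: one underlying shape with different offsets and
# slopes, standing in for additive and multiplicative scatter effects.
base = [0.1, 0.4, 0.9, 0.5, 0.2]
spectra = [
    [0.05 + 1.0 * v for v in base],
    [0.20 + 1.5 * v for v in base],
    [-0.10 + 0.8 * v for v in base],
]
msc_spectra = msc(spectra)
snv_spectra = [snv(s) for s in spectra]
```

On this idealized input both corrections collapse the three spectra onto
a single curve. Real spectra also differ chemically, so the corrected
curves would only move closer together, and knowledge of the sources of
variation is still needed to judge whether the removed part was noise.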
Both classification and regression multivariate models require
validation. The model validation approach depends on the type of