correlation coefficient and/or distances (Euclidean, Mahalanobis)
between samples (Massart et al., 2003).
Linear discriminant analysis (LDA) is similar to PCA in terms of
feature reduction. It is a parametric method used to find optimal
boundaries between classes. Analogous to PCA, a direction is sought
that achieves maximum separation among the different classes
(Sharaf et al., 1986). Unknown samples are classified according to
Euclidean distances.
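As a minimal sketch (assuming scikit-learn is available; the toy
data, class means, and sample counts are purely illustrative), LDA
can both project the data onto the most discriminating direction and
classify unknown samples:

```python
# LDA sketch with scikit-learn: project onto the single direction of
# maximum class separation, then classify unknowns (toy data; real
# features would be e.g. spectral measurements).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two classes of 3-feature samples with shifted means
class_a = rng.normal(loc=0.0, scale=1.0, size=(40, 3))
class_b = rng.normal(loc=3.0, scale=1.0, size=(40, 3))
X = np.vstack([class_a, class_b])
y = np.array([0] * 40 + [1] * 40)

lda = LinearDiscriminantAnalysis(n_components=1)
scores = lda.fit_transform(X, y)   # 1-D projection, analogous to PCA scores
pred = lda.predict(rng.normal(3.0, 1.0, size=(5, 3)))  # classify unknowns
```

As with PCA scores, the projected values can be inspected to judge
class separation before trusting the classification of unknowns.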
K-nearest neighbors (KNN) is a non-parametric method in which an
unknown sample is assigned to the class held by the majority of its
nearest neighbors. The neighborhood is defined by the Euclidean
distances between samples.
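The voting scheme just described can be sketched from scratch in a
few lines (the toy training set and the choice k = 3 are
illustrative):

```python
# From-scratch KNN sketch: classify by majority vote among the k
# Euclidean-nearest training samples.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    dists = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                  # indices of k neighbors
    votes = Counter(y_train[nearest])                # class counts
    return votes.most_common(1)[0][0]                # majority class

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.1, 0.9]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([0.9, 1.0])))  # → B
```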
Soft independent modeling of class analogy (SIMCA) is a parametric
classification technique based on PCA. The data set is first divided
into classes of similar samples. PCA is then performed for each class
separately, resulting in one PC model per class (Massart and Buydens,
1988). Cross-validation is used to determine the optimal number of
PCs for each class. SIMCA puts more emphasis on similarity within a
class than on discrimination between classes (Roggo et al., 2007).
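A simplified SIMCA sketch, assuming scikit-learn: one PCA model is
fitted per class, and an unknown is assigned to the class whose model
reconstructs it with the smallest residual. The toy data and the
fixed choice of two PCs per class stand in for real spectra and
cross-validated PC counts:

```python
# SIMCA sketch: a separate PCA model per class; an unknown sample is
# assigned to the class whose PC model reconstructs it best
# (smallest lack-of-fit residual).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
classes = {
    "A": rng.normal(0.0, 1.0, size=(30, 5)),
    "B": rng.normal(4.0, 1.0, size=(30, 5)),
}
# In practice n_components would be chosen per class by cross-validation
models = {c: PCA(n_components=2).fit(Xc) for c, Xc in classes.items()}

def simca_assign(x):
    residuals = {}
    for c, pca in models.items():
        recon = pca.inverse_transform(pca.transform(x[None, :]))
        residuals[c] = np.linalg.norm(x - recon[0])  # lack-of-fit distance
    return min(residuals, key=residuals.get)

print(simca_assign(np.full(5, 4.0)))  # → B
```

Because each class has its own model, a sample may also fit no class
well, which is the "soft" aspect of SIMCA.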
PLS discriminant analysis (PLS-DA) is a parametric and linear method
that identifies latent variables (LVs) in the feature space that have
maximal covariance with the response variable (Stahle and Wold, 1987;
Roggo et al., 2007). It is a special case of PLS in which the
response variable is a binary vector of zeros and ones describing the
class membership of each sample in the investigated groups (Rajalahti
and Kvalheim, 2011).
Among nonlinear methods used for classification purposes, ANNs have
proven to be among the most promising. Wang et al. (2004) discussed
the advantages and disadvantages of multivariate discriminant
analysis and neural networks as classifiers. Classification has also
been performed with a probabilistic neural network (PNN), which uses
an exponential activation function (instead of the more commonly used
sigmoid function) (Specht, 1990), and with a learning vector
quantization (LVQ) neural network, a supervised relative of the
self-organizing map (SOM) (Kohonen, 1990).
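The PNN idea can be sketched directly: each training sample
contributes an exponential (Gaussian) activation, activations are
pooled per class, and the unknown is assigned to the highest-scoring
class. The smoothing parameter sigma and the toy data below are
illustrative assumptions, not values from the source:

```python
# PNN sketch (Specht-style): exponential kernel activations summed
# per class; the unknown goes to the class with the largest score.
import numpy as np

def pnn_classify(X_train, y_train, x_new, sigma=0.5):
    scores = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        d2 = np.sum((Xc - x_new) ** 2, axis=1)     # squared distances
        scores[c] = np.mean(np.exp(-d2 / (2 * sigma ** 2)))  # pooled kernels
    return max(scores, key=scores.get)

X_train = np.array([[0.0, 0.0], [0.2, 0.1], [2.0, 2.0], [2.1, 1.9]])
y_train = np.array([0, 0, 1, 1])
print(pnn_classify(X_train, y_train, np.array([1.9, 2.0])))  # → 1
```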
It is recommended (Roggo et al., 2007) to use more than one
classification method, since the optimal one cannot be known a
priori: classification performance depends on the data being
analyzed.
4.2.2 Regression methods
Multivariate analysis seeks relationships between a series of
independent (explanatory) x-variables and dependent (response)
y-variables. This