correlation coefficient and/or distances (Euclidean, Mahalanobis)
between samples (Massart et al., 2003).
Linear discriminant analysis (LDA) is similar to PCA in terms of
feature reduction. It is a parametric method used to find optimal
boundaries between classes. Analogous to PCA, a direction is sought
that achieves maximum separation among the different classes
(Sharaf et al., 1986). Unknown samples are classified according to
Euclidean distances.
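As a minimal sketch (assuming scikit-learn is available; the toy
data, class means, and sample counts are purely illustrative), LDA
can both project the data onto the most discriminating direction and
classify unknown samples:

```python
# LDA sketch with scikit-learn: project onto the single direction of
# maximum class separation, then classify unknowns (toy data; real
# features would be e.g. spectral measurements).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two classes of 3-feature samples with shifted means
class_a = rng.normal(loc=0.0, scale=1.0, size=(40, 3))
class_b = rng.normal(loc=3.0, scale=1.0, size=(40, 3))
X = np.vstack([class_a, class_b])
y = np.array([0] * 40 + [1] * 40)

lda = LinearDiscriminantAnalysis(n_components=1)
scores = lda.fit_transform(X, y)   # 1-D projection, analogous to PCA scores
pred = lda.predict(rng.normal(3.0, 1.0, size=(5, 3)))  # classify unknowns
```

As with PCA scores, the projected values can be inspected to judge
class separation before trusting the classification of unknowns.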
K-nearest neighbors (KNN) is a non-parametric method in which an
unknown sample is assigned to the class held by the majority of its
nearest neighbors. The neighborhood is defined by the Euclidean
distances between samples.
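The voting scheme just described can be sketched from scratch in a
few lines (the toy training set and the choice k = 3 are
illustrative):

```python
# From-scratch KNN sketch: classify by majority vote among the k
# Euclidean-nearest training samples.
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    dists = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                  # indices of k neighbors
    votes = Counter(y_train[nearest])                # class counts
    return votes.most_common(1)[0][0]                # majority class

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.1, 0.9]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([0.9, 1.0])))  # → B
```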
Soft independent modeling of class analogy (SIMCA) is a parametric
classification technique based on PCA. The data set is first divided
into classes of similar samples. PCA is then performed for each class
separately, resulting in one PC model per class (Massart and Buydens,
1988). Cross-validation is used to determine the optimal number of
PCs for each class. SIMCA puts more emphasis on similarity within a
class than on discrimination between classes (Roggo et al., 2007).
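A simplified SIMCA sketch, assuming scikit-learn: one PCA model is
fitted per class, and an unknown is assigned to the class whose model
reconstructs it with the smallest residual. The toy data and the
fixed choice of two PCs per class stand in for real spectra and
cross-validated PC counts:

```python
# SIMCA sketch: a separate PCA model per class; an unknown sample is
# assigned to the class whose PC model reconstructs it best
# (smallest lack-of-fit residual).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
classes = {
    "A": rng.normal(0.0, 1.0, size=(30, 5)),
    "B": rng.normal(4.0, 1.0, size=(30, 5)),
}
# In practice n_components would be chosen per class by cross-validation
models = {c: PCA(n_components=2).fit(Xc) for c, Xc in classes.items()}

def simca_assign(x):
    residuals = {}
    for c, pca in models.items():
        recon = pca.inverse_transform(pca.transform(x[None, :]))
        residuals[c] = np.linalg.norm(x - recon[0])  # lack-of-fit distance
    return min(residuals, key=residuals.get)

print(simca_assign(np.full(5, 4.0)))  # → B
```

Because each class has its own model, a sample may also fit no class
well, which is the "soft" aspect of SIMCA.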
PLS discriminant analysis (PLS-DA) is a parametric and linear method
that identifies latent variables (LVs) in the feature space that have
maximal covariance with the response variable (Stahle and Wold, 1987;
Roggo et al., 2007). It is a special case of PLS in which the
response variable is a binary vector of zeros and ones describing the
class membership of each sample in the investigated groups (Rajalahti
and Kvalheim, 2011).
Among nonlinear methods used for classification purposes, ANNs have
proven to be among the most promising. Wang et al. (2004) discussed
the advantages and disadvantages of multivariate discriminant
analysis and neural networks as classifiers. Classification has also
been performed with a probabilistic neural network (PNN), which uses
an exponential activation function (instead of the more commonly used
sigmoid function) (Specht, 1990), and with a learning vector
quantization (LVQ) neural network, a supervised relative of the
self-organizing map (SOM) (Kohonen, 1990).
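The PNN idea can be sketched directly: each training sample
contributes an exponential (Gaussian) activation, activations are
pooled per class, and the unknown is assigned to the highest-scoring
class. The smoothing parameter sigma and the toy data below are
illustrative assumptions, not values from the source:

```python
# PNN sketch (Specht-style): exponential kernel activations summed
# per class; the unknown goes to the class with the largest score.
import numpy as np

def pnn_classify(X_train, y_train, x_new, sigma=0.5):
    scores = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        d2 = np.sum((Xc - x_new) ** 2, axis=1)     # squared distances
        scores[c] = np.mean(np.exp(-d2 / (2 * sigma ** 2)))  # pooled kernels
    return max(scores, key=scores.get)

X_train = np.array([[0.0, 0.0], [0.2, 0.1], [2.0, 2.0], [2.1, 1.9]])
y_train = np.array([0, 0, 1, 1])
print(pnn_classify(X_train, y_train, np.array([1.9, 2.0])))  # → 1
```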
It is recommended (Roggo et al., 2007) to use more than one
classification method, since the optimal one cannot be known a
priori: classification performance depends on the data being
analyzed.
4.2.2 Regression methods
Multivariate analysis seeks relationships between a series of
independent (explanatory) x-variables and dependent (response)
y-variables. This