quadratic error, known as the coefficient of nondetermination, measures the
contribution of the model: the least expensive and least powerful model is the
model that predicts the output as the average value of the measured output,
irrespective of the input. For that model, the average quadratic error EQMr
is 1.
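This property is easy to verify numerically. The sketch below (variable and function names are mine, not the book's) takes EQMr to be the model's mean squared error divided by the variance of the measured output, so the trivial mean-predictor scores exactly 1:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 2.0 * x + rng.normal(0.0, 0.1, size=200)   # measured output

def eqm_r(y_true, y_pred):
    """Relative mean quadratic error: MSE of the model divided by the
    variance of the measured output (the coefficient of nondetermination)."""
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)

# Trivial model: always predict the average of the measured output.
trivial = np.full_like(y, y.mean())
print(eqm_r(y, trivial))   # 1.0: the MSE of the mean equals the variance
```

Any model that uses the input at all should score below 1; a perfect model scores 0.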
3.3 Input Dimension Reduction
The design of the model g(x, w) may require a reduction in the dimension of
the input vector x. That is particularly important when the number of variables
is too large to be handled conveniently, or when the variables are assumed not
to be mutually independent. In the latter case, reducing their number simplifies
the design of the model. The resulting model is more robust with respect to the
variability of the data, and less sensitive to overfitting due to
over-parameterization (see Chap. 2).
In order to explore the structure of multidimensional data, the analysis is
based on observing the distribution of the variables in the input space.
When the number of factors is too high for visual analysis or numerical
processing, it must be decreased. In linear statistics, PCA (Principal
Component Analysis) is used to reduce the number of factors. The method
projects the data onto linear combinations of the factors, and provides a
more compact representation of the data.
In this section, we will review the principles of PCA; we will then discuss
CCA (Curvilinear Component Analysis), which may be viewed as a nonlinear
extension of PCA, well suited to representations of more complex data
structures. A parallel will be drawn with self-organizing Kohonen maps, which are
also used for nonlinear data analysis.
3.4 Principal Component Analysis
Principal component analysis is one of the oldest statistical analysis
techniques. It was developed to study samples of individuals described by
several factors. The method is therefore suited to the analysis of
multidimensional data: in general, the separate study of each factor is not
sufficient, since it does not allow for the detection of possible dependencies
between factors.
3.4.1 Principle of PCA
To reduce the number of factors (components), PCA constructs sub-spaces of
the input space (also termed representation space), whose dimensions are
therefore smaller than the number of factors, in which the distribution of
the observations (points) is as similar as possible to their distribution in
representation space. The similarity criterion is the total inertia of the
scatter diagram.
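A minimal sketch of this principle, under the standard construction (not the book's notation): the principal axes are the eigenvectors of the covariance matrix of the centered data, and the total inertia of the scatter diagram equals the sum of the eigenvalues, so the fraction retained by a projection can be read off directly:

```python
import numpy as np

rng = np.random.default_rng(1)
# 200 observations of 3 correlated factors (an arbitrary mixing matrix).
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.5, 0.1],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.2]])
Xc = X - X.mean(axis=0)                 # center the scatter diagram

cov = Xc.T @ Xc / len(Xc)               # covariance matrix of the factors
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]       # reorder: principal axes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

Z = Xc @ eigvecs[:, :2]                 # project onto the first two axes

# Total inertia (sum of the variances) equals the sum of the eigenvalues.
assert np.isclose(eigvals.sum(), np.var(Xc, axis=0).sum())
print(eigvals[:2].sum() / eigvals.sum())  # fraction of inertia retained
```

Choosing how many axes to keep then amounts to choosing an acceptable loss of total inertia.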